Versa AI hub
Tools

Data Is Better Together: A look back and forward

By versatileai, April 22, 2025

Over the past few months, we have been working on the Data Is Better Together (DIBT) initiative. With this collaboration between Hugging Face and Argilla, and with the support of the open-source ML community, our goal has been to empower the open-source community to create impactful datasets collectively.

Now we have decided to move forward with the same goal. To give an overview of what has been achieved and of the tasks anyone can contribute to, we have organized this post into two sections: community initiatives and cookbook efforts.

Community initiatives

The first step in this initiative was the prompt ranking project. Our goal was to create a dataset of 10K prompts, both synthetic and human-generated, ranked by quality. The community response was immediate!

Within a few days, over 385 people had joined. We released the DIBT/10k_prompts_ranked dataset, intended for prompt ranking tasks or synthetic data generation. The dataset has been used to build new models such as SPIN.

Seeing the global support from the community, we recognized that English-centric data alone is not enough and that there are too few language-specific benchmarks for open LLMs. So we created the Multilingual Prompt Evaluation Project (MPEP) with the aim of developing a leaderboard for multiple languages. To that end, a subset of 500 high-quality prompts from DIBT/10k_prompts_ranked was selected to be translated into different languages.
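
The selection step described above can be sketched as a small filter-and-sort over rated prompts. This is only an illustration, not the procedure the project actually used: the record keys `prompt` and `ratings`, the `min_ratings` threshold, and the use of the mean rating as the quality score are all assumptions made for the example.

```python
from statistics import mean


def select_top_prompts(records, k=500, min_ratings=2):
    """Return the k prompts with the highest mean rating.

    `records` is a list of dicts with the hypothetical keys
    "prompt" (str) and "ratings" (list of numeric scores).
    Prompts with fewer than `min_ratings` ratings are skipped.
    """
    scored = [
        (mean(r["ratings"]), len(r["ratings"]), r["prompt"])
        for r in records
        if len(r["ratings"]) >= min_ratings
    ]
    # Highest mean first; ties go to the prompt with more ratings.
    scored.sort(key=lambda item: (-item[0], -item[1]))
    return [prompt for _, _, prompt in scored[:k]]


# Toy data standing in for the real annotations.
toy = [
    {"prompt": "Explain recursion with an example.", "ratings": [5, 4, 5]},
    {"prompt": "hi", "ratings": [1, 2]},
    {"prompt": "Summarize the causes of World War I.", "ratings": [4, 4]},
    {"prompt": "Write a haiku about rain.", "ratings": [5]},
]

top = select_top_prompts(toy, k=2)
print(top)
```

Requiring a minimum number of ratings keeps a single enthusiastic annotator from pushing a prompt into the subset; whether that trade-off is appropriate depends on how many annotations each prompt actually received.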

So far, MPEP has led to:

  • More than 18 language leaders creating spaces for the translations.
  • Completed translations for Dutch, Russian, and Spanish, with many more efforts working towards complete translations of the prompts.
  • A community of dataset builders forming on Discord.

Going forward, we will continue to support community efforts focused on building datasets, through tools and documentation.

Cookbook efforts

As part of DIBT, we also created guides and tools to help the community build valuable datasets themselves.

  • Domain-specific datasets: bootstrap the creation of more domain-specific datasets for training models, bringing together engineers and domain experts.
  • DPO/ORPO datasets: help foster a community of people building more DPO-style datasets for different languages, domains, and tasks.
  • KTO datasets: help the community create their own KTO datasets.
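
To make the difference between the dataset types above concrete, here is a minimal sketch of the two record shapes and a conversion between them. It is not taken from the cookbook guides: the field names (`prompt`, `chosen`, `rejected` for DPO-style pairs; `prompt`, `completion`, `label` for KTO-style binary feedback) follow conventions commonly used by preference-tuning libraries and are an assumption here, as is the helper name `dpo_to_kto`.

```python
def dpo_to_kto(pairs):
    """Split DPO-style preference pairs into KTO-style records.

    Each input pair holds one prompt with a preferred ("chosen") and a
    dispreferred ("rejected") response. Each output record holds one
    prompt, one completion, and a boolean label (True = desirable).
    """
    records = []
    for pair in pairs:
        records.append(
            {"prompt": pair["prompt"], "completion": pair["chosen"], "label": True}
        )
        records.append(
            {"prompt": pair["prompt"], "completion": pair["rejected"], "label": False}
        )
    return records


# A single hypothetical preference pair.
pairs = [
    {
        "prompt": "Translate 'good morning' into Spanish.",
        "chosen": "Buenos días.",
        "rejected": "Buenas noches.",
    }
]

kto_records = dpo_to_kto(pairs)
for record in kto_records:
    print(record)
```

The conversion only goes one way: KTO-style feedback does not need paired responses, so a thumbs-up or thumbs-down on a single completion is already a usable record, which is part of what makes such datasets easier for a community to collect.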

What did we learn?

  • The community is eager to participate in these efforts and is excited to work collectively on datasets.
  • There are existing inequalities that must be overcome to ensure inclusive and comprehensive benchmarks.
  • Datasets for specific languages, domains, and tasks are currently underrepresented in the open-source community.
  • Many of the tools the community needs to collaborate effectively on building valuable datasets already exist.

How can I get involved?

You can still contribute to the cookbook efforts by following the instructions in the README of the project you are interested in, sharing your datasets and results with the community, or providing new guides and tools for everyone. Your contributions are invaluable in helping us build robust and inclusive resources for all.

If you want to take part, join us in the #data-is-better-together channel on the Hugging Face Discord.

We look forward to building better datasets together with you!
