Versa AI hub

Data Is Better Together: A Look Back and Forward

By versatileai, April 22, 2025

Over the past few months, we have been working on the Data Is Better Together (DIBT) initiative. With this collaboration between Hugging Face and Argilla and the support of the open-source ML community, our goal has been to empower the community to collectively create datasets that have an impact.

We have now decided to move forward with the same goal. To give an overview of what has been achieved and of the tasks that anyone can contribute to, we have organized this post into two sections: community initiatives and cookbook efforts.

Community initiatives

The first step of this initiative was the prompt ranking project. Our goal was to create a dataset of 10K prompts, both synthetic and human-generated, ranked by quality. The community's response was immediate!

Within a few days, over 385 people had joined. We released the DIBT/10k_prompts_ranked dataset, intended for prompt ranking tasks or synthetic data generation. The dataset has been used to build new models, such as SPIN.
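As a rough illustration of what "ranked by quality" can mean in practice, the sketch below averages the ratings that several annotators gave each prompt and sorts the prompts best-first. The rating scale, the field names (`avg_rating`, `num_responses`), and the tie-breaking rule are assumptions made for this example, not a description of how the released dataset was actually built.

```python
from collections import defaultdict
from statistics import mean


def rank_prompts(ratings):
    """Average the ratings each prompt received and sort best-first.

    `ratings` is an iterable of (prompt, rating) pairs, one per annotation.
    """
    by_prompt = defaultdict(list)
    for prompt, rating in ratings:
        by_prompt[prompt].append(rating)

    ranked = [
        {"prompt": p, "avg_rating": mean(r), "num_responses": len(r)}
        for p, r in by_prompt.items()
    ]
    # Highest average first; ties are broken by the number of responses.
    ranked.sort(
        key=lambda row: (row["avg_rating"], row["num_responses"]),
        reverse=True,
    )
    return ranked


# Hypothetical annotations on an assumed 1-5 scale.
annotations = [
    ("Explain recursion to a child.", 5),
    ("Explain recursion to a child.", 4),
    ("hi", 1),
    ("Write a haiku about autumn.", 4),
]
ranked = rank_prompts(annotations)
for row in ranked:
    print(row)
```

Keeping the number of responses next to the average makes it possible to tell a well-supported rating from one based on a single annotation.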

Seeing the global support from the community, we recognized that English-centric data alone is not enough, and that there are not enough language-specific benchmarks for open LLMs. So we created the Multilingual Prompt Evaluation Project (MPEP), with the aim of developing a leaderboard for multiple languages. To that end, a subset of 500 high-quality prompts from DIBT/10k_prompts_ranked was selected to be translated into different languages.
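Selecting such a subset could look roughly like the following sketch, which keeps only prompts rated by a minimum number of annotators and then takes the `k` with the highest average rating. The row fields (`prompt`, `avg_rating`, `num_responses`) and the `min_responses` threshold are hypothetical; the actual selection criteria used for MPEP are not stated here.

```python
def select_top_prompts(rows, k, min_responses=2):
    """Return the k highest-rated prompts with at least `min_responses` ratings."""
    eligible = [r for r in rows if r["num_responses"] >= min_responses]
    eligible.sort(key=lambda r: r["avg_rating"], reverse=True)
    return [r["prompt"] for r in eligible[:k]]


# Toy rows standing in for the ranked prompt dataset.
rows = [
    {"prompt": "A", "avg_rating": 4.8, "num_responses": 3},
    {"prompt": "B", "avg_rating": 5.0, "num_responses": 1},
    {"prompt": "C", "avg_rating": 3.2, "num_responses": 4},
    {"prompt": "D", "avg_rating": 4.1, "num_responses": 2},
]
subset = select_top_prompts(rows, k=2)
print(subset)
```

For the project described above, `k` would be 500. Prompt "B" is dropped in the toy data because a single rating, however high, is weak evidence of quality.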

More than 18 language leaders have created spaces for the translations. Translations are complete for Dutch, Russian, and Spanish, and many more efforts are working towards full translations of the prompts. The project also led to the creation of a community of dataset builders on Discord.

Going forward, we will continue to support community efforts focused on building datasets, through tools and documentation.

Cookbook efforts

As part of DIBT, we also created guides and tools to help the community build valuable datasets themselves.

  • Domain-specific datasets: bootstraps the creation of more domain-specific datasets for training models, bringing together engineers and domain experts.
  • DPO/ORPO datasets: helps foster a community of people building more DPO-style datasets for different languages, domains, and tasks.
  • KTO datasets: helps the community create its own KTO datasets.
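To make the difference between DPO-style and KTO-style data concrete, here is a minimal sketch that splits one preference pair into two KTO-style records. The field names (`prompt`, `chosen`, `rejected` for preference pairs; `prompt`, `completion`, `label` for KTO) follow a convention commonly used by preference-tuning libraries and are an assumption here, not something the cookbook guides are confirmed to require.

```python
def dpo_to_kto(pair):
    """Split one preference pair into two KTO-style records.

    A DPO-style pair compares two completions for the same prompt, while a
    KTO-style record only marks a single completion as desirable or not.
    """
    return [
        {"prompt": pair["prompt"], "completion": pair["chosen"], "label": True},
        {"prompt": pair["prompt"], "completion": pair["rejected"], "label": False},
    ]


# Hypothetical preference pair.
pair = {
    "prompt": "Translate 'good morning' into Spanish.",
    "chosen": "Buenos días.",
    "rejected": "Buenas noches.",
}
kto_rows = dpo_to_kto(pair)
for row in kto_rows:
    print(row)
```

Because KTO-style records only need a single thumbs-up or thumbs-down judgment per completion, they can be easier for a community to collect than paired comparisons.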

What did we learn?

  • The community is eager to participate in these efforts and is excited to work on datasets collectively.
  • There are existing inequalities that must be overcome to ensure inclusive and comprehensive benchmarks.
  • Datasets for specific languages, domains, and tasks are currently underrepresented in the open-source community.
  • Many of the tools the community needs to collaborate effectively on building valuable datasets already exist.

How can I get involved?

Follow the instructions in the README of the project you are interested in, share your datasets and results with the community, or contribute to the cookbook efforts by providing new guides and tools for everyone. Your contributions are invaluable in helping us build robust and inclusive resources for everyone.

If you want to take part, join us in the #data-is-better-together channel on the Hugging Face Discord.

We look forward to building better datasets together with you!
