
🤗 Open preference dataset for community-driven text-to-image generation

December 9, 2024 (updated February 13, 2025)

The Data is Better Together community is releasing yet another important dataset for open source development. Because open preference datasets for text-to-image generation were lacking, we set out to release an Apache 2.0 licensed dataset for text-to-image generation. The dataset focuses on text-image preference pairs across common image generation categories, mixing different model families and varying prompt complexities.

TL;DR? All results can be found in this collection on the Hugging Face Hub, and pre- and post-processing code can be found in this GitHub repository. Most importantly, there is a ready-to-use preference dataset and a flux-dev-lora-finetune. If you already want to show your support, don’t forget to like, subscribe, and follow us before you continue reading.

Not familiar with the Data is Better Together community?

[Data is Better Together](https://huggingface.co/data-is-better-together) is a collaboration between 🤗 Hugging Face and the open source AI community. We aim to empower the open source community to collaboratively build impactful datasets. Follow the organization to stay up to date on the latest datasets, models, and community sprints.

Similar efforts

Although there have been several previous efforts to create open image preference datasets, ours is unique because of the openness of both the dataset and the code that creates it, as well as the varying complexity and categories of its prompts. Some of these efforts are listed below.

- [yuvalkirstain/pickapic_v2](https://huggingface.co/datasets/yuvalkirstain/pickapic_v2)
- [fal.ai/imgsys](https://imgsys.org/)
- [TIGER-Lab/GenAI-Arena](https://huggingface.co/spaces/TIGER-Lab/GenAI-Arena)
- [Artificial Analysis image arena](https://artificialanalysis.ai/text-to-image/arena)

Input dataset

To get a suitable input dataset for this sprint, we started with some basic prompts, which we cleaned, filtered for harmfulness, and enriched with categories and complexity using synthetic data generation with distilabel. Finally, images were generated using a Flux model and a Stable Diffusion model. This produced the open-image-preferences-v1 dataset.
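As a quick way to explore the result, the dataset can be loaded with the 🤗 datasets library. A minimal sketch, assuming the repository id data-is-better-together/open-image-preferences-v1 on the Hub:

```python
# Minimal sketch: inspect the released dataset with 🤗 datasets.
# The repository id is an assumption based on the dataset name above.
from datasets import load_dataset

ds = load_dataset("data-is-better-together/open-image-preferences-v1", split="train")
print(ds)     # number of rows and column names
print(ds[0])  # one record: prompt variants, images, category, etc.
```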

Input prompts

Imgsys is a generative image model arena hosted by fal.ai that serves prompts and lets users choose their preference between two model generations. Unfortunately, the generated images are not publicly available, but the associated prompts are hosted on Hugging Face. These prompts represent real-world usage of image generation, including good examples focused on everyday generation, but that real-world usage also includes duplicate and harmful prompts, which meant we had to look at the data and do some filtering.
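The exact cleaning steps aren’t spelled out here, but a minimal sketch of exact-match deduplication over raw prompts (normalizing case and whitespace) looks like this; the input list is a hypothetical example:

```python
# Minimal sketch: exact-match deduplication of prompts after normalizing
# case and whitespace. Real-world cleaning may also need fuzzy matching;
# this only illustrates the idea.
def deduplicate(prompts: list[str]) -> list[str]:
    seen: set[str] = set()
    unique: list[str] = []
    for prompt in prompts:
        key = " ".join(prompt.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(prompt)
    return unique

print(deduplicate(["A harp", "a  HARP ", "a boat in Venice"]))
# ['A harp', 'a boat in Venice']
```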

Reducing toxicity

We aimed to remove all NSFW prompts and images from the dataset before launching it to the community. We settled on a multi-model approach using two text-based and two image-based classifiers as filters. After filtering, we decided to manually check each image to ensure no harmful content remained, and fortunately our approach proved successful.

We used the following pipeline:

1. Classify images as NSFW.
2. Remove all positive samples.
3. Have the Argilla team manually review the dataset.
4. Iterate based on the review.
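A minimal sketch of the classification step, assuming an off-the-shelf NSFW image classifier from the Hub (the checkpoint below is one public option, not necessarily one of the classifiers the sprint used):

```python
# Minimal sketch: flag NSFW images with an off-the-shelf classifier.
# The checkpoint is an assumption; the sprint combined two text-based
# and two image-based classifiers.
from PIL import Image
from transformers import pipeline

classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection")

def is_nsfw(image: Image.Image, threshold: float = 0.5) -> bool:
    scores = {pred["label"]: pred["score"] for pred in classifier(image)}
    return scores.get("nsfw", 0.0) >= threshold

sample = Image.open("candidate.png")
if is_nsfw(sample):
    print("drop sample")  # remove positive samples before manual review
```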

Synthetic prompt augmentation

Because data diversity is important for data quality, we decided to enrich our dataset by synthetically rewriting prompts based on different categories and complexities. This was done using a distilabel pipeline.
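A minimal sketch of such a rewriting pipeline, assuming distilabel 1.x with a serverless Inference Endpoints backend; the model id and rewriting instruction are illustrative, not the sprint’s exact configuration:

```python
# Minimal sketch: rewrite a prompt with distilabel (assuming distilabel 1.x).
# Model id and instruction are illustrative assumptions.
from distilabel.llms import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromDicts
from distilabel.steps.tasks import TextGeneration

with Pipeline(name="prompt-augmentation") as pipeline:
    load = LoadDataFromDicts(
        data=[{
            "instruction": "Rewrite this as a detailed anime-style image prompt: "
                           "a harp without strings"
        }]
    )
    rewrite = TextGeneration(
        llm=InferenceEndpointsLLM(model_id="meta-llama/Meta-Llama-3.1-70B-Instruct")
    )
    load >> rewrite  # connect the loading step to the generation task

if __name__ == "__main__":
    distiset = pipeline.run(use_cache=False)
    print(distiset)
```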

The example below shows how the same base prompt evolves across complexity levels (image placeholders stand in for the generated outputs):

- Default prompt: a harp without strings → (default harp image)
- Stylized prompt: an anime-style stringless harp with intricate details and flowing lines, set against a dreamy pastel background → (stylized harp image)
- Quality prompt: an anime-style stringless harp with intricate details and flowing lines, set against a dreamy pastel background and illuminated by soft golden light, with a gentle mood, rich textures, high resolution, and a photorealistic finish → (high-quality harp image)

Prompt categories

InstructGPT describes basic task categories for text-to-text generation, but there is no equivalent clear-cut set of task categories for text-to-image generation. To remedy this, we used two primary sources as input for our categories: google/sdxl and Microsoft. This produced the following main categories: Movies, Photography, Anime, Manga, Digital Art, Pixel Art, Fantasy Art, Neon Punk, 3D Model, Painting, Animation, Illustration. On top of that, we also selected some mutually exclusive subcategories to allow for further diversification of the prompts. Categories and subcategories are randomly sampled, so they are roughly evenly distributed across the dataset.
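A minimal sketch of that sampling step; the category list mirrors the one above, while the subcategories shown are hypothetical placeholders:

```python
# Minimal sketch: uniformly sample a category (and, where defined, a
# mutually exclusive subcategory) per prompt. Subcategories are hypothetical.
import random

CATEGORIES = [
    "Movies", "Photography", "Anime", "Manga", "Digital Art", "Pixel Art",
    "Fantasy Art", "Neon Punk", "3D Model", "Painting", "Animation", "Illustration",
]
SUBCATEGORIES = {
    "Photography": ["portrait", "landscape", "macro"],  # hypothetical
    "Anime": ["chibi", "mecha"],                        # hypothetical
}

def sample_style() -> tuple[str, str | None]:
    category = random.choice(CATEGORIES)  # uniform over categories
    subcategories = SUBCATEGORIES.get(category)
    return category, random.choice(subcategories) if subcategories else None

print(sample_style())  # e.g. ('Anime', 'mecha')
```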

Prompt complexity

The Deita paper showed that evolving the complexity and diversity of prompts leads to better model generations and fine-tunes, but humans do not always take the time to write extensive prompts. We therefore decided to use the same prompt in both a complexified and a simplified form as two data points for different preference generations.

Image generation

The ArtificialAnalysis/Text-to-Image-Leaderboard provides an overview of the best-performing image models. We chose the two best-performing models based on their licensing and availability on the Hub. Additionally, we made sure the models belonged to different model families so as not to bias generations toward a single family across the different categories. We therefore chose stabilityai/stable-diffusion-3.5-large and black-forest-labs/FLUX.1-dev. Each model was then used to generate images for both the simplified and the complexified prompt within the same style category.
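A minimal sketch of generating one preference pair with diffusers; both checkpoints are gated, so this assumes the licenses have been accepted and a large enough GPU is available, and the sampler settings are illustrative defaults:

```python
# Minimal sketch: generate one image pair for the same prompt with both models.
# Inference settings are illustrative; both checkpoints are gated on the Hub.
import torch
from diffusers import FluxPipeline, StableDiffusion3Pipeline

prompt = "an anime-style stringless harp, intricate details, dreamy pastel background"

flux = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
flux(prompt, num_inference_steps=28, guidance_scale=3.5).images[0].save("flux.png")

sd35 = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
).to("cuda")
sd35(prompt, num_inference_steps=28, guidance_scale=4.5).images[0].save("sd35.png")
```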


Results

The raw export of all annotated data includes responses to a multiple-choice question, where each annotator decided whether one model was better, both performed well, or both performed badly. Based on this, we could inspect annotator alignment, check model performance across categories, and even fine-tune a model. You can already explore the annotated dataset on the Hub.

Annotator alignment

Annotator agreement is a way to check the validity of a task: whenever a task is too hard, annotators tend to be under-aligned, and whenever it is too easy, they tend to be over-aligned. Striking this balance is rare, but we managed it during this sprint. The analysis was done using the Hugging Face datasets SQL console. Overall, SD3.5-XL had a slightly higher chance of winning within our test setup.

Model performance

Given the annotator alignment, we found that each model performed better in its own niche, so we did an additional analysis to see whether there were differences across categories. In short, FLUX-dev is better for anime, while SD3.5-XL is better for art and cinematic scenarios (a sketch of the underlying tally follows the list):

- Ties: Photography, Animation
- FLUX-dev is better for: 3D Model, Anime, Manga
- SD3.5-XL is better for: Movies, Digital Art, Fantasy Art, Illustration, Neon Punk, Painting, Pixel Art
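The column names `category` and `preference` in this sketch are assumptions about the dataset schema:

```python
# Minimal sketch: per-category win rates. Repository id and column
# names are assumptions; adapt them to the actual dataset schema.
from collections import Counter, defaultdict

from datasets import load_dataset

ds = load_dataset("data-is-better-together/open-image-preferences-v1", split="train")

tallies: dict[str, Counter] = defaultdict(Counter)
for row in ds:
    tallies[row["category"]][row["preference"]] += 1  # e.g. "flux", "sd35", "tie"

for category, counts in sorted(tallies.items()):
    total = sum(counts.values())
    print(category, {label: round(n / total, 2) for label, n in counts.items()})
```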

Fine-tuning the model

To validate the quality of the dataset, we did a LoRA fine-tune of the black-forest-labs/FLUX.1-dev model based on the diffusers example on GitHub, without spending too much time or resources. In this process we used the chosen samples as expected completions for the FLUX-dev model and excluded the rejected ones. Interestingly, the fine-tuned model performs much better in the art and cinematic scenarios where the base model was initially lacking. You can test the fine-tuned adapter here.
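A minimal sketch of loading such a LoRA adapter on top of the base model with diffusers; the adapter repository id is an assumption, so point it at the actual fine-tune:

```python
# Minimal sketch: apply a LoRA adapter to FLUX.1-dev with diffusers.
# The adapter repository id is an assumption.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("data-is-better-together/open-image-preferences-v1-flux-dev-lora")

image = pipe(
    "a boat on the canals of Venice, painted in gouache",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("venice_lora.png")
```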

The examples below compare the original FLUX-dev outputs with the fine-tuned ones (image placeholders stand in for the generated outputs):

- Prompt: a boat on the canals of Venice, painted in gouache with soft, flowing brushstrokes and vibrant, translucent colors, rich in texture and dynamic perspective, capturing the serene reflections on the water’s surface under a misty atmosphere → (original Venice image | fine-tuned Venice image)
- Prompt: on a black background, a bright orange poppy flower surrounded by an ornate golden frame, rendered in an anime style with bold outlines, exaggerated details, and dramatic chiaroscuro lighting → (original flower image | fine-tuned flower image)
- Prompt: a grainy shot of a robot cooking in the kitchen, with soft shadows and a nostalgic film texture → (original robot image | fine-tuned robot image)

Community

In short, we annotated 10,000 preference pairs with a 2/3 annotator overlap, receiving over 30,000 responses from more than 250 community members within two weeks. The leaderboard image shows that some community members annotated more than 5,000 pairs each. We want to thank everyone who participated in this sprint. The top three users will each receive a one-month Hugging Face Pro membership; be sure to follow them on the Hub: aashish1904, prithivMLmods, Malalatiana.

(Image: community annotation leaderboard)

What’s next?

After another successful community sprint, we will continue organizing community sprints on the Hugging Face Hub. Make sure to follow the Data Is Better Together organization to stay informed. We also encourage community members to take initiative themselves, and we are happy to provide guidance and reshare contributions on social media and within the organization on the Hub. You can contribute in several ways:

- Join another sprint, propose your own sprint, or request a high-quality dataset.
- Fine-tune a model on the preference dataset. One idea is a full SFT fine-tune of SDXL or FLUX-schnell; another is DPO/ORPO fine-tuning.
- Evaluate the performance improvement of the LoRA adapter compared to the original SD3.5-XL and FLUX-dev models.

