In the two years since the advent of diffusion-based image generators, AI image models have achieved near-photographic quality. How do these models compare? Are open-source alternatives comparable to their proprietary counterparts?
The Artificial Analysis Text to Image Leaderboard aims to answer these questions with human preference-based rankings. Its ELO scores are informed by over 45,000 human image preferences collected in the Artificial Analysis Image Arena. The leaderboard features the leading proprietary and open-source image models, including the latest versions of Midjourney, OpenAI's DALL·E, Stable Diffusion, Playground, and more.
Check out the leaderboard here: https://huggingface.co/spaces/artificialanalysis/text-to-image-leaderboard
You can also participate in the Text to Image Arena to get your own personalized model ranking after 30 votes.
Methodology
Comparing the quality of image models has traditionally been even harder than benchmarking other AI modalities, such as language models. This is primarily due to the variability inherent in people's preferences about how images should look. As image models have reached very high fidelity, early objective metrics have given way to costly human preference studies. Our Image Arena represents a crowdsourced approach to collecting human preference data at scale, enabling comparison of the leading models for the first time.
As with Chatbot Arena, we calculate an ELO score for each model via regression over all preferences. Participants are presented with a prompt and two images, and select the image that best reflects the prompt. We generated over 700 images for each model to ensure that the evaluation reflects a wide range of use cases. The prompts span a variety of styles and categories, including human portraits, groups of people, animals, nature, art, and more.
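To make the scoring concrete, here is a minimal sketch of how ELO-style scores can be fit from pairwise preferences via a Bradley-Terry regression, similar in spirit to Chatbot Arena's approach. The vote data, the regularization strength, and the 1000/400 scale constants below are illustrative assumptions, not the arena's actual data or implementation.

```python
import numpy as np

# Illustrative pairwise preferences as (winner, loser) pairs.
# These are made-up votes, not real arena data.
votes = [
    ("midjourney", "dalle-2"),
    ("stable-diffusion-3", "dalle-2"),
    ("playground-v2.5", "dalle-3"),
    ("stable-diffusion-3", "dalle-3"),
    ("midjourney", "playground-v2.5"),
    ("dalle-3", "dalle-2"),
]

models = sorted({m for pair in votes for m in pair})
index = {m: i for i, m in enumerate(models)}

# Fit Bradley-Terry strengths by gradient ascent on the (lightly
# regularized) log-likelihood of the observed preferences.
scores = np.zeros(len(models))
lr, reg = 0.1, 0.01
for _ in range(2000):
    grad = -reg * scores  # regularization keeps the fit finite
    for winner, loser in votes:
        w, l = index[winner], index[loser]
        p_win = 1.0 / (1.0 + np.exp(scores[l] - scores[w]))
        grad[w] += 1.0 - p_win
        grad[l] -= 1.0 - p_win
    scores += lr * grad
    scores -= scores.mean()  # strengths are only defined up to a constant

# Map the latent strengths onto a familiar ELO-like scale (assumed anchor
# of 1000 and the standard 400-point logistic base).
elo = 1000 + 400 * scores / np.log(10)
for m in sorted(models, key=lambda m: -elo[index[m]]):
    print(f"{m:>20}  {elo[index[m]]:7.1f}")
```

With more votes per model pair, the fitted scores stabilize and their ordering becomes a meaningful leaderboard; the actual regression details may differ.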
Early insights from the results 👀
**Proprietary models lead, but open source is increasingly competitive:** Proprietary models such as Midjourney, Stable Diffusion 3, and DALL·E 3 HD lead the leaderboard. However, a number of open-source models, currently led by Playground AI v2.5, are gaining ground and have even surpassed OpenAI's DALL·E 3.

**The space is moving fast:** The landscape of image generation models is evolving rapidly. Last year, DALL·E 2 was a clear leader in the space. Now, DALL·E 2 is chosen in fewer than 25% of arena matchups and is one of the lowest-ranked models.

**Stability AI's open release of Stable Diffusion 3 Medium could have a significant impact on the community:** Stable Diffusion 3 is a contender for the top spot on the current leaderboard, and Stability AI's CTO recently announced its open release during a presentation with AMD. This would give a huge boost to the open-source community. As we saw with Stable Diffusion 1.5 and SDXL, we may see many fine-tuned versions released by the community.
How to contribute and how to contact us
To view the leaderboard, check out the Hugging Face Space here: https://huggingface.co/spaces/artificialanalysis/text-to-image-leaderboard
To participate in the rankings and contribute your preferences, select the "Image Arena" tab and pick the image that best represents the prompt. After 30 images, select the "Personal Leaderboard" tab to see a personalized ranking of the image models based on your selections.
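As an aside, a personalized ranking of this kind can be derived from nothing more than a participant's own votes. The sketch below ranks models by per-user win rate; the ranking rule actually used by the Personal Leaderboard tab isn't documented here, so the win-rate rule and the sample votes are assumptions.

```python
from collections import Counter

# Hypothetical record of one participant's arena votes as (winner, loser)
# pairs; real votes come from the Image Arena tab, and ranking by win rate
# is an assumption, not the tab's documented method.
my_votes = [
    ("playground-v2.5", "dalle-2"),
    ("midjourney", "dalle-3"),
    ("playground-v2.5", "dalle-3"),
] * 10  # 30 votes in total, the threshold mentioned above

wins, appearances = Counter(), Counter()
for winner, loser in my_votes:
    wins[winner] += 1
    appearances[winner] += 1
    appearances[loser] += 1

# Rank models by the share of their matchups this participant chose them in.
for model in sorted(appearances, key=lambda m: -wins[m] / appearances[m]):
    print(f"{model:>20}  {wins[model] / appearances[model]:.0%}")
```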
Follow us on Twitter and LinkedIn for updates. To compare API endpoints for image models across speed and pricing, see our Text to Image page: https://artificialanalysis.ai/text-to-image
All feedback is welcome! We're reachable via message on Twitter and via the contact form on **our website**.
Other image model quality initiatives
The Artificial Analysis Text to Image Leaderboard is not the only image quality ranking and crowdsourced preference initiative. We built our leaderboard to cover both proprietary and open-source models, focusing on a comparison of the leading text-to-image models.
Check out the following other great initiatives: