Public replication of state-of-the-art visual language models

By versatileai, November 5, 2025

We are pleased to release IDEFICS (Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS), an open-access visual language model. IDEFICS is based on Flamingo, a state-of-the-art visual language model originally developed by DeepMind that has not been released publicly. Like GPT-4, the model accepts arbitrary sequences of images and text as input and produces text as output. IDEFICS is built solely on publicly available data and models (LLaMA v1 and OpenCLIP) and comes in two versions: a base version and an instructed version. Each variant is available at 9 billion and 80 billion parameters.

The development of cutting-edge AI models calls for greater transparency. Our goal with IDEFICS is to reproduce, and make available to the AI community, a system that matches the capabilities of large proprietary models like Flamingo. To that end, we took important steps toward bringing transparency to these AI systems: we used only publicly available data, we provide tools to explore the training datasets, we share the technical lessons and mistakes from building such artifacts, and we assessed the model's harmfulness with adversarial prompts before release. We hope IDEFICS, alongside models like OpenFlamingo (another open reproduction of Flamingo at the 9-billion-parameter scale), will serve as a solid foundation for more open research on multimodal AI systems.

Try out the demos and models on the Hub.

What is IDEFICS?

IDEFICS is an 80-billion-parameter multimodal model that accepts sequences of images and text as input and generates coherent text as output. It can answer questions about images, describe visual content, and create stories grounded in multiple images.

IDEFICS is an open-access reproduction of Flamingo that performs on par with the original closed-source model across a variety of image-text understanding benchmarks. It comes in two variants: 80 billion parameters and 9 billion parameters.

[Figure: Plot comparing the performance of Flamingo, OpenFlamingo, and IDEFICS]

We also provide the versions idefics-80b-instruct and idefics-9b-instruct, fine-tuned for conversational use cases.

Training data

IDEFICS was trained on a mixture of public datasets, including Wikipedia, Public Multimodal Dataset, and LAION, as well as a new 115B-token dataset called OBELICS that we created. OBELICS is a collection of 141 million interleaved image-text documents scraped from the web, containing 353 million images.

We provide an interactive visualization of OBELICS, built with Nomic AI, that lets you explore the contents of the dataset.

[Figure: Interactive visualization of OBELICS]
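
If you want to poke at OBELICS programmatically, a minimal sketch using the Hugging Face datasets library could look like the following. This assumes the dataset is published on the Hub as HuggingFaceM4/OBELICS (the organization behind the IDEFICS release) and uses streaming so the full 115B-token corpus is not downloaded up front:

from datasets import load_dataset

# Stream OBELICS rather than downloading it; the repo id below is an assumption
# based on the HuggingFaceM4 organization that released IDEFICS.
obelics = load_dataset("HuggingFaceM4/OBELICS", split="train", streaming=True)

# Peek at the first interleaved document; each one mixes text passages with
# images (referenced by URL) in their original web order.
first_doc = next(iter(obelics))
print(first_doc.keys())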

Information about IDEFICS' architecture, training methodology, evaluations, and datasets is available in the model card and the research paper. We also documented the technical insights and learnings gained from training the model, offering a valuable perspective on its development.

Ethical evaluation

At the beginning of this project, through a series of discussions, we developed an ethics charter to help guide decisions made during the project. This charter sets out values such as being self-critical, transparent, and fair, which we strove to uphold throughout the project and the model release.

As part of the release process, we internally evaluated the model for potential bias by adversarially prompting it with images and text that could elicit undesirable responses (a process known as red teaming).

Try IDEFICS with the demo, check out the corresponding model and dataset cards, and give us your feedback through the community tab. We are committed to improving these models and to making large multimodal AI models accessible to the machine learning community.

License

The model is built on top of two pre-trained models: laion/CLIP-ViT-H-14-laion2B-s32B-b79K and huggyllama/llama-65b. The first was released under an MIT license, while the second was released under a specific non-commercial license focused on research purposes. As such, users are required to comply with that license by applying directly through Meta's form.

The two pre-trained models are connected to each other with newly initialized parameters that we train. These are not based on either of the two frozen base models that form the composite model. We release the additional weights we trained under an MIT license.

Get started with IDEFICS

IDEFICS models are available on the Hugging Face Hub and supported in the latest version of Transformers. Here is a code sample to try them out:

import torch
from transformers import IdeficsForVisionText2Text, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

checkpoint = "HuggingFaceM4/idefics-9b-instruct"
model = IdeficsForVisionText2Text.from_pretrained(checkpoint, torch_dtype=torch.bfloat16).to(device)
processor = AutoProcessor.from_pretrained(checkpoint)

# We feed the model an arbitrary sequence of text strings and images.
# Images can be either URLs or PIL Images.
prompts = [
    [
        "User: What is in this image?",
        "https://upload.wikimedia.org/wikipedia/commons/8/86/Id%C3%A9fix.JPG",
        "<end_of_utterance>",

        "\nAssistant: This photo depicts Asterix and Obelix's dog, Idefix. Idefix is running on the ground.<end_of_utterance>",

        "\nUser:",
        "https://static.wikia.nocookie.net/asterix/images/2/25/R22b.gif/revision/latest?cb=20110815073052",
        "So who is that?<end_of_utterance>",

        "\nAssistant:",
    ],
]

# --batched mode
inputs = processor(prompts, add_end_of_utterance_token=False, return_tensors="pt").to(device)
# --single sample mode
# inputs = processor(prompts[0], return_tensors="pt").to(device)

# Generation args: stop at <end_of_utterance> and suppress the image placeholder tokens
exit_condition = processor.tokenizer("<end_of_utterance>", add_special_tokens=False).input_ids
bad_words_ids = processor.tokenizer(["<image>", "<fake_token_around_image>"], add_special_tokens=False).input_ids

generated_ids = model.generate(**inputs, eos_token_id=exit_condition, bad_words_ids=bad_words_ids, max_length=100)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)
for i, t in enumerate(generated_text):
    print(f"{i}:\n{t}\n")
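
A note on the generation arguments in this sample: the instruct checkpoints mark the end of each conversational turn with an <end_of_utterance> token, so passing its token id as eos_token_id makes generation stop cleanly after the assistant's reply, while bad_words_ids keeps the model from emitting its internal image placeholder tokens in the decoded output.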
