
Falcon 2: An 11B-parameter pretrained language model and VLM, trained on over 5,000B tokens and 11 languages

By versatileai | May 4, 2025

Falcon 2 Models

TII has launched Falcon 2, a new generation of models focused on providing the open-source community with a series of smaller models with enhanced performance and multimodal support. Our goal is to enable cheaper inference, improve usability, and encourage the development of more downstream applications.

The first generation of Falcon models, featuring Falcon-40B and Falcon-180B, made a significant contribution to the open-source community and promoted the release of advanced LLMs with permissive licenses. For more details on the previous generation of Falcon models, see the RefinedWeb (Penedo et al., 2023) and The Falcon Series of Open Language Models (Almazrouei et al., 2023) papers, as well as the Falcon and Falcon-180B blog posts.

The second generation of models focuses on increased usability and integration, building a multimodal ecosystem. This release includes not only the base 11B LLM but also an 11B VLM with image-understanding capabilities. The vision-language model (VLM) allows users to engage in chats about visual content using text.

As in previous work, the models offer support mainly in English but also have excellent capabilities in ten other languages, including Spanish, French, and German.


Falcon2-11B LLM

Training data

Falcon2-11B was trained on over 5,000 GT (gigatokens, i.e. billions of tokens) of RefinedWeb, a high-quality, filtered, and deduplicated web dataset, enhanced with curated corpora. It followed a four-stage training strategy: the first three stages focused on increasing the context length from 2048 to 4096 and finally to 8192 tokens, while the last stage aimed at further enhancing performance using only high-quality data.

Overall, the data sources included RefinedWeb-English, RefinedWeb-Europe (CS, DE, ES, FR, IT, NL, PL, PT, RO, SV), high quality technical data, code data, and conversational data extracted from public sources.

The training stages were as follows:

Stage | Context length | GT
Stage 1 | 2048 | 4500
Stage 2 | 4096 | 250
Stage 3 | 8192 | 250
Stage 4 | 8192 | 500
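
As a quick illustration (hypothetical code, not TII's training configuration), the stage budgets in the table above add up to the 5500 GT checkpoint reported in the evaluation tables below:

stages = [
    {"stage": 1, "context_length": 2048, "gt": 4500},
    {"stage": 2, "context_length": 4096, "gt": 250},
    {"stage": 3, "context_length": 8192, "gt": 250},
    {"stage": 4, "context_length": 8192, "gt": 500},  # high-quality data only
]
print(sum(s["gt"] for s in stages))  # 5500 GT, i.e. 5.5 trillion tokens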

The data was tokenized with the Falcon2-11B tokenizer, the same tokenizer as for the previous Falcon models.
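
As a minimal sketch (not part of the original release notes), the tokenizer can be loaded from the same Hub repository used in the generation example below and applied to a sample string:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-11B")
encoded = tokenizer("The falcon soars over the desert.")
print(encoded["input_ids"])                                   # token ids
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))  # corresponding subword tokens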

Model Architecture

The following table summarizes some of the important details about the model architecture.

Design choice | Value
Number of transformer blocks | 60
Number of query heads | 32
Number of key/value heads | 8
Head dimensions |
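
The distinct query and key/value head counts indicate grouped-query attention, where several query heads share one key/value head. Below is a minimal sketch of that sharing pattern; the head dimension of 128 is an assumption for illustration, since the table does not give the value:

import torch

n_q_heads, n_kv_heads, head_dim = 32, 8, 128  # head_dim assumed for illustration
group_size = n_q_heads // n_kv_heads          # 4 query heads share each KV head

q = torch.randn(1, n_q_heads, 16, head_dim)   # (batch, heads, seq_len, head_dim)
k = torch.randn(1, n_kv_heads, 16, head_dim)
v = torch.randn(1, n_kv_heads, 16, head_dim)

# Broadcast each KV head to its group of query heads, then run standard attention.
k = k.repeat_interleave(group_size, dim=1)
v = v.repeat_interleave(group_size, dim=1)
out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 32, 16, 128])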

Training Procedure

Falcon2-11B was trained on 1024 A100 40GB GPUs for the majority of the training, using a 3D parallelism strategy (TP=8, PP=1, DP=128).
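
As a sanity check on the parallelism layout (illustrative only), the product of the tensor-, pipeline-, and data-parallel degrees matches the GPU count:

tp, pp, dp = 8, 1, 128
print(tp * pp * dp)  # 1024 GPUs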

Training Hyperparameters

Hyperparameter | Value
Precision | bfloat16
Optimizer | AdamW
Max learning rate | 3.7e-4
Min learning rate | 1.89e-5
LR schedule | Cosine decay (stage 1)
Context length | 8192 (stages 3 and 4)
Weight decay | 1e-1
Z-loss | 1e-4
Batch size | Variable
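
The cosine decay schedule can be sketched as follows (assumed functional form for illustration; the exact scheduler, including any warmup, is not specified above):

import math

max_lr, min_lr = 3.7e-4, 1.89e-5
stage1_gt = 4500  # gigatokens in stage 1

def cosine_lr(tokens_seen_gt: float) -> float:
    """Learning rate decayed from max_lr to min_lr over the stage-1 token budget."""
    progress = min(tokens_seen_gt / stage1_gt, 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(cosine_lr(0))          # 3.7e-4 at the start
print(cosine_lr(stage1_gt))  # 1.89e-5 at the end of stage 1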

Falcon2-11B Evaluation

English performance

Open LLM Leaderboard Task Performance:

Checkpoint | GT | HellaSwag-10 | Winogrande-5 | ArcChallenge-25 | TruthfulQA-0 | MMLU-5 | GSM8K-5 | Average
Falcon2-11B | 5500 | 82.91 | 78.30 | 59.73 | 52.56 | 58.37 | 53.83 | 64.28
Falcon-40B | 1000 | 85.28 | 81.29 | 61.86 | | | 21.46 | 58.07
Falcon-7B | 1500 | 78.13 | 72.38 | 47.87 | 34.26 | 27.79 | 4.62 | 44.17
Gemma-7B | 6000 | 82.47 | 78.45 | 61.09 | 44.91 | 66.03 | 52.77 | 64.29
Llama3-8B | 15000 | | 77.35 | | | 66.69 | 44.79 | 62.38
Mistral-7B | N/A | 83.31 | 78.37 | 59.98 | 42.15 | 64.16 | 37.83 | 60.97

The Hugging Face Leaderboard team provided an official evaluation of the model on the Open LLM Leaderboard tasks. The model performs better than models such as Llama3-8B (trained on three times as much data) and Mistral-7B, and is on par with Gemma-7B.
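
As a quick check (illustrative only), the reported Falcon2-11B average is simply the mean of its six leaderboard scores in the table above:

scores = [82.91, 78.30, 59.73, 52.56, 58.37, 53.83]
print(round(sum(scores) / len(scores), 2))  # 64.28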

Zero Shot Performance:

Checkpoint | GT | HellaSwag | ArcEasy | Winogrande | ArcChallenge
Falcon2-11B | 5500 | 82.07 | 77.78 | 78.30 | 50.17
Falcon-40B | 1000 | 82.82 | 81.86 | 76.4 | 54.69
Falcon-7B | 1500 | 76.31 | 74.74 | 67.17 | 43.43

The evaluation results show that Falcon2-11B performs on par with Falcon-40B, a model four times its size.

Multilingual capabilities

We compare the Falcon2-11B model with Llama-7B and Bloom-7B on the Multilingual LLM Leaderboard. For reference, we also include Falcon-40B (which supports the same languages), Falcon-7B (which supports French), and Mistral-7B.

Model | Language | ArcChallenge-25 | HellaSwag | MMLU-25 | TQA | Average
Falcon2-11B | de | 43.7 | 67.96 | 38.3 | 47.53 | 49.37
Falcon2-11B | es | 46.2 | 73.63 | 37.9 | 46.43 | 51.06
Falcon2-11B | fr | 45.8 | 72.41 | 39.53 | 47.30 | 50.42
Falcon2-11B | nl | 41.7 | 69.05 | 38.29 | 48.81 | 49.47
Falcon2-11B | ro | 42.4 | 66.24 | 38.01 | 45.53 | 48.04
Falcon-40B | de | 45.1 | 68.3 | 36.2 | 39.8 | 47.4
Falcon-40B | es | 48.5 | 73.9 | 37.2 | 39.0 | 49.6
Falcon-40B | fr | 47.6 | 72.9 | 37.3 | |
Falcon-40B | it | 46.3 | 70.2 | 36.4 | 40.7 | 48.4
Falcon-40B | nl | 42.9 | 68.4 | 36.5 | 40.9 | 47.1
Falcon-40B | ro | 43.2 | 66.0 | 35.7 | 39.8 | 46.2
Falcon-7B | fr | 37.3 | 64.1 | 28.4 | 34.0 | 40.9
Mistral-7B | de | 41.2 | 58.7 | 40.5 | 44.9 | 46.3
Mistral-7B | fr | 44.9 | 64.4 | 41.9 | 43.0 | 48.6
Mistral-7B | it | 43.2 | 60.9 | 39.7 | 43.1 | 46.7
Mistral-7B | nl | 40.0 | 57.9 | 41.4 | 43.3 | 45.7
Mistral-7B | ro | 40.7 | 53.6 | 39.3 | 43.6 | 44.3
Llama-7B | de | 35.1 | 49.9 | 29.9 | 38.3 | 38.3
Llama-7B | es | 37.0 | | | | 40.1
Llama-7B | fr | 37.3 | 55.7 | 30.5 | 39.9 | 40.9
Llama-7B | it | 35.8 | 52.0 | 29.9 | 39.6 | 39.3
Llama-7B | nl | 33.6 | 48.7 | 29.8 | 40.0 | 38.0
Llama-7B | ro | 32.4 | 44.9 | 29.7 | 37.0 | 36.0
Bloom-7B | de | 26.3 | 32.4 | 28.1 | 43.7 | 32.6
Bloom-7B | fr | 36.7 | 56.6 | 29.9 | 40.9 | 41.0
Bloom-7B | it | 29.0 | 40.8 | 27.6 | 43.7 | 35.3
Bloom-7B | nl | 23.1 | 31.7 | 27.5 | 42.7 | 31.3
Bloom-7B | ro | 26.9 | 31.8 | 27.4 | 46.1 | 33.1

In the spirit of the original Falcon models, Falcon2-11B was trained not only on English data but also on data in ten other languages. The multilingual evaluation results show that the model presents good capabilities in the six languages featured on the Multilingual LLM Leaderboard (de, es, fr, it, nl, ro) and in fact performs better than Falcon-40B and several other multilingual models in all of the cited languages.

We will soon release more extensive evaluation results for the multilingual capabilities of Falcon2-11B on the model card!

Code generation capabilities

We checked the model's performance on code generation against the HumanEval benchmark on the BigCode Leaderboard for the Python language, achieving a pass@1 of 29.59%.
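
For context, pass@1 is the fraction of HumanEval problems solved when a single sample is drawn per problem. The standard unbiased estimator (Chen et al., 2021) used for such scores can be sketched as follows (illustrative; not the leaderboard's exact harness code):

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k given n samples per problem, of which c pass the unit tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=10, c=3, k=1))  # 0.3: per-problem pass@1 with 3 of 10 samples correct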

Using Falcon2-11B

from transformers import AutoTokenizer
import transformers
import torch

model = "tiiuae/falcon-11B"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

Then, run text generation using code like this:

sequences = pipeline(
    "Can you explain the concept of quantum computing?",
    max_length=200, do_sample=True, top_k=10,
    num_return_sequences=1, eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

Falcon2-11B VLM

Falcon2-11B VLM is a vision-language model (VLM) built on top of the LLM that can additionally process image inputs and answer queries about images. To achieve this, the pretrained CLIP ViT-L/14 vision encoder is integrated with the Falcon2-11B chat-finetuned model and trained on image-text data.

To enhance the VLM's perception of fine-grained details in images, we employ a dynamic encoding mechanism at high resolution for image inputs, similar to LLaVA-Next.
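
The general idea behind this dynamic high-resolution encoding can be sketched as follows (illustrative only, not the Falcon2-11B VLM's actual preprocessing; the 336-pixel tile size is an assumption matching common CLIP ViT-L/14 variants): the input image is split into encoder-sized tiles, and a downscaled global view is encoded alongside them.

from PIL import Image

TILE = 336  # assumed vision-encoder input resolution

def make_views(image: Image.Image):
    views = [image.resize((TILE, TILE))]   # global, downscaled view
    cols = -(-image.width // TILE)         # ceiling division
    rows = -(-image.height // TILE)
    padded = Image.new("RGB", (cols * TILE, rows * TILE))
    padded.paste(image, (0, 0))
    for r in range(rows):
        for c in range(cols):
            views.append(padded.crop((c * TILE, r * TILE, (c + 1) * TILE, (r + 1) * TILE)))
    return views  # each view is encoded separately by the frozen vision encoder

print(len(make_views(Image.new("RGB", (800, 600)))))  # 1 global view + 3x2 tiles = 7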

Training

Training takes place in two stages: pretraining and finetuning. In both stages, the vision encoder weights are kept frozen. During the pretraining stage, the LLM is kept frozen and only the multimodal projector is trained on 558K image-caption pairs, which enables the projector to learn a mapping from the visual to the text embedding space. During finetuning, both the projector and the LLM weights are trained on a corpus of 1.2M image-text instruction examples from public datasets, which also includes multi-round conversations.
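
A minimal sketch of that freezing scheme (hypothetical attribute names, not the actual training code):

def set_trainable(module, trainable: bool):
    for p in module.parameters():
        p.requires_grad = trainable

def configure_stage(model, stage: str):
    set_trainable(model.vision_encoder, False)                # frozen in both stages
    set_trainable(model.multimodal_projector, True)           # trained in both stages
    set_trainable(model.language_model, stage == "finetune")  # unfrozen only for finetuning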

Falcon2-11B VLM Evaluation

Model | MME | GQA | SQA | POPE | VQAv2 | TextVQA | MM-Bench | SEED-IMG | Average
Falcon2-11B VLM | 1589/343 | 64.5 | 74.9 | 88.4 | 82.1 | 66.7 | 72.0 | |
LLaVA-1.6 (Vicuna-13B) | 1575/326 | 65.4 | 73.6 | 86.2 | 82.8 | 67.1 | 70.0 | 71.9 | 73.8
LLaVA-1.6 (Mistral-7B) | 1498/321 | 64.8 | 72.8 | 86.7 | | | | | 73.3

Using Falcon2-11B VLM

from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor
from PIL import Image
import requests
import torch

processor = LlavaNextProcessor.from_pretrained("tiiuae/falcon-11B-vlm")
model = LlavaNextForConditionalGeneration.from_pretrained("tiiuae/falcon-11B-vlm", torch_dtype=torch.bfloat16)
url = "https://merzougabirding.com/wp-content/uploads/2023/09/falcon-size.jpg"
falcon_image = Image.open(requests.get(url, stream=True).raw)
prompt = "User:<image>\nWhat's special about this bird's vision?"

inputs = processor(prompt, images=falcon_image, return_tensors="pt", padding=True).to('cuda:0')
model.to('cuda:0')
output = model.generate(**inputs, max_new_tokens=256)
prompt_length = inputs['input_ids'].shape[1]  # can be used to strip the prompt from the output
generated_captions = processor.decode(output[0], skip_special_tokens=True).strip()

print(generated_captions)

License Information

The Falcon 2 models are made available under the TII Falcon 2 License, a permissive software license based on Apache 2.0, which includes an acceptable use policy that promotes the responsible use of AI. This license was crafted in the spirit of TII's commitment to the open-source community.
