Falcon 2 model
TII has launched a new generation of models, Falcon 2, focused on providing the open source community with a series of smaller models with enhanced performance and multimodal support. Our goal is to enable cheaper inference, improve usability, and encourage the development of more downstream applications.
The first generation of Falcon models, featuring Falcon-40B and Falcon-180B, made a great contribution to the open source community and promoted the release of advanced LLMs with permissive licenses. For more details on the previous generation of Falcon models, see the RefinedWeb (Penedo et al., 2023) and The Falcon Series of Open Language Models (Almazrouei et al., 2023) papers, as well as the Falcon and Falcon-180B blog posts.
The second generation of models focuses on improved usability and integration, building a multimodal ecosystem. This release includes not only the base 11B LLM, but also an 11B VLM with image-understanding capabilities. The vision-language model (VLM) lets users chat about visual content using text.
As with previous work, the models offer support primarily in English, but also handle ten other languages well, including Spanish, French, and German.
Falcon2-11B LLM
Training data
Falcon2-11B was trained on over 5,000 GT (gigatokens, i.e., billions of tokens) of RefinedWeb, a high-quality filtered and deduplicated web dataset, enhanced with curated corpora. It followed a four-stage training strategy: the first three stages focused on increasing the context length from 2048 to 4096 and finally to 8192 tokens, while the last stage aimed to further enhance performance using only high-quality data.
Overall, the data sources included RefinedWeb-English, RefinedWeb-Europe (CS, DE, ES, FR, IT, NL, PL, PT, RO, SV), high quality technical data, code data, and conversational data extracted from public sources.
The training stages were as follows:
| Stage | Context length | GT |
|---|---|---|
| Stage 1 | 2048 | 4500 |
| Stage 2 | 4096 | 250 |
| Stage 3 | 8192 | 250 |
| Stage 4 | 8192 | 500 |
The data was tokenized with the Falcon2-11B tokenizer, the same tokenizer used for the previous Falcon models.
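As a quick illustration, the tokenizer can be loaded directly from the Hugging Face Hub; the repository id `tiiuae/falcon-11B` used below matches the usage snippet later in this post, but treat it as an assumption:

```python
from transformers import AutoTokenizer

# Load the Falcon2-11B tokenizer from the Hub (repository id as in the usage example below).
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-11B")

sample = "Falcon2-11B was trained on over 5,000 GT of web and curated data."
ids = tokenizer(sample)["input_ids"]

print(f"{len(ids)} tokens:", ids[:10], "...")
print(tokenizer.decode(ids))  # decoding round-trips back to the original text
```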
Model Architecture
The following table summarizes some of the important details about the model architecture.
| Design choice | Value |
|---|---|
| Number of Transformer Blocks | 60 |
| Number of Query Heads | 32 |
| Number of Key/Value Heads | 8 |
| Head Dimensions | |
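Having fewer key/value heads than query heads indicates grouped-query attention, which shrinks the KV cache at inference time. As a small sketch, these values can be read back from the published config; the attribute names follow transformers' Falcon implementation and the repository id is the same assumption as above:

```python
from transformers import AutoConfig

# Fetch only the configuration (no weights) from the Hub.
config = AutoConfig.from_pretrained("tiiuae/falcon-11B")

# Attribute names follow transformers' FalconConfig; fall back gracefully if they differ.
for name in ("num_hidden_layers", "num_attention_heads", "num_kv_heads", "hidden_size"):
    print(name, "=", getattr(config, name, "not present in this config"))
```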
Training Procedure
Falcon2-11B was trained on 1024 A100 40GB GPUs for the majority of the training, using a 3D parallelism strategy (TP=8, PP=1, DP=128).
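For intuition, the three parallelism degrees multiply out to the GPU count: 8 tensor-parallel × 1 pipeline-parallel × 128 data-parallel ranks = 1024 GPUs. A tiny sanity-check sketch (the variable names are illustrative, not taken from the actual training code):

```python
# Illustrative only: relate the 3D-parallelism degrees to the GPU count quoted above.
TP, PP, DP = 8, 1, 128           # tensor-, pipeline- and data-parallel degrees
NUM_GPUS = 1024                  # A100 40GB GPUs used for most of the training

assert TP * PP * DP == NUM_GPUS  # 8 * 1 * 128 == 1024
print(f"model replicas: {DP}, GPUs per replica: {TP * PP}")
```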
Training Hyperparameters
| Hyperparameter | Value |
|---|---|
| Precision | bfloat16 |
| Optimizer | AdamW |
| Max LR | 3.7e-4 |
| Min LR | 1.89e-5 |
| LR schedule | Cosine decay (stage 1) |
| Context length | 8192 (stages 3 and 4) |
| Weight decay | 1e-1 |
| Z-loss | 1e-4 |
| Batch size | Variable |
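The schedule decays the learning rate from the maximum to the minimum value along a cosine curve. Below is a minimal sketch of such a schedule, assuming a linear warmup; the warmup length and step counts are illustrative, not the actual training configuration:

```python
import math

MAX_LR, MIN_LR = 3.7e-4, 1.89e-5   # values from the table above

def cosine_lr(step: int, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to MAX_LR, then cosine decay down to MIN_LR (illustrative)."""
    if step < warmup_steps:
        return MAX_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

# Example: learning rate at a few points of a hypothetical 10,000-step run.
for step in (0, 500, 5000, 10000):
    print(step, f"{cosine_lr(step, warmup_steps=500, total_steps=10000):.2e}")
```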
Falcon2-11B evaluation
English performance
Open LLM Leaderboard Task Performance:
| Checkpoint | GT | HellaSwag-10 | Winogrande-5 | ArcChallenge-25 | TruthfulQA-0 | MMLU-5 | GSM8K-5 | Average |
|---|---|---|---|---|---|---|---|---|
| Falcon2-11B | 5500 | 82.91 | 78.30 | 59.73 | 52.56 | 58.37 | 53.83 | 64.28 |
| Falcon-40B | 1000 | 85.28 | 81.29 | 61.86 | | | 21.46 | 58.07 |
| Falcon-7B | 1500 | 78.13 | 72.38 | 47.87 | 34.26 | 27.79 | 4.62 | 44.17 |
| Gemma-7B | 6000 | 82.47 | 78.45 | 61.09 | 44.91 | 66.03 | 52.77 | 64.29 |
| Llama3-8B | 15000 | | 77.35 | | 43.09 | 66.69 | 44.79 | 62.38 |
| Mistral-7B | N/A | 83.31 | 78.37 | 59.98 | 42.15 | 64.16 | 37.83 | 60.97 |
The Hugging Face Leaderboard team provided an official evaluation of the model on the Open LLM Leaderboard tasks. The model performs better than models such as Llama3-8B (which was trained on three times as much data) and Mistral-7B, and is on par with Gemma-7B.
Zero Shot Performance:
| Checkpoint | GT | HellaSwag | ArcEasy | Winogrande | ArcChallenge |
|---|---|---|---|---|---|
| Falcon2-11B | 5500 | 82.07 | 77.78 | 78.30 | 50.17 |
| Falcon-40B | 1000 | 82.82 | 81.86 | 76.4 | 54.69 |
| Falcon-7B | 1500 | 76.31 | 74.74 | 67.17 | 43.43 |
These evaluation results show that Falcon2-11B performs comparably to Falcon-40B, a model roughly four times its size.
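To make the zero-shot setting concrete: multiple-choice tasks such as HellaSwag are typically scored by comparing the log-likelihood the model assigns to each candidate continuation, with no in-context examples. Below is a minimal sketch of that scoring loop with transformers; the prompt and candidates are toy examples, and the repository id is again an assumption:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id assumed; bfloat16 keeps the 11B model's memory footprint manageable.
name = "tiiuae/falcon-11B"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16, device_map="auto")

context = "She put the kettle on the stove because"
candidates = [" she wanted to boil water.", " the moon orbits the Earth."]

def continuation_logprob(context: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to the continuation tokens."""
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + continuation, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1].float(), dim=-1)
    targets = full_ids[:, 1:]
    token_logprobs = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_logprobs[:, ctx_len - 1 :].sum().item()  # score only the continuation tokens

scores = [continuation_logprob(context, c) for c in candidates]
print("predicted choice:", candidates[scores.index(max(scores))])
```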
Multilingual features
We compared the Falcon2-11B model with Llama-7B and Bloom-7B on the Multilingual LLM Leaderboard. For reference, we also include Falcon-40B (which supports the same languages), Falcon-7B (which supports French), and Mistral-7B.
| Model | Language ID | ArcChallenge-25 | HellaSwag | MMLU-25 | TQA | Average |
|---|---|---|---|---|---|---|
| Falcon2-11B | de | 43.7 | 67.96 | 38.3 | 47.53 | 49.37 |
| | es | 46.2 | 73.63 | 37.9 | 46.43 | 51.06 |
| | fr | 45.8 | 72.41 | 39.53 | 47.30 | 50.42 |
| | nl | 41.7 | 69.05 | 38.29 | 48.81 | 49.47 |
| | ro | 42.4 | 66.24 | 38.01 | 45.53 | 48.04 |
| Falcon-40B | de | 45.1 | 68.3 | 36.2 | 39.8 | 47.4 |
| | es | 48.5 | 73.9 | 37.2 | 39.0 | 49.6 |
| | fr | 47.6 | 72.9 | 37.3 | | |
| | it | 46.3 | 70.2 | 36.4 | 40.7 | 48.4 |
| | nl | 42.9 | 68.4 | 36.5 | 40.9 | 47.1 |
| | ro | 43.2 | 66.0 | 35.7 | 39.8 | 46.2 |
| Falcon-7B | fr | 37.3 | 64.1 | 28.4 | 34.0 | 40.9 |
| Mistral-7B | de | 41.2 | 58.7 | 40.5 | 44.9 | |
| | es | 44.3 | | | 43.1 | 48.7 |
| | fr | 44.9 | 64.4 | 41.9 | 43.0 | 48.6 |
| | it | 43.2 | 60.9 | 39.7 | 43.1 | 46.7 |
| | nl | 40.0 | 57.9 | 41.4 | 43.3 | 45.7 |
| | ro | 40.7 | 53.6 | 39.3 | 43.6 | 44.3 |
| Llama-7B | de | 35.1 | 49.9 | 29.9 | 38.3 | 38.3 |
| | es | 37.0 | | | | 40.1 |
| | fr | 37.3 | 55.7 | 30.5 | 39.9 | 40.9 |
| | it | 35.8 | 52.0 | 29.9 | 39.6 | 39.3 |
| | nl | 33.6 | 48.7 | 29.8 | 40.0 | 38.0 |
| | ro | 32.4 | 44.9 | 29.7 | 37.0 | 36.0 |
| Bloom-7B | de | 26.3 | 32.4 | 28.1 | 43.7 | 32.6 |
| | fr | 36.7 | 56.6 | 29.9 | 40.9 | 41.0 |
| | it | 29.0 | 40.8 | 27.6 | 43.7 | 35.3 |
| | nl | 23.1 | 31.7 | 27.5 | 42.7 | 31.3 |
| | ro | 26.9 | 31.8 | 27.4 | 46.1 | 33.1 |
In the spirit of the original Falcon models, Falcon2-11B was trained not only on English data but also on data in ten other languages. The multilingual evaluation results show that the model performs well in the six languages featured on the Multilingual LLM Leaderboard (de, es, fr, it, nl, ro) and even outperforms Falcon-40B and several other multilingual models in all of the cited languages.
We will soon add more extensive evaluation results for these multilingual capabilities to the Falcon2-11B model card!
Code generation capabilities
We checked the model's performance on code generation against the BigCode Leaderboard's HumanEval benchmark for the Python language, obtaining a pass@1 of 29.59%.
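As a reminder of the metric (this is the standard HumanEval pass@k estimator, not something specific to Falcon): for each problem, n samples are generated and c of them pass the unit tests, giving

$$
\text{pass@}k \;=\; \mathbb{E}_{\text{problems}}\!\left[\,1 - \frac{\binom{n-c}{k}}{\binom{n}{k}}\,\right],
$$

which for k = 1 reduces to the average fraction of sampled completions that pass. A pass@1 of 29.59% therefore means that, on average, a single sampled solution solves roughly 30% of the HumanEval problems.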
Using Falcon2-11B
```python
from transformers import AutoTokenizer
import transformers
import torch

model = "tiiuae/falcon-11B"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```
Then run the text generation using code like this:
```python
sequences = pipeline(
    "Can you explain the concept of quantum computing?",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
Falcon2-11B VLM
Falcon2-11B VLM is a vision-language model (VLM) built on top of the LLM that can additionally process image inputs and answer queries about images. To achieve this, we integrate the pretrained CLIP ViT-L/14 vision encoder with the chat-finetuned Falcon2-11B model and train with image-text data.
To enhance the VLM's perception of fine-grained details in images, we employ a dynamic encoding mechanism at high resolution for image inputs, similar to LLaVA-NeXT.
Training
Training takes place in two stages: pretraining and finetuning. In both stages, the vision encoder weights are kept frozen. In the pretraining stage, the LLM is also kept frozen and only the multimodal projector is trained, on 558K image-caption pairs. This lets the multimodal projector learn a mapping from the visual embedding space to the text embedding space. During finetuning, both the projector and the LLM weights are trained on a corpus of 1.2M image-text instruction examples from public datasets, which also includes multi-round conversations.
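A minimal sketch of this freezing scheme, assuming a LLaVA-style model whose submodules are exposed as vision_tower, multi_modal_projector, and language_model (the names follow transformers' LlavaNext classes and are meant as illustration, not as the actual training code):

```python
import torch
from transformers import LlavaNextForConditionalGeneration

# Repository id assumed; see the usage example below for the same checkpoint.
model = LlavaNextForConditionalGeneration.from_pretrained(
    "tiiuae/falcon-11B-vlm", torch_dtype=torch.bfloat16
)

def set_trainable(module, trainable: bool) -> None:
    for param in module.parameters():
        param.requires_grad_(trainable)

# Stage 1 (pretraining): vision encoder and LLM frozen; only the projector learns
# the mapping from the visual embedding space to the text embedding space.
set_trainable(model.vision_tower, False)
set_trainable(model.language_model, False)
set_trainable(model.multi_modal_projector, True)

# Stage 2 (finetuning): the vision encoder stays frozen, while the projector and the
# LLM are trained on the image-text instruction data.
set_trainable(model.language_model, True)
```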
Falcon2-11B VLM evaluation
| Model | MME | GQA | SQA | POPE | VQAv2 | TextVQA | MM-Bench | SEED-IMG | Average |
|---|---|---|---|---|---|---|---|---|---|
| Falcon2-11B VLM | 1589/343 | 64.5 | 74.9 | 88.4 | 82.1 | 66.7 | 72.0 | | |
| LLaVA-1.6 (Vicuna-13B) | 1575/326 | 65.4 | 73.6 | 86.2 | 82.8 | 67.1 | 70.0 | 71.9 | 73.8 |
| LLaVA-1.6 (Mistral-7B) | 1498/321 | 64.8 | 72.8 | 86.7 | | | | | 73.3 |
Using Falcon2-11B-FalconVLM
```python
from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor
from PIL import Image
import requests
import torch

processor = LlavaNextProcessor.from_pretrained("tiiuae/falcon-11B-vlm")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "tiiuae/falcon-11B-vlm", torch_dtype=torch.bfloat16
)

url = "https://merzougabirding.com/wp-content/uploads/2023/09/falcon-size.jpg"
falcon_image = Image.open(requests.get(url, stream=True).raw)
prompt = "User:<image>\nWhat's special about this bird's vision?"

inputs = processor(prompt, images=falcon_image, return_tensors="pt", padding=True).to("cuda:0")
model.to("cuda:0")
output = model.generate(**inputs, max_new_tokens=256)

prompt_length = inputs["input_ids"].shape[1]
generated_captions = processor.decode(output[0], skip_special_tokens=True).strip()

print(generated_captions)
```
License Information
The Falcon 2 models are made available under the TII Falcon 2 License, a permissive software license based on Apache 2.0, which includes an acceptable use policy that promotes the responsible use of AI. This license was crafted in the spirit of TII's commitment to the open source community.