Falcon3 is a family of decoder-only large language models under 10 billion parameters, developed by the Technology Innovation Institute (TII) in Abu Dhabi. This release reflects our continued commitment to advancing open, accessible, large-scale foundation models by pushing the boundaries of performance and training efficiency.
Falcon3 represents a natural evolution from previous releases, emphasizing enhanced science, math, and code capabilities.
This iteration comprises five base models:
- Falcon3-1B-Base
- Falcon3-3B-Base
- Falcon3-Mamba-7B-Base
- Falcon3-7B-Base
- Falcon3-10B-Base
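All of these checkpoints are designed to work with the standard transformers stack. As a minimal, illustrative sketch (assuming the checkpoints are hosted under the tiiuae organization on the Hugging Face Hub, and with sampling settings chosen purely for illustration):

```python
# Minimal text-generation sketch for a Falcon3 base model; the model id and
# generation settings here are illustrative assumptions, not a prescribed setup.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="tiiuae/Falcon3-7B-Base",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
out = generator("The three laws of thermodynamics state that", max_new_tokens=64)
print(out[0]["generated_text"])
```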
While developing these models, we incorporated several key innovations aimed at improving performance while reducing training costs:
- One pre-training: We conducted a single large-scale pre-training run on the 7B model, using 1024 H100 GPUs and leveraging 14 trillion tokens featuring web, code, STEM, and curated high-quality multilingual data.
- Depth up-scaling for improved reasoning: Building on recent studies of the effects of model depth, we upscaled the 7B model to a 10B-parameter model by duplicating redundant layers and continuing pre-training with 2 trillion tokens of high-quality data (a minimal sketch of the layer-duplication idea follows this list). This yielded Falcon3-10B-Base, which delivers state-of-the-art zero-shot and few-shot performance for models under 13B parameters.
- Knowledge distillation for better tiny models: To provide compact and efficient alternatives, we developed Falcon3-1B-Base and Falcon3-3B-Base by leveraging pruning and knowledge distillation techniques with less than 100 GT of curated high-quality data, thereby redefining pre-training efficiency.
- Pure SSM: We further enhanced Falcon Mamba 7B by training it on an additional 1.5 trillion tokens of high-quality data, resulting in Falcon3-Mamba-7B-Base. Notably, the updated model offers significantly improved reasoning and mathematical capabilities.
- Other variants: All models in the Falcon3 family are available in variants such as Instruct, GGUF, GPTQ-Int4, GPTQ-Int8, AWQ, and 1.58-bit, offering flexibility for a wide range of applications.
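To make the depth up-scaling idea concrete, here is a hedged sketch of layer duplication on a Llama-style decoder via transformers. This is not the exact Falcon3-10B-Base recipe; which layers are duplicated, and how pre-training is then continued, are simplified assumptions:

```python
# Illustrative sketch of depth up-scaling by duplicating decoder layers.
# NOT the exact Falcon3 recipe; the span of duplicated layers is a guess.
import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base")
layers = model.model.layers  # nn.ModuleList of decoder blocks (Llama-style)

# Duplicate a contiguous middle span of layers (hypothetical choice).
start, end = len(layers) // 4, 3 * len(layers) // 4
middle_copy = [copy.deepcopy(layers[i]) for i in range(start, end)]

# Splice the copies back in and fix up the config before continued pre-training.
model.model.layers = nn.ModuleList(list(layers[:end]) + middle_copy + list(layers[end:]))
model.config.num_hidden_layers = len(model.model.layers)

# In practice, per-layer indices used by the KV cache (layer_idx) would also
# need reassigning, followed by continued pre-training on fresh tokens.
```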
Main highlights
Falcon3 pushes the boundaries of small and medium-sized large language models, demonstrating strong performance on common benchmarks.
- Falcon3-1B-Base surpasses SmolLM2-1.7B and is on par with gemma-2-2b.
- Falcon3-3B-Base outperforms larger models such as Llama-3.1-8B and Minitron-4B-Base, highlighting the benefits of pre-training with knowledge distillation.
- Falcon3-7B-Base demonstrates top performance, on par with Qwen2.5-7B, among models under the 9B scale.
- Falcon3-10B-Base is state of the art in the under-13B category.
- All transformer-based Falcon3 models are compatible with the Llama architecture, allowing for better integration into the AI ecosystem.
- Falcon3-Mamba-7B continues to lead as the best-performing state-space language model (SSLM), matching or even surpassing leading transformer-based LLMs at the 7B scale while also supporting a longer context length of 32K. Featuring the same architecture as the original Falcon Mamba 7B, users can integrate Falcon3-Mamba-7B seamlessly without any additional effort.
- The instruct versions of our base models also show remarkable performance across a variety of benchmarks, with Falcon3-7B-Instruct and Falcon3-10B-Instruct outperforming all instruct models under the 13B scale on open leaderboards.
Enhanced capabilities
We evaluated the models using our internal evaluation pipeline (based on lm-evaluation-harness) and report raw scores. Our evaluation highlights key areas where the Falcon3 family of models excels, with a focus on improvements in scientific domains, reasoning, and general knowledge capabilities:
- Math capabilities: Falcon3-10B-Base achieves 22.9 on MATH-Lvl5 and 83.0 on GSM8K, demonstrating enhanced reasoning on complex math-focused tasks.
- Coding capabilities: Falcon3-10B-Base scores 73.8 on MBPP, while Falcon3-10B-Instruct scores 45.8 on Multipl-E, reflecting their ability to generalize across programming-related tasks.
- Extended context length: models in the Falcon3 family support up to 32K tokens (except the 1B, which supports up to 8K), with improved capabilities such as a score of 86.3 on BFCL (Falcon3-10B-Instruct).
- Improved reasoning: Falcon3-7B-Base and Falcon3-10B-Base achieve 51.0 and 59.7 on BBH, respectively, reflecting enhanced reasoning capabilities, with the 10B model improving over the 7B.
- Expanded scientific knowledge: performance on MMLU benchmarks reaches 67.4/39.2 (MMLU/MMLU-PRO) for Falcon3-7B-Base and 73.1/42.5 (MMLU/MMLU-PRO) for Falcon3-10B-Base, respectively, demonstrating progress in specialized knowledge.
Model specifications and benchmark results
The detailed specifications of the Falcon3 family of models are summarized in the table below. The Falcon3-7B-Base architecture features a head dimension of 256, yielding high throughput with FlashAttention-3, whose kernels are optimized for this dimension. These decoder-only models span 18 to 40 layers for the transformer-based variants and 64 layers for the Mamba model; all models share the SwiGLU activation function and have a vocabulary size of 131K tokens (65K for Mamba-7B). Falcon3-7B-Base is trained on the largest amount of data, ensuring comprehensive coverage of concepts and knowledge, while the other variants require far less data.
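These architectural details can be inspected directly from the published configs. A small sketch, assuming the checkpoints live under the tiiuae organization on the Hub:

```python
# Inspect layer count, head dimension, and vocabulary size from model configs.
# Model ids are assumed to be the public tiiuae checkpoints.
from transformers import AutoConfig

for name in ("tiiuae/Falcon3-7B-Base", "tiiuae/Falcon3-10B-Base"):
    cfg = AutoConfig.from_pretrained(name)
    # Fall back to hidden_size / num_attention_heads if head_dim is not explicit.
    head_dim = getattr(cfg, "head_dim", None) or cfg.hidden_size // cfg.num_attention_heads
    print(f"{name}: layers={cfg.num_hidden_layers}, head_dim={head_dim}, vocab={cfg.vocab_size}")
```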
The table below shows the performance of Falcon3-7B-Base and Falcon3-10B-Base on key benchmarks, demonstrating competitive results in the General, Math, Reasoning, and Common-sense Understanding domains. Further evaluation results (e.g., MT-Bench, Alpaca, etc.) can be found in the respective model cards.

The instruct models likewise demonstrate performance that is competitive with, and often superior to, other models of comparable size, as shown in the table below.
Instruct models
Falcon3-1B-Instruct and Falcon3-3B-Instruct deliver robust performance across the evaluated benchmarks. Specifically, Falcon3-1B achieves competitive results on IFEval (54.4), MUSR (40.7), and SciQ (86.8), while Falcon3-3B shows further gains, notably on MMLU-PRO (29.7) and MATH (19.9), demonstrating clear scaling effects. While they do not outperform competing models on every metric, the Falcon models show superior performance in reasoning and common-sense understanding compared to both their Qwen and Llama counterparts. Our internal evaluation pipeline:
- uses lm-evaluation-harness;
- reports raw scores obtained by applying the chat template, without using fewshot_as_multiturn (unlike Llama3.1);
- uses the same batch size across all models.
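For reference, a comparable run can be reproduced with the public lm-evaluation-harness Python API. This is a minimal sketch, not our exact internal pipeline, and it assumes a recent harness version in which simple_evaluate exposes the apply_chat_template and fewshot_as_multiturn options:

```python
# Sketch of a comparable evaluation with the public lm-evaluation-harness.
# Assumes a recent version exposing these flags; not the exact internal setup.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=tiiuae/Falcon3-7B-Instruct,dtype=bfloat16",
    tasks=["ifeval", "gsm8k"],
    batch_size=8,                # same batch size across all models
    apply_chat_template=True,    # chat template applied...
    fewshot_as_multiturn=False,  # ...but without fewshot_as_multiturn
)
print(results["results"])
```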

Additionally, Falcon3-7B and Falcon3-10B exhibit robust performance across the evaluated benchmarks. While Falcon3-7B achieves competitive scores in reasoning (Arc Challenge: 65.9, MUSR: 46.4) and mathematics (GSM8K: 79.1), Falcon3-10B delivers further gains, notably on GSM8K (83.1) and IFEval (78), demonstrating clear scaling benefits.
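As a quick way to try the instruct models, here is a minimal chat-style inference sketch via transformers; the model id and generation settings are illustrative assumptions rather than a prescribed setup:

```python
# Chat-style inference sketch for an instruct checkpoint; model id and
# max_new_tokens are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is 12 * 17? Explain briefly."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```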

Open source initiatives
In line with our mission to advance AI accessibility and collaboration, all models in the Falcon3 family are released under the Falcon LLM license. We hope that the AI community finds these models valuable for research, application development, and further experimentation. Falcon3 is not a culmination, but a continuation of our efforts to create a more capable, efficient, and specialized foundation model. In January 2025, we plan to release further models in the Falcon3 family with enhanced multimodal capabilities including image, video and audio support, as well as a full technical report covering our methodology. We welcome feedback and collaboration from the community as we continue to improve and advance these technologies.
Useful links
Acknowledgment
We would like to sincerely thank the following people for their support and for the smooth integration of these models within the ecosystem:
Citation
If the Falcon3 family of models was useful for your work, please feel free to cite it:
@misc{Falcon3,
  title  = {Falcon 3 Open Model Family},
  url    = {https://huggingface.co/blog/falcon3},
  author = {Falcon-LLM Team},
  month  = {December},
  year   = {2024}
}