
Welcome to the Falcon 3 family of open models!

December 23, 2024

Falcon3 is a family of decoder-only large language models under 10 billion parameters, developed by the Technology Innovation Institute (TII) in Abu Dhabi. This release reflects our continued commitment to advancing open, accessible foundation models at scale by pushing the boundaries of performance and training efficiency.

Falcon3 represents a natural evolution from previous releases and emphasizes enhancements to the model’s science, mathematics, and code capabilities.

This iteration comprises five base models:

  • Falcon3-1B-Base
  • Falcon3-3B-Base
  • Falcon3-Mamba-7B-Base
  • Falcon3-7B-Base
  • Falcon3-10B-Base
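
As a quick orientation, the sketch below shows how one of these base models could be loaded with the Hugging Face transformers library. It is a minimal example, assuming the checkpoints are published under the tiiuae organization on the Hub; the repository id is an assumption, not taken from this post.

    # Minimal sketch: load an assumed Falcon3 base checkpoint and generate a completion.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "tiiuae/Falcon3-7B-Base"  # assumed Hub repository id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,   # keep memory usage manageable for a 7B model
        device_map="auto",
    )

    inputs = tokenizer("The Technology Innovation Institute is", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))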

While developing these models, we incorporated several key innovations aimed at improving model performance while reducing training costs.

  • One large pre-training run for the transformer-based 7B model: we performed a single large-scale pre-training run on 1024 H100 GPU chips, leveraging 14 trillion tokens of web, code, STEM, and curated high-quality multilingual data.
  • Depth up-scaling for improved reasoning: building on recent research into the effects of model depth, we upscaled the 7B model to a 10B-parameter model by replicating redundant layers and continuing pre-training with 2 trillion tokens of high-quality data (a toy illustration follows this list). This yielded Falcon3-10B-Base, which delivers state-of-the-art zero-shot and few-shot performance for models under 13B parameters.
  • Knowledge distillation for better, smaller models: to provide compact and efficient alternatives, we developed Falcon3-1B-Base and Falcon3-3B-Base by leveraging pruning and knowledge distillation techniques with fewer than 100 GT of hand-picked, high-quality data, redefining pre-training efficiency.
  • Pure SSM: we further enhanced Falcon Mamba 7B by training it on an additional 1.5 trillion tokens of high-quality data, resulting in Falcon3-Mamba-7B-Base. Notably, the updated model offers significantly improved reasoning and mathematical capabilities.
  • Other variants: all models in the Falcon3 family are available in variants such as Instruct, GGUF, GPTQ-Int4, GPTQ-Int8, AWQ, and 1.58-bit, providing flexibility for a wide range of applications.
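
To make the depth up-scaling idea concrete, here is a toy sketch of replicating a slice of decoder layers in a Llama-style model before continuing pre-training. This is an illustrative reconstruction, not TII's actual recipe; the layer range and repository id are assumptions, and bookkeeping such as per-layer cache indices is omitted.

    # Toy illustration of depth up-scaling: duplicate a block of decoder layers
    # so the model becomes deeper, then continue pre-training on high-quality data.
    import copy
    import torch.nn as nn
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base")  # assumed repo id

    layers = list(model.model.layers)            # decoder blocks of a Llama-style model
    start, end = 8, len(layers) - 8              # illustrative choice of layers to replicate
    upscaled = layers[:end] + [copy.deepcopy(l) for l in layers[start:end]] + layers[end:]

    model.model.layers = nn.ModuleList(upscaled)
    model.config.num_hidden_layers = len(upscaled)
    # The deeper model would then be pre-trained further (the post mentions ~2T extra tokens).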

Main highlights

Falcon3 pushes the limits of small and medium-scale large language models, demonstrating high performance on common benchmarks.

  • Falcon3-1B-Base outperforms SmolLM2-1.7B and is on par with gemma-2-2b.
  • Falcon3-3B-Base outperforms larger models such as Llama-3.1-8B and Minitron-4B-Base, highlighting the benefits of pre-training with knowledge distillation.
  • Falcon3-7B-Base delivers top performance, on par with Qwen2.5-7B, among models under the 9B scale.
  • Falcon3-10B-Base is state of the art in the under-13B category.
  • All transformer-based Falcon3 models are compatible with the Llama architecture for easier integration into the AI ecosystem (a usage sketch follows this list).
  • Falcon3-Mamba-7B continues to lead as the best-performing state-space language model (SSLM), matching or even surpassing leading transformer-based LLMs at the 7B scale, and it also supports a longer 32K context length. Because it shares the architecture of the original Falcon Mamba 7B, users can integrate Falcon3-Mamba-7B seamlessly without any additional effort.
  • The instruct versions of the base models also show remarkable performance across a variety of benchmarks, with Falcon3-7B-Instruct and Falcon3-10B-Instruct outperforming all instruct models below the 13B scale on open leaderboards.
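
Since the transformer-based models follow the Llama architecture and ship instruct variants, they plug into the standard chat-template workflow in transformers. A minimal sketch, again assuming the tiiuae repository ids:

    # Minimal sketch: chat with an assumed Falcon3 instruct checkpoint via the chat template.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "tiiuae/Falcon3-7B-Instruct"  # assumed Hub repository id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain knowledge distillation in two sentences."},
    ]
    # apply_chat_template turns the messages into the model-specific prompt format.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(input_ids, max_new_tokens=128)
    print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))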

Enhanced features

We evaluate the models using our internal evaluation pipeline (based on lm-evaluation-harness) and report raw scores. Our evaluation highlights key areas where the Falcon3 family of models excels, with a focus on improved performance in scientific domains, reasoning, and general knowledge abilities.

  • Mathematics proficiency: Falcon3-10B-Base achieves 22.9 on MATH-Lvl5 and 83.0 on GSM8K, demonstrating enhanced reasoning on complex math-focused tasks.
  • Coding proficiency: Falcon3-10B-Base scores 73.8 on MBPP, while Falcon3-10B-Instruct scores 45.8 on Multipl-E, reflecting their ability to generalize across programming-related tasks.
  • Extended context length: the Falcon3 family of models supports contexts of up to 32K tokens (except the 1B model, which supports up to 8K), with improved capabilities such as a score of 86.3 on BFCL for Falcon3-10B-Instruct.
  • Improved reasoning: Falcon3-7B-Base and Falcon3-10B-Base achieve 51.0 and 59.7 on BBH respectively, reflecting enhanced reasoning capabilities, with the 10B model delivering better performance than the 7B.
  • Expanded scientific knowledge: MMLU benchmark scores reach 67.4/39.2 (MMLU/MMLU-PRO) for Falcon3-7B-Base and 73.1/42.5 (MMLU/MMLU-PRO) for Falcon3-10B-Base respectively, demonstrating progress in specialized knowledge.

Model specifications and benchmark results

The detailed specifications of the Falcon3 family of models are summarized in the table below. The Falcon3-7B-Base architecture features a head dimension of 256 and is optimized for it, resulting in high throughput when using FlashAttention-3. These decoder-only models range from 18 to 40 layers for the transformer-based versions, with 64 layers for the Mamba model; all models share the SwiGLU activation function and have a vocabulary size of 131K tokens (65K for Mamba-7B). Falcon3-7B-Base is trained on the largest amount of data to ensure comprehensive coverage of concepts and knowledge, while the other variants require far less data.
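
These architectural details can be checked for a given checkpoint without downloading the weights by inspecting its configuration. A small sketch, assuming a Llama-style config and the tiiuae repository id:

    # Sketch: inspect the architecture of an assumed Falcon3 checkpoint from its config only.
    from transformers import AutoConfig

    config = AutoConfig.from_pretrained("tiiuae/Falcon3-7B-Base")  # assumed repo id
    head_dim = getattr(config, "head_dim", config.hidden_size // config.num_attention_heads)

    print("layers:         ", config.num_hidden_layers)
    print("hidden size:    ", config.hidden_size)
    print("attention heads:", config.num_attention_heads)
    print("head dimension: ", head_dim)
    print("vocab size:     ", config.vocab_size)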

The table below reports the performance of Falcon3-7B-Base and Falcon3-10B-Base on key benchmarks, showing competitive results in general knowledge, mathematics, reasoning, and common sense understanding. Feel free to check the model cards for additional evaluation results (e.g. MT-Bench, Alpaca, etc.).


The instruct models also exhibit competitive, and in places superior, performance compared to similarly sized models, as shown in the table below.

Instruct models

Falcon3-1B-Instruct and Falcon3-3B-Instruct deliver robust performance across the evaluated benchmarks. Specifically, Falcon3-1B achieves competitive results on IFEval (54.4), MUSR (40.7), and SciQ (86.8), while Falcon3-3B shows further gains, especially on MMLU-PRO (29.7) and MATH (19.9), demonstrating a clear scaling effect. Although they do not outperform competing models on every metric, the Falcon models show superior performance in reasoning and common sense understanding compared to both Qwen and Llama. Our internal evaluation pipeline is as follows:

  • uses lm-evaluation-harness;
  • reports raw scores obtained by applying the chat template without fewshot_as_multiturn (unlike Llama3.1);
  • uses the same batch size for all models (a rough reproduction sketch follows).
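
For readers who want to reproduce a comparable setup, the snippet below is a rough sketch of those settings using the lm-evaluation-harness Python API. The task name, argument spellings, and repository id are assumptions against a recent harness release, not the team's internal pipeline.

    # Rough sketch: evaluate with the chat template applied but without fewshot_as_multiturn.
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=tiiuae/Falcon3-7B-Instruct,dtype=bfloat16",  # assumed repo id
        tasks=["gsm8k"],
        batch_size=8,                 # same batch size across models
        apply_chat_template=True,
        fewshot_as_multiturn=False,
    )
    print(results["results"]["gsm8k"])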


Additionally, Falcon3-7B and Falcon3-10B exhibit robust performance across the evaluated benchmarks. Falcon3-7B achieves competitive scores in reasoning (Arc Challenge: 65.9, MUSR: 46.4) and mathematics (GSM8K: 79.1), while Falcon3-10B shows further gains, especially on GSM8K (83.1) and IFEval (78), demonstrating clear scaling benefits.


Open source initiatives

In line with our mission to advance AI accessibility and collaboration, all models in the Falcon3 family are released under the Falcon LLM license. We hope the AI community finds these models valuable for research, application development, and further experimentation. Falcon3 is not a culmination, but a continuation of our efforts to create more capable, efficient, and specialized foundation models. In January 2025, we plan to release further models in the Falcon3 family with enhanced multimodal capabilities, including image, video, and audio support, as well as a full technical report covering our methodology. We welcome feedback and collaboration from the community as we continue to improve and advance these technologies.

Useful links

Acknowledgment

We would like to sincerely thank the following people for their seamless support and integration within the ecosystem:

Citation

If the Falcon3 family of models was useful for your work, feel free to cite it.

@misc{Falcon3,
    title = {Falcon 3 Open Model Family},
    url = {https://huggingface.co/blog/falcon3},
    author = {Falcon-LLM Team},
    month = {December},
    year = {2024}
}
