Apriel-1.6-15b-Thinker: Cost-effective frontier multimodal performance

By versatileai | December 11, 2025 | 8 min read

We are releasing Apriel-1.6-15b-Thinker, a 15-billion-parameter multimodal reasoning model from ServiceNow’s Apriel SLM series. It achieves performance competitive with models 10 times its size. Apriel-1.6 is built on top of Apriel-1.5-15b-Thinker and focuses on improving text and vision reasoning while increasing token efficiency. This version was trained on NVIDIA DGX™ Cloud, powered by the GB200 Grace™ Blackwell Superchip.

Apriel-1.6 received a score of 57 on the Artificial Analysis Intelligence Index, outperforming models such as Gemini 2.5 Flash, Claude Haiku 4.5, and GPT-OSS-20B. It matches the score of Qwen3 235B A22B while being far more efficient. This release improves or maintains task performance while reducing reasoning-token usage by over 30% compared to the previous Apriel-1.5-15B-Thinker (1).

Mid-training

Apriel-1.6 follows the same overall training process used for Apriel-1.5-15B-Thinker: a depth-upscaling phase followed by two continual pre-training (CPT) stages (detailed in (1)). The depth-upscaling corpus consists of 35% data from a variety of sources, including high-quality web content, scientific and technical literature, mathematical problem sets, and programming code; 15% high-quality data from NVIDIA Nemotron™; and 50% pre-training-style data that serves as replay.
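
As a rough illustration of how such a fixed mixture might be sampled during training, here is a minimal sketch. Only the 35/15/50 split comes from the text above; the pool names, placeholder samples, and sampling loop are hypothetical stand-ins rather than the actual training code.

```python
import random

# Hypothetical stand-ins for the three data pools described above.
# In a real pipeline these would be tokenized dataset shards.
data_pools = {
    "diverse_high_quality": ["<web/science/math/code sample>"],      # 35%
    "nemotron":             ["<NVIDIA Nemotron sample>"],            # 15%
    "pretrain_replay":      ["<pre-training-style replay sample>"],  # 50%
}
mixture_weights = {"diverse_high_quality": 0.35, "nemotron": 0.15, "pretrain_replay": 0.50}

def sample_batch(batch_size: int, rng: random.Random) -> list[str]:
    """Draw a batch whose composition follows the mixture weights in expectation."""
    names = list(mixture_weights)
    weights = [mixture_weights[n] for n in names]
    batch = []
    for _ in range(batch_size):
        pool = rng.choices(names, weights=weights, k=1)[0]
        batch.append(rng.choice(data_pools[pool]))
    return batch

print(sample_batch(4, random.Random(0)))
```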

Apriel-1.6-15B-Thinker extends the Stage 1 CPT mixture, which focuses on enhancing text reasoning and image understanding, by adding text-only samples and image-text pairs. The new text data is fully synthetic and covers general reasoning, knowledge, coding, and creative writing, while the multimodal portion spans document and chart understanding, OCR, visual reasoning tasks, and SVG/web code synthesis.

Stage 1 is followed by a text-only CPT run with an expanded 49K sequence length, and then by Stage 2, which further refines the model’s visual reasoning capabilities. This combination produced a strong base model that provides a solid foundation for subsequent post-training. The entire mid-training pipeline required approximately 10,000 GPU hours on NVIDIA GB200s. This is consistent with our goal of keeping the compute footprint small and building powerful models with limited resources through careful data strategy and training methodology.
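
A compact way to picture this schedule is as an ordered list of stage configurations. In the sketch below, only the order of stages and the 49K long-context run come from the text; the field names and the other sequence lengths are illustrative assumptions, not the actual configuration.

```python
# Illustrative sketch of the mid-training schedule described above.
from dataclasses import dataclass

@dataclass
class CPTStage:
    name: str
    modality: str   # "text+image" or "text"
    seq_len: int    # maximum sequence length in tokens (assumed except for 49K)

midtraining_schedule = [
    CPTStage("stage1_cpt",       modality="text+image", seq_len=32_768),
    CPTStage("long_context_cpt", modality="text",       seq_len=49_152),
    CPTStage("stage2_cpt",       modality="text+image", seq_len=32_768),
]

for stage in midtraining_schedule:
    print(f"{stage.name}: {stage.modality}, up to {stage.seq_len} tokens")
```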

Post-training

Starting from the mid-trained model, we perform post-training with a pipeline consisting of large-scale supervised fine-tuning (SFT) and reinforcement learning (RL) targeting both visual and textual abilities.

Supervised fine-tuning (SFT)

Our supervised fine-tuning (SFT) stage focuses on improving the reasoning quality of Apriel-1.6 by training it on a carefully curated dataset of 2.4 million high-signal text samples. Each example includes an explicit step-by-step reasoning trace, allowing the model to internalize a transparent reasoning process rather than simply reproducing the final answer.

To build this dataset, we combined a wide range of executable synthetic samples for math, coding, and scientific problem solving with instruction-following, conversational, API/function-calling, creative-writing, safety, and other knowledge-intensive samples. Data quality was treated as a top priority: all samples underwent multi-step deduplication, content filtering, heuristic quality pruning, LLM-as-judge validation, execution-based validation (where applicable), and rigorous decontamination against evaluation benchmarks.
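
To make the curation flow concrete, here is a minimal sketch of a sequential filter chain in the spirit of the steps listed above. The predicate functions are hypothetical placeholders; none of the real deduplication, judging, or decontamination logic is shown.

```python
# Minimal sketch of a sequential data-curation pipeline (assumed structure).
from typing import Callable, Iterable

Sample = dict  # e.g. {"prompt": ..., "response": ..., "judge_score": ...}

def run_pipeline(samples: Iterable[Sample],
                 steps: list[Callable[[Sample], bool]]) -> list[Sample]:
    """Keep only samples that pass every filter, applied in order."""
    kept = list(samples)
    for keep_fn in steps:
        kept = [s for s in kept if keep_fn(s)]
    return kept

# Hypothetical predicates standing in for the curation steps described above.
def not_duplicate(s: Sample) -> bool: return not s.get("is_duplicate", False)
def passes_content_filter(s: Sample) -> bool: return s.get("safe", True)
def passes_heuristics(s: Sample) -> bool: return len(s.get("response", "")) > 0
def judged_correct(s: Sample) -> bool: return s.get("judge_score", 1.0) >= 0.5
def executes_ok(s: Sample) -> bool: return s.get("execution_passed", True)
def not_contaminated(s: Sample) -> bool: return not s.get("matches_benchmark", False)

curated = run_pipeline(
    [{"response": "step-by-step trace ...", "judge_score": 0.9}],
    [not_duplicate, passes_content_filter, passes_heuristics,
     judged_correct, executes_ok, not_contaminated],
)
print(len(curated))
```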

The SFT was run in two phases, both trained with a 32K context length. In the first phase, we performed large-scale text-only training on the 2.4 million samples for 4 epochs. Compared to Apriel-1.5-15b-Thinker, we simplified the chat template by removing redundant tags and added four special tokens, including (BEGIN FINAL RESPONSE), to the tokenizer to facilitate output parsing.
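
As an illustration of how a downstream consumer might use such markers to split the reasoning trace from the final answer, here is a small sketch. Only (BEGIN FINAL RESPONSE) appears in the text above; the closing delimiter and exact token strings below are assumptions, so check the released tokenizer and chat template for the actual four tokens.

```python
# Sketch of parsing model output that wraps the final answer in special tokens.
BEGIN_FINAL = "(BEGIN FINAL RESPONSE)"
END_FINAL = "(END FINAL RESPONSE)"  # assumed counterpart to the begin marker

def split_reasoning_and_answer(generation: str) -> tuple[str, str]:
    """Return (reasoning_trace, final_answer); fall back to the full text if markers are absent."""
    if BEGIN_FINAL not in generation:
        return "", generation.strip()
    reasoning, _, rest = generation.partition(BEGIN_FINAL)
    answer = rest.split(END_FINAL)[0] if END_FINAL in rest else rest
    return reasoning.strip(), answer.strip()

demo = "Let me work through this step by step... (BEGIN FINAL RESPONSE)42(END FINAL RESPONSE)"
trace, answer = split_reasoning_and_answer(demo)
print(answer)  # -> 42
```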

The second phase was a lightweight multimodal run trained for 3 epochs on rejection-sampled data from Apriel-1.5-15b-Thinker, ensuring that the model maintained strong performance on image inputs after the introduction of these special tokens while also preparing it for the downstream RL stages.

This approach provided a robust, high-quality SFT foundation on which the RL pipeline could operate effectively. The resulting model exhibits strong multimodal understanding, improved text reasoning capabilities, and enhanced agentic behavior.

Reinforcement learning (RL)

We employ a multi-stage RL setup that focuses on improving reasoning ability and efficiency simultaneously. Models are trained on image domains such as visual reasoning, general visual question answering (VQA), and optical character recognition (OCR). The training data also spans a variety of text domains, including simple questions (to encourage short, direct answers), mathematics (numerical reasoning), STEM (multiple-choice scientific questions), and function calling (structured tool use).

Rewards are given for response correctness, and penalties are applied for undesirable behavior such as verbosity or malformed output. Overall, the setup is designed to improve the model’s reasoning ability while using fewer reasoning tokens: avoiding unnecessary intermediate steps, stopping early when confident, and responding more directly to simpler queries.
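
A minimal sketch of a rule-based reward in this spirit is shown below. The weights, length threshold, and helper checks are assumptions for illustration, not the actual reward used in training.

```python
# Sketch of a rule-based reward combining correctness with verbosity and
# format penalties. All coefficients and thresholds are illustrative assumptions.
def compute_reward(response: str,
                   is_correct: bool,
                   is_well_formed: bool,
                   num_tokens: int,
                   token_budget: int = 2048) -> float:
    reward = 1.0 if is_correct else 0.0
    if not is_well_formed:            # e.g. missing final-response markers
        reward -= 0.5
    if num_tokens > token_budget:     # discourage unnecessarily long reasoning
        overflow = (num_tokens - token_budget) / token_budget
        reward -= min(0.5, 0.5 * overflow)
    return reward

print(compute_reward("...", is_correct=True, is_well_formed=True, num_tokens=900))    # 1.0
print(compute_reward("...", is_correct=True, is_well_formed=False, num_tokens=4000))  # penalized
```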

Training uses the Group Sequence Policy Optimization (GSPO) loss (2), implemented with the VeRL framework and rule-based verification.
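
For readers unfamiliar with GSPO, the sketch below shows the core idea from (2): importance ratios are computed at the sequence level (length-normalized) rather than per token, combined with group-normalized rewards and PPO-style clipping. This is a simplified, framework-free illustration, not the VeRL implementation used here.

```python
import numpy as np

def gspo_objective(logp_new: list[np.ndarray],  # per-token log-probs under the current policy
                   logp_old: list[np.ndarray],  # per-token log-probs under the old policy
                   rewards: np.ndarray,         # one scalar reward per sampled response
                   clip_eps: float = 0.2) -> float:
    """Simplified GSPO objective for one group of responses to the same prompt."""
    # Group-normalized advantages, as in GRPO/GSPO.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    objs = []
    for lp_new, lp_old, a in zip(logp_new, logp_old, adv):
        # Sequence-level, length-normalized importance ratio (the key GSPO change).
        ratio = np.exp((lp_new.sum() - lp_old.sum()) / len(lp_new))
        clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
        objs.append(min(ratio * a, clipped * a))
    return float(np.mean(objs))  # maximize this (negate for a loss)

rng = np.random.default_rng(0)
group = [rng.normal(-1.0, 0.1, size=50) for _ in range(4)]
print(gspo_objective([g - 0.01 for g in group], group, np.array([1.0, 0.0, 1.0, 0.0])))
```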

Evaluation

Text evaluation

We evaluate Apriel-1.6 across different areas such as tool use, mathematics, coding, instruction following, and long context.

* This score is with DCA enabled; without it, the model scores 36.

** The average score is calculated over all benchmarks except BFCL v3 and DeepResearchBench, as some models do not have scores for these two benchmarks.

*** The AA LCR score for o3-mini-high is predicted from its AA Index score.

Image evaluation

We evaluate the Apriel-1.6 model on a representative set of benchmarks focused on mathematical reasoning, visual question answering, logical reasoning, STEM-related tasks, and chart-based reasoning. All evaluations are performed using VLMEvalKit. Apriel-1.6 improves by 4 points on average across the 13 benchmarks in the Image Index: MathVision, MathVista, MMMU (val), MMMU-Pro (10-option CoT), MMMU-Pro (vision-only CoT), MathVerse (Vision Dominant), MathVerse (Text Dominant), MMStar, BLINK, LogicVista, CharXiv (descriptive), CharXiv (reasoning), and AI2D (test).

Image index performance

Cost-effective frontier performance

Intelligence vs. Total Parameters (11/30/25)

Apriel-1.6-15B-Thinker sits in the sweet spot of the cost-effective frontier. It uses just 15 billion parameters while delivering intelligence scores that match or exceed much larger models. On the chart, it sits firmly within the most attractive quadrant, balancing efficiency with top-level reasoning. In practice, this means Apriel-1.6-15B-Thinker delivers strong performance and deep reasoning at a fraction of the compute and deployment cost of its strongest competitors, making it a highly efficient choice for real-world, and especially enterprise, applications.

Intelligence vs. output tokens used on the Artificial Analysis Intelligence Index (11/30/25)

Our post-training focuses on improving reasoning-token efficiency. The chart above, which plots intelligence scores against token usage, highlights the effectiveness of post-training: Apriel-1.6-15B-Thinker once again lands in the most attractive quadrant, reaching a high Artificial Analysis Intelligence Index score while using far fewer tokens than many similar or larger models. It reduces token usage by more than 30% compared to Apriel-1.5-15b-Thinker (1).

Overall, Apriel-1.6 is a very capable reasoner that maintains the memory and efficiency characteristics needed for enterprise deployment.

Acknowledgment

We would like to acknowledge the contributions of: Varun Pandey, Shashank Maiya, Dhruv Jhamb, Massimo Caccia, Dheeraj Vattikonda, Nicolas Gontier, Patrice Bechard, Tayfun Tuna, Kavya Sriram, Denis Akhiyarov, Hari Subramani, and Tara Bogavelli.

Notes and limitations

We are a small lab with big goals. While we are not GPU-poor, we have only a small fraction of the compute available to other frontier labs. Our goal in this work is to show that with the right data, sound design, and solid methodology, SOTA models can be built even with limited resources.

We set out to build a small but powerful model with frontier-level capabilities. Developing a 15B model with this level of performance requires trade-offs, so we prioritized SOTA-level performance and reasoning-token efficiency.

The model is trained to reason extensively on difficult questions and to reduce reasoning effort on simple ones. We are actively working to make our models even more efficient and concise in future releases.

The model has some vision-related limitations to be aware of. Complex or low-quality images can reduce OCR accuracy; dense scenes (such as crowds or many similar objects) can make fine details harder to distinguish and objects harder to count; and highly detailed or unusually formatted charts can lead to incomplete interpretation. Additionally, tasks that depend on fine-grained visual evidence can suffer reduced accuracy, resulting in approximate or inconsistent bounding-box predictions.

References

(1) Radhakrishna, S., Tiwari, A., Shukla, A., Hashemi, M., Maheshwary, R., Malay, S. K. R., Mehta, J., Pattnaik, P., Mittal, S., Slimi, K., Ogueji, K., Oladipo, A., Parikh, S., Bamgbose, O., Liang, T., Masry, A., Mahajan, K., Mudumba, S. R., Yadav, V., Madhusudhan, S. T., Scholak, T., Davasam, S., Sunkara, S., and Chapados, N., 2025. Apriel-1.5-15B-Thinker. arXiv preprint arXiv:2510.01141.

(2) Zheng, C., Liu, S., Li, M., Chen, X.-H., Yu, B., Gao, C., Dang, K., Liu, Y., Men, R., Yang, A., Zhou, J., and Lin, J., 2025. Group Sequence Policy Optimization. arXiv preprint arXiv:2507.18071.
