Tools

12B – High throughput computer usage agent

By versatileai · March 27, 2026

We are pleased to release Holotron-12B, a multimodal computer-use model from H Company. Post-trained from the open NVIDIA Nemotron-Nano-2 VL model with H Company’s proprietary data mixture, Holotron-12B is the result of close collaboration between our labs to design a new type of model optimized primarily for scale and performance in production environments.

Company H is part of the NVIDIA Inception program.

The model is now available on Hugging Face.

Most current multimodal models are primarily optimized for static vision or instruction following. Holotron-12B, like the Holo2 models, has a different goal: to serve as a policy model for computer-use agents that must perceive, decide, and act efficiently in interactive environments.

With Holotron-12B, we wanted a model that scales efficiently in production while maintaining strong performance on agent benchmarks and handling long contexts with multiple images. The NVIDIA Nemotron model provides a strong foundation on the inference side, and the development of Holotron-12B demonstrates how much that foundation can accomplish with further training.

High-throughput inference with hybrid SSM architecture

Holotron-12B’s significant leap in inference efficiency comes from its underlying Nemotron architecture, which combines state-space model (SSM) layers with attention layers. Unlike purely transformer-based models, this hybrid design is optimized for high-throughput serving. State-space models scale well to long-context inference by avoiding the quadratic computational cost of full attention, which particularly benefits agent workloads that include multiple images or long interaction histories. From an inference perspective, the SSM’s main contribution is a sharp reduction in memory footprint: while vanilla attention stores K and V activations per token and per layer (the infamous KV cache), an SSM is a linear recurrence that keeps only a constant-size state per layer for each generated sequence, regardless of sequence length.
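To make the memory argument concrete, here is a back-of-the-envelope comparison of the per-sequence KV cache of full attention versus the constant recurrent state of an SSM layer. All dimensions below are illustrative assumptions for a generic 12B-class model, not Holotron-12B’s actual hyperparameters:

```python
# Back-of-the-envelope memory comparison: per-sequence KV cache
# (full attention) vs. constant SSM state. All dimensions are
# illustrative assumptions, not Holotron-12B's real configuration.

def kv_cache_bytes(num_layers, seq_len, num_kv_heads, head_dim, bytes_per_elem=2):
    """Attention stores K and V per token, per layer (bf16 = 2 bytes/elem)."""
    return 2 * num_layers * seq_len * num_kv_heads * head_dim * bytes_per_elem

def ssm_state_bytes(num_layers, state_elems_per_layer, bytes_per_elem=2):
    """An SSM layer keeps a fixed-size recurrent state, independent of length."""
    return num_layers * state_elems_per_layer * bytes_per_elem

# Assumed 12B-class dims: 40 layers, 8 KV heads of dim 128, 32k-token context.
attn = kv_cache_bytes(num_layers=40, seq_len=32_768, num_kv_heads=8, head_dim=128)
ssm = ssm_state_bytes(num_layers=40, state_elems_per_layer=256 * 128)

print(f"KV cache  @32k tokens: {attn / 2**20:.0f} MiB per sequence")   # grows with length
print(f"SSM state (any length): {ssm / 2**20:.1f} MiB per sequence")   # constant
```

Under these assumed dimensions the attention KV cache runs to gigabytes per long sequence while the SSM state stays in the low megabytes, which is the effect that makes room for many more concurrent sequences.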

We evaluated throughput on a real-world multimodal agent workload derived from the WebVoyager benchmark, featuring long contexts, multiple high-resolution images, and high request concurrency (100 benchmark workers). Running on a single H100 GPU using vLLM with the latest SSM optimizations (v0.14.1), Holotron-12B achieved over 2x higher throughput than Holo2-8B. This makes Holotron-12B an attractive choice for throughput-constrained workloads such as data generation, annotation, and online reinforcement learning.

In a controlled experimental setup (see Figure 2), Holotron-12B continued to scale efficiently as concurrency increased, with total token throughput climbing steadily to 8.9k tokens/sec at the maximum concurrency of 100. In contrast, Holo2-8B’s total token throughput plateaued much earlier, at 5.1k tokens/sec. This behavior highlights a key strength of the Nemotron architecture: more efficient VRAM utilization and a smaller overall memory footprint allow larger effective batch sizes on the same hardware, so Holotron-12B maintains strong throughput even at large batch sizes.
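The batch-size argument can be sketched with simple arithmetic: with a fixed VRAM budget left after loading weights, the maximum number of concurrent sequences is roughly that budget divided by the per-sequence memory cost. The numbers below are illustrative assumptions, not measured values from our setup:

```python
# Sketch of why a smaller per-sequence memory footprint yields larger
# effective batch sizes on the same GPU. All numbers are illustrative
# assumptions, not measurements.

def max_concurrency(free_vram_gib, per_seq_mib):
    """How many sequences fit in the remaining VRAM budget."""
    return int(free_vram_gib * 1024 // per_seq_mib)

FREE_VRAM_GIB = 40  # assumed headroom on an 80 GiB H100 after 12B bf16 weights

# Assumed per-sequence costs at a long, multi-image context:
attention_heavy = max_concurrency(FREE_VRAM_GIB, per_seq_mib=5120)  # large KV cache
hybrid_ssm = max_concurrency(FREE_VRAM_GIB, per_seq_mib=320)        # mostly constant state

print(f"attention-heavy model: ~{attention_heavy} concurrent sequences")
print(f"hybrid SSM model:      ~{hybrid_ssm} concurrent sequences")
```

Under these assumptions the hybrid model fits an order of magnitude more concurrent sequences, which is why total throughput keeps scaling with concurrency instead of plateauing once memory runs out.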

Holotron-12B Training and Evaluation

Holotron-12B was trained in two stages. We started from Nemotron-Nano-12B-v2-VL-BF16, a multimodal base model released by NVIDIA, then performed supervised fine-tuning on H Company’s proprietary localization and navigation data mix, focusing on screen understanding, grounding, and UI-level interactions.

The final checkpoint was trained on roughly 14 billion tokens.

Agent benchmark

In computer-use and navigation benchmarks, Holotron-12B shows significant improvements over the Nemotron base model and competitive performance against established agent models. Its WebVoyager score rises from 35.1% to 80.5%, outperforming Holo2-8B on the same benchmark and demonstrating the model’s ability to perform effectively in agent settings.

Localization benchmark

Holotron-12B also offers significant improvements over the base Nemotron model on localization and grounding benchmarks such as OS-World-G, GroundUI, and WebClick.

Holotron-12B shows that the NVIDIA Nemotron VL model, when combined with appropriate training settings and infrastructure work, provides a strong foundation for real-world multimodal agents.

This model provides strong agent performance, significantly increased inference throughput, and a clear path to future improvements, especially regarding high-resolution vision training.

We’re looking forward to seeing what others build with Holotron-12B. Models and checkpoints are available on Hugging Face under the NVIDIA Open Model License.

NVIDIA today announced the release of Nemotron 3 Omni. Building on the success of Holotron-12B, we are preparing to post-train this next-generation multimodal model. By leveraging the enhanced hybrid SSM-attention and MoE architectural foundations of the Nemotron 3 family, we aim to deliver further leaps in inference capability and multimodal accuracy with the newly announced Nemotron 3 Omni. This evolution will push Holotron beyond research and into commercial applications, providing enterprises with the high-throughput, low-latency performance needed for large-scale autonomous computer-use deployments.
