Two months after releasing the first batch of Holo2 models, H Company is back with the Holo2-235B-A22B preview, its largest UI localization model to date. The model sets new state-of-the-art (SOTA) records of 78.5% on ScreenSpot-Pro and 79.0% on OSWorld-G.
The Holo2-235B-A22B preview, available on Hugging Face, is a research release focused on UI element localization.
Localization by agent
High-resolution 4K interfaces are challenging for localization models because small UI elements are difficult to spot on large displays. With agent localization, Holo2 iteratively refines its prediction at each step, yielding relative accuracy gains of 10-20% across all Holo2 model sizes.
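The post does not describe how the iterative refinement works internally. As an illustration only, the sketch below assumes one common approach: predict a point, crop a smaller region around it, and predict again on the crop. The names `refine_click`, `make_toy_predictor`, the `zoom` factor, and the toy error model are all hypothetical and are not taken from Holo2.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]               # (x, y) in full-screen pixels
Box = Tuple[float, float, float, float]   # (left, top, right, bottom) in full-screen pixels

# Hypothetical predictor interface (an assumption, not the Holo2 API):
# given a crop of the screen, return a click point in normalized [0, 1]
# coordinates relative to that crop.
Predictor = Callable[[Box], Point]


def refine_click(predict: Predictor, screen_size: Tuple[int, int],
                 steps: int = 3, zoom: float = 0.5) -> Point:
    """Predict, crop around the prediction, and predict again, `steps` times."""
    width, height = float(screen_size[0]), float(screen_size[1])
    box: Box = (0.0, 0.0, width, height)
    point: Point = (width / 2, height / 2)
    for _ in range(steps):
        left, top, right, bottom = box
        nx, ny = predict(box)
        # Map the normalized prediction back to full-screen pixels.
        point = (left + nx * (right - left), top + ny * (bottom - top))
        # Next crop: `zoom` times the current size, centered on the
        # prediction and clamped so it stays inside the screen.
        new_w, new_h = (right - left) * zoom, (bottom - top) * zoom
        new_left = min(max(point[0] - new_w / 2, 0.0), width - new_w)
        new_top = min(max(point[1] - new_h / 2, 0.0), height - new_h)
        box = (new_left, new_top, new_left + new_w, new_top + new_h)
    return point


def make_toy_predictor(target: Point, bias: float = 0.02) -> Predictor:
    """Toy stand-in for a model: its error is a fixed fraction of the crop size."""
    def predict(box: Box) -> Point:
        left, top, right, bottom = box
        return ((target[0] - left) / (right - left) + bias,
                (target[1] - top) / (bottom - top) + bias)
    return predict


if __name__ == "__main__":
    target = (3000.0, 1500.0)
    predictor = make_toy_predictor(target)
    for steps in (1, 3):
        x, y = refine_click(predictor, (3840, 2160), steps=steps)
        print(steps, abs(x - target[0]), abs(y - target[1]))
```

Under this toy error model, the pixel error shrinks with each crop because the predictor's error scales with the size of the region it sees. A real model would behave less regularly, and a poor first guess could crop the target out entirely.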
Holo2-235B-A22B performance on ScreenSpot-Pro
The Holo2-235B-A22B preview reaches 70.6% accuracy on ScreenSpot-Pro in a single step. In agent mode, it reaches 78.5% within 3 steps, establishing a new state of the art on the most difficult GUI grounding benchmark.
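To relate these two scores to the relative gains quoted earlier, the short calculation below computes the relative improvement from the single-step score to the 3-step score. The helper name `relative_gain` is illustrative; the two accuracy figures are the ones reported above.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement over the baseline, in percent."""
    return (improved - baseline) / baseline * 100.0


single_step = 70.6   # ScreenSpot-Pro accuracy in 1 step (%)
agent_mode = 78.5    # ScreenSpot-Pro accuracy within 3 steps (%)

gain = relative_gain(single_step, agent_mode)
print(f"{gain:.1f}% relative gain")  # prints "11.2% relative gain"
```

A 7.9-point absolute improvement over a 70.6% baseline is about an 11.2% relative gain, which falls within the 10-20% range reported across model sizes.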

Trained with SkyPilot
Training Holo2 models at scale requires coordinating workloads across multiple cloud providers. H Company uses SkyPilot as a unified interface for launching training jobs on Kubernetes (k8s) clusters. SkyPilot abstracts away infrastructure complexity, so researchers can focus on model development instead of managing k8s manifests or maintaining separate deployment scripts.
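H Company's actual job definitions are not shown in this post. As a rough sketch of what a single declarative task looks like, the snippet below assembles a dictionary that mirrors the top-level fields of a SkyPilot task definition (`name`, `num_nodes`, `resources`, `run`). The job name, node count, accelerator string, and `train.py` command are placeholder assumptions, and the field names should be checked against the SkyPilot version in use.

```python
import json
from typing import Any, Dict


def build_task_spec(name: str, num_nodes: int, accelerators: str,
                    run_command: str) -> Dict[str, Any]:
    """Assemble a task description shaped like a SkyPilot task definition."""
    return {
        "name": name,
        "num_nodes": num_nodes,
        "resources": {
            # Target a Kubernetes cluster rather than a specific public cloud.
            "cloud": "kubernetes",
            # Accelerator type and count per node, e.g. "H100:8".
            "accelerators": accelerators,
        },
        # Command executed on every node of the job.
        "run": run_command,
    }


# All values below are hypothetical placeholders for illustration.
spec = build_task_spec(
    name="holo2-train-example",
    num_nodes=2,
    accelerators="H100:8",
    run_command="python train.py",
)
print(json.dumps(spec, indent=2))
```

The point of the sketch is that one short description replaces per-cluster manifests: the same fields describe the job regardless of which cluster ends up running it.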

