Earlier this year, we said we were bringing computer use capabilities to developers via the Gemini API. Today, we are releasing the Gemini 2.5 Computer Use model, a new specialized model built on the visual understanding and reasoning capabilities of Gemini 2.5 Pro that powers agents able to interact with user interfaces (UIs). It outperforms leading alternatives on multiple web and mobile control benchmarks, all with lower latency. Developers can access these capabilities via the Gemini API in Google AI Studio and Vertex AI.
While AI models can interface with software through structured APIs, many digital tasks still require direct interaction with graphical user interfaces, for example filling in and submitting forms. To complete these tasks, agents must navigate web pages and applications just as humans do: by clicking, typing, and scrolling. The ability to natively fill out forms, manipulate interactive elements like dropdowns and filters, and operate behind logins is a critical next step in building powerful, general-purpose agents.
How it works
The model's core capabilities are exposed through the new `computer_use` tool in the Gemini API and should be operated within a loop. Inputs to the tool are the user's request, a screenshot of the environment, and a history of recent actions. The input can also specify whether to exclude functions from the full list of supported UI actions, or to specify additional custom functions to include.
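To make the loop concrete, here is a minimal sketch in Python using the `google-genai` SDK. The exact model id, the `ComputerUse`/`Environment` configuration fields, and the shape of the function response are taken from the preview documentation and may change; `take_screenshot()`, `execute_action()`, and `current_url()` are hypothetical placeholders for your own environment code (for example, a Playwright-driven browser).

```python
# A minimal sketch of the computer-use agent loop, assuming the google-genai
# Python SDK and the preview model id at launch. take_screenshot(),
# execute_action(), and current_url() are hypothetical helpers you implement
# against your own browser environment.
from google import genai
from google.genai import types

client = genai.Client()

config = types.GenerateContentConfig(
    tools=[types.Tool(computer_use=types.ComputerUse(
        environment=types.Environment.ENVIRONMENT_BROWSER,
        # Optionally trim the built-in action space (field name per the
        # preview docs; treat as an assumption):
        # excluded_predefined_functions=["drag_and_drop"],
    ))],
)

# Seed the loop with the user request and an initial screenshot.
contents = [types.Content(role="user", parts=[
    types.Part(text="Find the contact form on example.com and fill in my name."),
    types.Part.from_bytes(data=take_screenshot(), mime_type="image/png"),
])]

while True:
    response = client.models.generate_content(
        model="gemini-2.5-computer-use-preview-10-2025",  # preview id at launch
        contents=contents,
        config=config,
    )
    candidate = response.candidates[0]
    contents.append(candidate.content)  # keep the action history in context

    function_calls = [p.function_call for p in candidate.content.parts
                      if p.function_call]
    if not function_calls:
        print(response.text)  # no more UI actions: the model answered or asked
        break

    for call in function_calls:
        execute_action(call.name, call.args)  # e.g. click_at, type_text_at

    # Report the outcome plus a fresh screenshot so the loop can continue.
    contents.append(types.Content(role="user", parts=[
        types.Part(function_response=types.FunctionResponse(
            name=function_calls[-1].name,
            response={"url": current_url()},
        )),
        types.Part.from_bytes(data=take_screenshot(), mime_type="image/png"),
    ]))
```

The loop terminates when the model stops emitting function calls, which is also the natural point to surface its final text response to the user.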

