Earlier this year, we said we would bring computer use capabilities to developers through the Gemini API. Today we are releasing the Gemini 2.5 Computer Use model, a new specialized model built on Gemini 2.5 Pro's visual understanding and reasoning capabilities that powers agents able to interact with user interfaces (UIs). It outperforms leading alternatives on multiple web and mobile control benchmarks, all with lower latency. Developers can access these capabilities through the Gemini API in Google AI Studio and Vertex AI.
Although AI models can interact with software through structured APIs, many digital tasks still require direct interaction with graphical user interfaces, such as filling out and submitting forms. To complete these tasks, agents must interact with web pages and applications the way humans do: by clicking, typing, and scrolling. The ability to natively fill out forms, manipulate interactive elements like dropdowns and filters, and operate behind logins is an important next step in building powerful, general-purpose agents.
How it works
The model's core capabilities are exposed through the new `computer_use` tool in the Gemini API and should be operated within a loop. Inputs to the tool are the user request, a screenshot of the environment, and a history of recent actions. The input can also specify whether to exclude functions from the full list of supported UI actions, or to include additional custom functions.
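To make the loop concrete, here is a minimal sketch in Python using the google-genai SDK. The `ComputerUse` tool type, the `ENVIRONMENT_BROWSER` enum value, and the preview model name are taken from the preview documentation and may differ in your SDK version; `take_screenshot` and `execute_action` are hypothetical helpers standing in for your own browser automation (for example, Playwright), and the exact function-response format should be checked against the current docs.

```python
# A minimal sketch of the agent loop, assuming the google-genai Python SDK.
# take_screenshot() and execute_action() are hypothetical placeholders for
# your own browser-automation code; they are not part of the SDK.
from google import genai
from google.genai import types

client = genai.Client()

# Declare the computer_use tool; the environment enum tells the model it is
# controlling a browser. Excluded predefined actions or custom functions
# would also be configured here.
config = types.GenerateContentConfig(
    tools=[types.Tool(
        computer_use=types.ComputerUse(
            environment=types.Environment.ENVIRONMENT_BROWSER,
        )
    )],
)

# Start with the user request plus a screenshot of the current environment.
contents = [
    types.Content(role="user", parts=[
        types.Part(text="Find the cheapest flight from Boston to Denver."),
        types.Part.from_bytes(data=take_screenshot(), mime_type="image/png"),
    ]),
]

while True:
    response = client.models.generate_content(
        model="gemini-2.5-computer-use-preview-10-2025",  # preview name; verify against current docs
        contents=contents,
        config=config,
    )
    candidate = response.candidates[0]
    contents.append(candidate.content)

    # Collect the UI actions the model proposed (e.g. click or type actions).
    function_calls = [p.function_call for p in candidate.content.parts if p.function_call]
    if not function_calls:
        break  # No more actions: the model has finished or answered in text.

    # Execute each action in the real environment, then send the result back
    # with a fresh screenshot so the model can decide its next step.
    for call in function_calls:
        execute_action(call.name, call.args)  # hypothetical: drives the browser
        contents.append(types.Content(role="user", parts=[
            types.Part.from_function_response(
                name=call.name,
                response={"url": "about:blank"},  # placeholder for current page state
            ),
            types.Part.from_bytes(data=take_screenshot(), mime_type="image/png"),
        ]))
```

The key design point is that the model never touches the environment directly: it only proposes actions, and the client code executes them and reports back with an updated screenshot, which is why the tool must be driven inside a loop.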

