Introducing the Gemini 2.5 Computer Use model

Earlier this year, we said we would bring computer use capabilities to developers through the Gemini API. Today we are releasing the Gemini 2.5 Computer Use model. It is a new specialized model, built on Gemini 2.5 Pro’s visual understanding and reasoning capabilities, that powers agents capable of interacting with user interfaces (UIs). It outperforms leading alternatives on multiple web and mobile control benchmarks, all with lower latency. Developers can access these capabilities through the Gemini API in Google AI Studio and Vertex AI.

Although AI models can interact with software through structured APIs, many digital tasks still require direct interaction with graphical user interfaces, such as filling out and submitting forms. To complete these tasks, agents must navigate web pages and applications the way humans do: by clicking, typing, and scrolling. The ability to natively fill out forms, manipulate interactive elements such as dropdowns and filters, and operate behind logins is a crucial next step in building powerful, general-purpose agents.

How it works

The model’s core capabilities are exposed through the new `computer_use` tool in the Gemini API and are meant to be operated within a loop. Inputs to the tool are the user request, a screenshot of the environment, and a history of recent actions. The input can also specify functions to exclude from the full list of supported UI actions, or additional custom functions to include.
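The loop described above can be sketched in Python. This is a minimal, self-contained illustration under stated assumptions, not the Gemini API itself: the model is replaced by a hypothetical `decide` callable, the browser by a hypothetical `execute` callable, and the action names in `SUPPORTED_ACTIONS` (as well as the custom `submit_form` function) are illustrative placeholders rather than the tool’s official list. What it shows is the shape of the loop: each turn passes the user request, the latest screenshot, and a window of recent actions; the client performs the returned action, captures a new screenshot, and repeats until the model signals completion or a step limit is reached. Excluded and custom functions are modeled as a simple set adjustment.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional

# Hypothetical UI action names for illustration only; the real tool defines its own list.
SUPPORTED_ACTIONS = frozenset(
    {"open_web_browser", "navigate", "click_at", "type_text_at", "scroll_document"}
)


@dataclass
class Action:
    """A single UI action the (stubbed) model asks the client to perform."""
    name: str
    args: dict = field(default_factory=dict)


def build_allowed_actions(excluded: Iterable[str] = (), custom: Iterable[str] = ()) -> frozenset:
    """Start from the full supported list, drop excluded actions, add custom functions."""
    return (SUPPORTED_ACTIONS - frozenset(excluded)) | frozenset(custom)


# decide(request, screenshot, recent_actions, allowed) -> next Action, or None when done.
Decide = Callable[[str, bytes, list, frozenset], Optional[Action]]
# execute(action) -> screenshot of the environment after the action.
Execute = Callable[[Action], bytes]


def run_agent_loop(decide: Decide, execute: Execute, request: str, screenshot: bytes,
                   allowed: frozenset, max_steps: int = 10, history_window: int = 5) -> list:
    """Run the request/screenshot/action loop and return the executed actions."""
    history: list = []
    for _ in range(max_steps):
        # Each turn sends the request, the latest screenshot, and recent actions.
        action = decide(request, screenshot, history[-history_window:], allowed)
        if action is None:  # the stubbed model signals the task is complete
            break
        if action.name not in allowed:
            raise ValueError(f"action not allowed: {action.name}")
        screenshot = execute(action)  # perform the action, capture a new screenshot
        history.append(action)
    return history


def scripted_model(script: list) -> Decide:
    """Stand-in for the model: replays a fixed list of actions, then returns None."""
    steps = iter(script)
    return lambda request, screenshot, recent, allowed: next(steps, None)


def fake_execute(action: Action) -> bytes:
    """Stand-in for a real browser: returns a placeholder screenshot."""
    return f"screenshot after {action.name}".encode()


if __name__ == "__main__":
    allowed = build_allowed_actions(excluded={"navigate"}, custom={"submit_form"})
    script = [
        Action("click_at", {"x": 120, "y": 340}),
        Action("type_text_at", {"x": 120, "y": 340, "text": "Ada Lovelace"}),
        Action("submit_form"),
    ]
    history = run_agent_loop(scripted_model(script), fake_execute,
                             "Fill out the signup form", b"initial", allowed)
    print([a.name for a in history])  # ['click_at', 'type_text_at', 'submit_form']
```

In a real agent, `decide` would presumably wrap a model call with the `computer_use` tool enabled and `execute` would drive an actual browser or device. Keeping both as injected callables lets the loop logic, including the exclusion check and the step limit, be exercised without network access.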

versatileai
