Gemini 2.5: Updated Thinking Model Family

Today we are excited to share updates across the Gemini 2.5 model family.

Gemini 2.5 Pro is now generally available and stable (no changes from the 06-05 preview).
Gemini 2.5 Flash is now generally available and stable (no changes from the 05-20 preview; see the pricing update below).
Gemini 2.5 Flash-Lite is now available in preview.

Gemini 2.5 models are thinking models: they can reason through their thoughts before responding, which improves both performance and accuracy. Each model gives developers control over the thinking budget, so they can choose when and how much the model “thinks” before generating a response.

An overview of the Gemini 2.5 family of thinking models

Introducing Gemini 2.5 Flash-Lite

Today we are also introducing 2.5 Flash-Lite in preview, with the lowest latency and cost in the 2.5 model family. It is designed as a cost-effective upgrade from the previous 1.5 and 2.0 Flash models. It also offers better performance on most evals, a lower time to first token, and higher tokens-per-second decode throughput. This model is ideal for high-throughput tasks such as classification or summarization at scale.

Gemini 2.5 Flash-Lite is a reasoning model whose thinking budget can be controlled dynamically through an API parameter. Unlike our other 2.5 models, “thinking” is turned off by default, because Flash-Lite is optimized for cost and speed. 2.5 Flash-Lite supports all of our native tools, such as Grounding with Google Search, Code Execution, and URL Context, in addition to function calling.
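As a minimal sketch of what controlling the thinking budget through an API parameter can look like, the snippet below builds a JSON request body without sending it. The field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) reflect our understanding of the public Gemini REST API and are assumptions as far as this post is concerned; the helper `build_request` is hypothetical. Check the API reference before relying on them.

```python
import json
from typing import Optional


def build_request(prompt: str, thinking_budget: Optional[int] = None) -> dict:
    """Build a generateContent-style JSON body (hypothetical helper).

    Field names are assumptions based on the public REST API, not taken
    from this post. When thinking_budget is None, no thinking config is
    sent, so the model's own default applies.
    """
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    if thinking_budget is not None:
        body["generationConfig"] = {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        }
    return body


# No budget given: the request carries no thinking configuration at all.
default_body = build_request("Classify this review as positive or negative.")

# Explicit budget: ask for up to 1024 thinking tokens on a harder task.
thinking_body = build_request("Plan a three-step refactor.", thinking_budget=1024)

print(json.dumps(thinking_body, indent=2))
```

Leaving the budget unset keeps a high-throughput classification call on the model default, while the explicit budget opts a single harder request into thinking.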

Gemini 2.5 Flash-Lite benchmarks

Gemini 2.5 Flash and Price Update

Over the last year, our research teams have continued to push the Pareto frontier with the Flash model series. When 2.5 Flash was first announced, we had not yet finalized the capabilities of 2.5 Flash-Lite. We also launched with separate “thinking” and “non-thinking” prices, which led to confusion among developers.

With the stable version of Gemini 2.5 Flash (the same 05-20 preview model we made available at Google I/O) and the incredible performance of 2.5 Flash, we have updated the pricing for 2.5 Flash:

$0.30 per 1M input tokens (up from $0.15)
$2.50 per 1M output tokens (down from $3.50)
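To make the updated 2.5 Flash rates concrete, here is a small sketch that estimates the cost of a request from its token counts. The two rates come from the pricing above; the function name `estimate_cost` and the example token counts are illustrative assumptions, and the sketch ignores anything the post does not mention (such as caching or tool usage charges).

```python
from decimal import Decimal

# Rates from the updated 2.5 Flash pricing, in USD per 1M tokens.
INPUT_PRICE_PER_1M = Decimal("0.30")
OUTPUT_PRICE_PER_1M = Decimal("2.50")
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost for the given token counts."""
    input_cost = INPUT_PRICE_PER_1M * input_tokens / TOKENS_PER_UNIT
    output_cost = OUTPUT_PRICE_PER_1M * output_tokens / TOKENS_PER_UNIT
    return input_cost + output_cost


# Hypothetical workload: 200k input tokens and 50k output tokens.
print(estimate_cost(200_000, 50_000))
```

`Decimal` is used instead of floats so that small per-request amounts add up exactly when summed over many calls.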

We strive to keep pricing consistent between preview and stable releases to minimize disruption. This is a specific adjustment that reflects the exceptional value of Flash, which still offers the best cost per intelligence available.

With Gemini 2.5 Flash-Lite, there is now an even lower-cost option (with or without thinking) for cost- and latency-sensitive use cases that require less model intelligence.

Gemini Flash Family Price Update

If you are using Gemini 2.5 Flash Preview 04-17, the existing preview pricing will remain in effect until its planned deprecation on July 15, 2025. You can move to the generally available model “gemini-2.5-flash” or switch to the 2.5 Flash-Lite preview as a lower-cost option.

Continued Growth of Gemini 2.5 Pro

The growth in demand for Gemini 2.5 Pro continues to be the steepest of any model we have ever seen. To allow more customers to build on this model in production, we are making the 06-05 version of the model stable, at the same Pareto frontier price point as before.

We expect Pro to shine where you need the highest intelligence and the most capabilities, such as coding and agentic tasks. Gemini 2.5 Pro is at the heart of many of the most loved developer tools.

Top developer tools using Gemini 2.5 Pro: Cursor, Bolt, Cline, Cognition, Windsurf, GitHub, Lovable, Replit, and Zed Industries

If you are using 2.5 Pro Preview 05-06, the model will remain available until June 19, 2025, and will then be turned off. If you are using 2.5 Pro Preview 06-05, simply update your model string to “gemini-2.5-pro”.
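The model string update described above can be centralized in one place so that no call site keeps pointing at a preview that is about to be turned off. The sketch below is one way to do that; the preview identifiers in the table are assumptions about how the preview models are named (this post only gives the dated labels and the stable strings), and `resolve_model` is a hypothetical helper.

```python
# Assumed preview identifiers mapped to the stable model strings.
# Only the stable names appear in the announcement; verify the preview
# names against your own configuration before using this table.
PREVIEW_TO_STABLE = {
    "gemini-2.5-pro-preview-05-06": "gemini-2.5-pro",
    "gemini-2.5-pro-preview-06-05": "gemini-2.5-pro",
    "gemini-2.5-flash-preview-04-17": "gemini-2.5-flash",
}


def resolve_model(name: str) -> str:
    """Return the stable model string for a known preview, else the input."""
    return PREVIEW_TO_STABLE.get(name, name)


print(resolve_model("gemini-2.5-pro-preview-06-05"))
```

Names that are not in the table pass through unchanged, so the helper is safe to apply to every configured model.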

We can’t wait to see even more domains benefit from the intelligence of 2.5 Pro, and we look forward to sharing more about scaling beyond Pro in the near future.

versatileai
