Gemini 2.5: Updates to our family of thinking models

Today we are excited to share updates across the entire Gemini 2.5 model family:

Gemini 2.5 Pro is generally available and stable (no changes from the 06-05 preview).
Gemini 2.5 Flash is generally available and stable (no changes from the 05-20 preview; see the pricing update below).
Gemini 2.5 Flash-Lite is now available in preview.

Gemini 2.5 models are thinking models: they can reason through their thoughts before responding, which improves performance and accuracy. Each model offers control over the thinking budget, so developers can choose when and how much the model “thinks” before producing a response.

Overview of the Gemini 2.5 family of thinking models

Introducing Gemini 2.5 Flash-Lite

Today we’re introducing a preview of 2.5 Flash-Lite, the lowest-latency, lowest-cost model in the 2.5 family. It is designed as a cost-effective upgrade from the previous 1.5 and 2.0 Flash models. It also performs better on most evaluations, with a lower time to first token and higher decode throughput (tokens per second). This model is ideal for high-throughput tasks such as large-scale classification and summarization.

Gemini 2.5 Flash-Lite is a reasoning model whose thinking budget can be controlled dynamically with an API parameter. Because Flash-Lite is optimized for cost and speed, “thinking” is turned off by default, unlike in our other models. In addition to function calling, 2.5 Flash-Lite supports all of our native tools, such as Grounding with Google Search, Code Execution, and URL Context.
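As a rough illustration of setting a thinking budget per request, the sketch below builds a request body without sending it. The field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) reflect our understanding of the Gemini API's REST schema and are not stated in this post, and `build_generate_request` is a hypothetical helper, so check the API reference before relying on it. Valid budget ranges are model-specific and not covered here.

```python
import json


def build_generate_request(prompt, thinking_budget=0):
    """Build a JSON-serializable request body with an explicit thinking budget.

    thinking_budget=0 mirrors the Flash-Lite default described above
    (thinking off). The field names are assumptions about the REST schema.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


if __name__ == "__main__":
    # Thinking off (the default) for a cheap, high-throughput classification call.
    fast = build_generate_request("Classify this review as positive or negative: ...")
    # Opt in to a limited amount of thinking for a harder prompt.
    careful = build_generate_request("Summarize the contract clause: ...", thinking_budget=512)
    print(json.dumps(careful, indent=2))
```

Keeping the budget as an explicit argument makes it easy to leave thinking off for bulk work and enable it only for the requests that need it.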

Gemini 2.5 Flash-Lite benchmark

Gemini 2.5 Flash updates and pricing

Over the past year, our research team has continued to map the Pareto frontier with our Flash model series. When 2.5 Flash was first announced, we had not yet finalized the capabilities of 2.5 Flash-Lite. We also launched with a “thinking price” and a “non-thinking price,” which caused confusion among developers.

With the stable release of Gemini 2.5 Flash (the same 05-20 preview model we made available at Google I/O), and given its strong performance, we’re updating the pricing for 2.5 Flash:

$0.30 per 1 million input tokens (up from $0.15).
$2.50 per 1 million output tokens (down from $3.50).
The price difference between thinking and non-thinking has been removed.
A single price tier applies regardless of input token size.
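To see how the change affects a given workload, the prices quoted above can be plugged into a small cost calculator. This is only a sketch: it compares the new prices against the two earlier figures mentioned in this post ($0.15 input, $3.50 output), and it does not model the earlier non-thinking output price, which is not given here. The function and variable names are illustrative.

```python
# Prices in USD per 1 million tokens, taken from the figures quoted above.
NEW_PRICE = {"input": 0.30, "output": 2.50}
OLD_PRICE = {"input": 0.15, "output": 3.50}  # earlier figures as quoted in the post


def request_cost(input_tokens, output_tokens, price):
    """Return the USD cost of one request under the given price table."""
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def price_change(input_tokens, output_tokens):
    """Return new cost minus old cost; negative means the request got cheaper."""
    new = request_cost(input_tokens, output_tokens, NEW_PRICE)
    old = request_cost(input_tokens, output_tokens, OLD_PRICE)
    return new - old


if __name__ == "__main__":
    for tokens_in, tokens_out in [(10_000, 2_000), (100_000, 1_000)]:
        delta = price_change(tokens_in, tokens_out)
        print(f"{tokens_in} in / {tokens_out} out: change of {delta:+.4f} USD")
```

Under these figures, input rises by $0.15 per million tokens and output falls by $1.00 per million, so a request costs less than before whenever its output tokens exceed 15% of its input tokens.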

While we strive to keep pricing consistent between preview and stable releases to minimize disruption, this is a one-time adjustment that reflects the value Flash delivers, and Flash still offers the best cost per intelligence available.

And with Gemini 2.5 Flash-Lite, we now have an even lower-cost option (with or without thinking) for cost- and latency-sensitive use cases that require less model intelligence.

Gemini Flash family pricing updates

If you are using Gemini 2.5 Flash Preview 04-17, the existing preview pricing will remain in effect until the model's planned deprecation on July 15, 2025, at which point the model endpoint will be turned off. You can move to the generally available model “gemini-2.5-flash” or switch to 2.5 Flash-Lite Preview for a lower-cost option.

Continued growth of Gemini 2.5 Pro

Growth in demand for Gemini 2.5 Pro continues to be the steepest of any model we’ve seen to date. To let more customers build with this model in production, we are making the 06-05 version of the model stable, at the same Pareto frontier price point as before.

We expect Pro to shine in cases where you need the highest intelligence and the most capabilities, such as coding and agentic tasks. Gemini 2.5 Pro is at the heart of many of the most loved developer tools.

Top developer tools built with Gemini 2.5 Pro: Cursor, Bolt, Cline, Cognition, Windsurf, GitHub, Lovable, Replit, and Zed Industries

If you are using 2.5 Pro Preview 05-06, the model will be available until June 19, 2025, after which it will be turned off. If you are using 2.5 Pro Preview 06-05, simply update the model string to “gemini-2.5-pro”.
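For codebases that reference preview models in several places, the migrations described in this post can be centralized in one lookup. The sketch below is illustrative only: the two stable names (“gemini-2.5-flash” and “gemini-2.5-pro”) come from this post, while the preview identifiers are assumptions about how those previews are named in the API, and mapping the 05-06 Pro preview to “gemini-2.5-pro” is our assumption rather than a stated recommendation. `stable_model_id` is a hypothetical helper.

```python
# Preview identifiers below are assumed spellings, not taken from the post.
# Only the stable names on the right-hand side appear in the text.
PREVIEW_TO_STABLE = {
    "gemini-2.5-flash-preview-04-17": "gemini-2.5-flash",  # endpoint off July 15, 2025
    "gemini-2.5-pro-preview-05-06": "gemini-2.5-pro",      # endpoint off June 19, 2025
    "gemini-2.5-pro-preview-06-05": "gemini-2.5-pro",      # same model, new string
}


def stable_model_id(model_id):
    """Return the stable model string for a known preview ID, else the input unchanged."""
    return PREVIEW_TO_STABLE.get(model_id, model_id)


if __name__ == "__main__":
    for configured in ["gemini-2.5-pro-preview-06-05", "gemini-2.5-flash"]:
        print(configured, "->", stable_model_id(configured))
```

Unknown identifiers pass through unchanged, so the helper can wrap every model lookup without affecting models that are not being migrated.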

We can’t wait to see more domains benefit from the intelligence of 2.5 Pro and look forward to sharing more about scaling beyond Pro in the near future.

versatileai
