Today we are excited to share updates across the Gemini 2.5 model family:
Gemini 2.5 Pro is generally available and stable (no changes from the 06-05 preview).
Gemini 2.5 Flash is generally available and stable (no changes from the 05-20 preview; see the pricing update below).
Gemini 2.5 Flash-Lite is now available in preview.
Gemini 2.5 models are thinking models: they can reason through their thoughts before responding, which improves performance and accuracy. Each model offers control over the thinking budget, so developers can choose when and how much the model “thinks” before generating a response.
An overview of the Gemini 2.5 family of thinking models
Introducing Gemini 2.5 Flash-Lite
Today we are introducing 2.5 Flash-Lite in preview, with the lowest latency and cost in the 2.5 model family. It is designed as a cost-effective upgrade from our previous 1.5 and 2.0 Flash models. It also offers better performance on most evals, with higher tokens per second and a lower time to first token. This model is ideal for high-throughput tasks such as large-scale classification and summarization.
Gemini 2.5 Flash-Lite is a reasoning model that lets you dynamically control the thinking budget with an API parameter. Unlike our other models, “thinking” is turned off by default because Flash-Lite is optimized for cost and speed. 2.5 Flash-Lite supports all of our native tools, such as Google Search, code execution, and URL context, in addition to function calling.
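As a rough illustration of controlling the thinking budget with an API parameter, the sketch below builds a request body without sending it. The field names (`generationConfig.thinkingConfig.thinkingBudget`) and the convention that a budget of 0 means no thinking are assumptions drawn from the public Gemini API rather than from this post, and `build_generate_request` is a hypothetical helper, so check the API reference before relying on it.

```python
import json


def build_generate_request(prompt: str, thinking_budget: int = 0) -> dict:
    """Build a generateContent-style request body (hypothetical helper).

    thinking_budget is the number of tokens the model may spend on
    "thinking". The default of 0 mirrors Flash-Lite's thinking-off default;
    the field names below are assumptions, not taken from this post.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


if __name__ == "__main__":
    # Cheap, fast path: thinking stays off for a simple classification task.
    fast = build_generate_request("Classify the sentiment: 'Great product!'")
    # Opt in to a small thinking budget for a harder prompt.
    careful = build_generate_request("Summarize this contract clause.", 512)
    print(json.dumps(fast, indent=2))
    print(json.dumps(careful, indent=2))
```

Keeping the budget as an explicit argument makes the cost and latency trade-off visible at each call site instead of hiding it in shared configuration.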
Gemini 2.5 Flash-Lite benchmarks
Gemini 2.5 Flash and Price Update
Over the last year, our research teams have continued to push the Pareto frontier with our Flash model series. When 2.5 Flash was first announced, the capabilities of 2.5 Flash-Lite had not yet been finalized. 2.5 Flash also launched with separate “thinking” and “non-thinking” prices, which led to confusion among developers.
With the stable version of Gemini 2.5 Flash (the same 05-20 preview model made available at Google I/O), and given the incredible performance of 2.5 Flash, we are updating the pricing for 2.5 Flash:
$0.30 / 1M input tokens (up from $0.15 input)
$2.50 / 1M output tokens (down from $3.50 output)
We strive to keep pricing consistent between preview and stable releases to minimize disruption, but this is a specific adjustment that reflects the exceptional value of Flash, which still offers the best cost available for its level of intelligence.
Also, with Gemini 2.5 Flash-Lite, there is now an even lower-cost option (with or without thinking) for cost- and latency-sensitive use cases that need less model intelligence.
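To see how the 2.5 Flash price change plays out for a given workload, the per-request arithmetic can be sketched as follows. The per-million-token prices are the ones quoted above; the `Pricing` class, the `request_cost` helper, and the example token counts are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """Price in USD per one million tokens."""
    input_per_million: float
    output_per_million: float


# Updated stable pricing for 2.5 Flash, as quoted in this post.
STABLE_2_5_FLASH = Pricing(input_per_million=0.30, output_per_million=2.50)
# Previous figures quoted in this post ($0.15 input, $3.50 output).
PREVIOUS_QUOTED = Pricing(input_per_million=0.15, output_per_million=3.50)


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request under the given pricing."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
    for name, pricing in [("stable", STABLE_2_5_FLASH),
                          ("previous", PREVIOUS_QUOTED)]:
        print(f"{name}: ${request_cost(pricing, 10_000, 2_000):.4f}")
```

Under these quoted figures, the net effect depends on the mix: input-heavy workloads pay more than before, while output-heavy workloads pay less.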
Gemini Flash Family Price Update
If you are using Gemini 2.5 Flash Preview 04-17, the existing preview pricing will remain in effect until its planned deprecation on July 15, 2025. You can move to the generally available model “gemini-2.5-flash” or switch to the 2.5 Flash-Lite preview as a lower-cost option.
Continued growth of Gemini 2.5 Pro
The growth in demand for Gemini 2.5 Pro continues to be the steepest of any model we have ever seen. To allow more customers to build on this model in production, we are making the 06-05 version of the model stable, at the same Pareto frontier price point as before.
We expect Pro to shine in cases where you need the highest intelligence and the most capabilities, such as coding and agentic tasks. Gemini 2.5 Pro is at the heart of many of the most loved developer tools.
Top Developer Tools Using Gemini 2.5 Pro
If you are using 2.5 Pro Preview 05-06, the model will remain available until June 19, 2025 and will then be turned off. If you are using 2.5 Pro Preview 06-05, simply update your model string to “gemini-2.5-pro”.
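For codebases that reference model strings in many places, the migration can be centralized in one lookup, roughly as sketched below. The preview identifiers on the left are assumed spellings that this post does not state, and `resolve_model` is a hypothetical helper; only the stable names come from the text.

```python
# Map preview model strings to their stable replacements.
# The preview IDs are assumed spellings; verify them against your own config.
STABLE_REPLACEMENTS = {
    "gemini-2.5-pro-preview-05-06": "gemini-2.5-pro",
    "gemini-2.5-pro-preview-06-05": "gemini-2.5-pro",
    "gemini-2.5-flash-preview-04-17": "gemini-2.5-flash",
    "gemini-2.5-flash-preview-05-20": "gemini-2.5-flash",
}


def resolve_model(model_id: str) -> str:
    """Return the stable replacement for a preview ID, else the ID unchanged."""
    return STABLE_REPLACEMENTS.get(model_id.strip().lower(), model_id)


if __name__ == "__main__":
    for configured in ["gemini-2.5-pro-preview-06-05", "gemini-2.5-flash"]:
        print(configured, "->", resolve_model(configured))
```

Routing every call through one function like this means the next model string change touches a single table rather than every call site.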
We can’t wait to see even more domains benefit from the intelligence of 2.5 Pro, and we look forward to sharing more about scaling beyond Pro in the near future.