Today we are rolling out an early version of Gemini 2.5 Flash in preview through the Gemini API, via Google AI Studio and Vertex AI. This new version builds on the popular foundation of 2.0 Flash with a major upgrade in reasoning capabilities, while still prioritizing speed and cost. Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off. The model also lets developers set a thinking budget to find the right tradeoff between quality, cost, and latency. Even with thinking off, developers can maintain the fast speeds of 2.0 Flash while improving performance.
Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding. Instead of producing output immediately, a model can perform a “thinking” process to better understand the prompt, break down complex tasks, and plan a response. On complex tasks that require multiple reasoning steps (such as solving a math problem or analyzing a research question), this thinking process allows the model to arrive at a more accurate and comprehensive answer. In fact, Gemini 2.5 Flash performs second only to 2.5 Pro on Hard Prompts in LMArena.
2.5 Flash has metrics comparable to other leading models, at a fraction of the cost and size.
The most cost-effective thinking model
2.5 Flash continues to lead as the model with the best price/performance ratio.

Gemini 2.5 Flash adds another model to Google’s Pareto frontier of cost vs. quality.*
Fine-grained controls to manage thinking
We know that different use cases have different tradeoffs of quality, cost, and latency. To give developers flexibility, we’ve enabled setting a thinking budget that offers fine-grained control over the maximum number of tokens a model can generate while thinking. A higher budget allows the model to reason further to improve quality. Importantly, though, the budget sets a ceiling on how much 2.5 Flash can think, and the model does not use the full budget if the prompt doesn’t require it.

The quality of reasoning improves as the thinking budget increases.
The model is trained to know how long to think for a given prompt, so it automatically decides how long to think based on the perceived complexity of the task.
If you want better performance than 2.0 Flash while keeping cost and latency as low as possible, set the thinking budget to 0. You can also choose to set a specific token budget for the thinking phase using a parameter in the API or the slider in Google AI Studio and Vertex AI. For 2.5 Flash, the budget can range from 0 to 24576 tokens.
The following prompts illustrate how much reasoning is typically used in 2.5 Flash’s default mode.
Prompts that don’t require much reasoning:
Example 1: How do you say “thank you” in Spanish?
Example 2: How many provinces are there in Canada?
Prompts requiring moderate reasoning:
Example 1: Roll two dice. What is the probability that they add up to 7?
Example 2: My gym has pickup basketball hours on MWF from 9:00 AM to 3:00 PM and on Tuesday and Saturday from 2:00 PM to 8:00 PM. If I work 5 days a week from 9:00 AM to 6:00 PM and want to play 5 hours of basketball during the week, create a schedule for me that makes it all work.
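The dice question in Example 1 above has a well-known closed answer; as a sanity check, it can be verified by brute-force enumeration of all 36 ordered rolls:

```python
from fractions import Fraction

# Enumerate every ordered outcome of two fair six-sided dice
# and count the rolls that sum to 7.
favorable = sum(1 for a in range(1, 7) for b in range(1, 7) if a + b == 7)
probability = Fraction(favorable, 36)

print(probability)  # -> 1/6
```

Six of the 36 outcomes sum to 7: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1).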
Prompts that require advanced reasoning:
Example 1: A cantilever beam of length L=3m has a rectangular cross section (width b=0.1m, height h=0.2m) and is made of steel (E=200 GPa). A uniformly distributed load w=5 kN/m is applied along the entire length and a point load P=10 kN is applied at the free end. Calculate the maximum bending stress (σ_max).
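The beam example above can be checked numerically with the standard cantilever formulas (maximum moment at the fixed end M = wL²/2 + PL, second moment of area I = bh³/12, and bending stress σ = M·c/I with c = h/2) — a quick sketch, not the model’s own working:

```python
# Cantilever beam from the example prompt, in SI units.
L = 3.0          # beam length, m
b, h = 0.1, 0.2  # cross-section width and height, m
w = 5e3          # uniformly distributed load, N/m
P = 10e3         # point load at the free end, N

M_max = w * L**2 / 2 + P * L   # max bending moment at the fixed end, N*m
I = b * h**3 / 12              # second moment of area, m^4
sigma_max = M_max * (h / 2) / I  # max bending stress, Pa

print(sigma_max / 1e6)  # ~78.75 MPa
```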
Example 2: Write a function evaluate_cells(cells: Dict[str, str]) -> Dict[str, float] that computes the values of spreadsheet cells.
Each cell contains:
A number, or a formula like “=A1 + B1 * 2” that references other cells using +, -, *, /.
Requirements:
Resolve dependencies between cells. Handle operator precedence (*/ before +-). Detect cycles and raise ValueError(“Cycle detected at “). No eval(). Use only built-in libraries.
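One possible solution to the spreadsheet prompt above — a sketch, not the model’s actual output — uses recursive evaluation with memoization, a precedence-aware parser for * and / over + and -, and a visiting set for cycle detection:

```python
import re
from typing import Dict

def evaluate_cells(cells: Dict[str, str]) -> Dict[str, float]:
    values: Dict[str, float] = {}   # memoized results
    visiting = set()                # cells currently being evaluated

    def tokenize(expr):
        # Cell refs (A1), numbers (2, 3.5), and operators/parens.
        return re.findall(r"[A-Za-z]+\d+|\d+\.?\d*|[+\-*/()]", expr)

    def value_of(cell):
        if cell in visiting:
            raise ValueError(f"Cycle detected at {cell}")
        if cell in values:
            return values[cell]
        visiting.add(cell)
        text = cells[cell].strip()
        v = parse(tokenize(text[1:])) if text.startswith("=") else float(text)
        visiting.discard(cell)
        values[cell] = v
        return v

    def parse(tokens):
        pos = 0
        def peek():
            return tokens[pos] if pos < len(tokens) else None
        def advance():
            nonlocal pos
            t = tokens[pos]; pos += 1
            return t
        def atom():
            t = advance()
            if t == "(":
                v = expression()
                advance()  # consume ")"
                return v
            if re.fullmatch(r"[A-Za-z]+\d+", t):
                return value_of(t)  # resolve dependency recursively
            return float(t)
        def term():  # binds * and / tighter than + and -
            v = atom()
            while peek() in ("*", "/"):
                v = v * atom() if advance() == "*" else v / atom()
            return v
        def expression():
            v = term()
            while peek() in ("+", "-"):
                v = v + term() if advance() == "+" else v - term()
            return v
        return expression()

    for name in cells:
        value_of(name)
    return values
```

For example, evaluate_cells({"A1": "3", "B1": "4", "C1": "=A1 + B1 * 2"}) yields C1 = 11.0, and a mutually referencing pair of cells raises ValueError.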
Start building with Gemini 2.5 Flash today
Gemini 2.5 Flash with thinking capabilities is now available in preview via the Gemini API in Google AI Studio and Vertex AI, and in a dedicated dropdown in the Gemini app. We encourage you to experiment with the thinking_budget parameter and explore how controllable reasoning can help you solve more complex problems.
```python
from google import genai

client = genai.Client(api_key="GEMINI_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents="You roll two dice. What is the probability they add up to 7?",
    config=genai.types.GenerateContentConfig(
        thinking_config=genai.types.ThinkingConfig(
            thinking_budget=1024
        )
    )
)

print(response.text)
```
Find a detailed API reference and thinking guide in our developer documentation, or get started with code examples in the Gemini Cookbook.
We will continue to improve Gemini 2.5 Flash, with more features coming soon ahead of general availability for full production use.
*Model prices are taken from Artificial Analysis and company documentation.

