Today we are rolling out an early version of Gemini 2.5 Flash in preview through the Gemini API via Google AI Studio and Vertex AI. Building on the popular foundation of 2.0 Flash, this new version delivers a major upgrade in reasoning capabilities while still prioritizing speed and cost. Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off. The model also allows developers to set thinking budgets to find the right tradeoff between quality, cost, and latency. Even with thinking off, developers can maintain the fast speeds of 2.0 Flash and improve on its performance.
Our Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding. Instead of immediately generating an output, the model can perform a “thinking” process to better understand the prompt, break down complex tasks, and plan a response. On complex tasks that require multiple steps of reasoning (such as solving math problems or analyzing research questions), the thinking process allows the model to arrive at more accurate and comprehensive answers. In fact, Gemini 2.5 Flash performs strongly on Hard Prompts in LMArena, second only to 2.5 Pro.
2.5 Flash has comparable metrics to other leading models for a fraction of the cost and size.
Our most cost-effective thinking model
2.5 Flash continues to lead as the model with the best price-to-performance ratio.
Gemini 2.5 Flash adds another model to Google’s Pareto frontier of cost to quality.
Fine-grained controls to manage thinking
Different use cases have different tradeoffs between quality, cost, and latency. To give developers flexibility, we have enabled setting a thinking budget that offers fine-grained control over the maximum number of tokens a model can generate while thinking. A higher budget allows the model to reason further to improve quality. Importantly, though, the budget only sets a cap on how much 2.5 Flash can think: if the prompt does not require it, the model will not use the full budget.
Improvements in reasoning quality as the thinking budget increases.
The model is trained to know how long to think for a given prompt, and therefore automatically decides how much to think based on the perceived complexity of the task.
If you want to keep the lowest cost and latency while still improving performance over 2.0 Flash, set the thinking budget to 0. You can also set a specific token budget for the thinking phase using a parameter in the API, or the slider in Google AI Studio and in Vertex AI. The budget can range from 0 to 24576 tokens for 2.5 Flash.
The following prompts demonstrate how much reasoning is used in 2.5 Flash’s default mode:
Prompts requiring low reasoning:
Example 1: “Thank you” in Spanish
Example 2: How many provinces are there in Canada?
Prompts requiring medium reasoning:
Example 1: You roll two dice. What’s the probability they add up to 7?
Example 2: My gym has pickup hours for basketball between 9am and 3pm on MWF and between 2pm and 8pm on Tuesdays and Saturdays. If I work 9am to 6pm five days a week and want to play 5 hours of basketball on weekdays, create a schedule for me that makes it all work.
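The dice prompt above has an answer that is easy to verify by brute force. A quick sketch (my own, not from the post) that enumerates all 36 equally likely outcomes:

```python
from itertools import product

# All ordered rolls of two fair six-sided dice: 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# Rolls that sum to 7: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1).
favorable = [roll for roll in outcomes if sum(roll) == 7]

probability = len(favorable) / len(outcomes)
print(probability)  # 6/36 = 1/6 ≈ 0.1667
```

This confirms the expected medium-reasoning answer of 1/6.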
Prompts that require high reasoning:
Example 1: A cantilever beam of length L = 3 m has a rectangular cross-section (width b = 0.1 m, height h = 0.2 m) and is made of steel (E = 200 GPa). It is subjected to a uniformly distributed load w = 5 kN/m along its entire length and a point load P = 10 kN at its free end. Calculate the maximum bending stress (σ_max).
Example 2: Write a function evaluate_cells(cells: Dict[str, str]) -> Dict[str, float] that computes the values of spreadsheet cells.
Each cell contains either:
A number (e.g. "3")
Or an expression like "=A1 + B1 * 2" using +, -, *, /, and other cell references.
Requirements:
Resolve dependencies between cells. Handle operator precedence (* and / before + and -). Detect cycles and raise ValueError("Cycle detected"). No eval(). Use only built-in libraries.
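For reference, here is a minimal sketch of one way a solution to this prompt could look. Only the evaluate_cells name and the requirements come from the prompt; the recursive-descent parser and memoized cell evaluation are my own choices, not the model’s output:

```python
import re

def evaluate_cells(cells: dict) -> dict:
    """Evaluate cells holding numbers or "=A1 + B1 * 2"-style formulas.

    Resolves cross-cell dependencies, honors * and / precedence over + and -,
    raises ValueError("Cycle detected") on circular references, and uses no eval().
    """
    TOKEN = re.compile(r"\s*([A-Za-z]\w*|\d+(?:\.\d+)?|[-+*/()])")
    results: dict = {}       # memoized cell values
    in_progress: set = set() # cells currently being evaluated (cycle check)

    def value_of(name: str) -> float:
        if name in results:
            return results[name]
        if name in in_progress:
            raise ValueError("Cycle detected")
        in_progress.add(name)
        text = cells[name].strip()
        val = parse(text[1:]) if text.startswith("=") else float(text)
        in_progress.discard(name)
        results[name] = val
        return val

    def parse(expr: str) -> float:
        tokens = TOKEN.findall(expr)
        pos = 0

        def peek():
            return tokens[pos] if pos < len(tokens) else None

        def next_tok():
            nonlocal pos
            pos += 1
            return tokens[pos - 1]

        def atom() -> float:  # number, cell reference, parens, or unary minus
            tok = next_tok()
            if tok == "(":
                v = add_sub()
                next_tok()  # consume ")"
                return v
            if tok == "-":
                return -atom()
            if re.fullmatch(r"\d+(?:\.\d+)?", tok):
                return float(tok)
            return value_of(tok)

        def mul_div() -> float:  # binds tighter than add_sub
            v = atom()
            while peek() in ("*", "/"):
                op, rhs = next_tok(), atom()
                v = v * rhs if op == "*" else v / rhs
            return v

        def add_sub() -> float:
            v = mul_div()
            while peek() in ("+", "-"):
                op, rhs = next_tok(), mul_div()
                v = v + rhs if op == "+" else v - rhs
            return v

        return add_sub()

    return {name: value_of(name) for name in cells}
```

Example usage: evaluate_cells({"A1": "3", "B1": "4", "C1": "=A1 + B1 * 2"}) returns {"A1": 3.0, "B1": 4.0, "C1": 11.0}, since * binds before +.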
Start building with Gemini 2.5 Flash today
Gemini 2.5 Flash with thinking capabilities is now available in preview via the Gemini API in Google AI Studio and Vertex AI, and in a dedicated dropdown in the Gemini app. We encourage you to experiment with the thinking_budget parameter and explore how controllable reasoning can help you solve more complex problems.
from google import genai

client = genai.Client(api_key="GEMINI_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents="You roll two dice. What's the probability they add up to 7?",
    config=genai.types.GenerateContentConfig(
        thinking_config=genai.types.ThinkingConfig(
            thinking_budget=1024
        )
    )
)

print(response.text)
Find detailed API references and thinking guides in our developer docs, and get started with code examples from the Gemini Cookbook.
We will continue to improve Gemini 2.5 Flash, with more coming soon, before making it generally available for full production use.
*Model pricing is sourced from Artificial Analysis and company documentation.