Today we are releasing the stable version of Gemini 2.5 Flash-Lite, the fastest and lowest-cost model in the Gemini 2.5 model family ($0.10 per 1M input tokens, $0.40 per 1M output tokens). We built 2.5 Flash-Lite to push the frontier of intelligence per dollar, with native reasoning capabilities that can be optionally toggled on for more demanding use cases. Building on the momentum of 2.5 Pro and 2.5 Flash, this model rounds out the set of 2.5 models ready for scaled production use.
Our most cost-efficient and fastest 2.5 model yet
Gemini 2.5 Flash-Lite strikes a balance between performance and cost without compromising quality, particularly for latency-sensitive tasks such as translation and classification.
Here's what makes it stand out:
- Best-in-class speed: Gemini 2.5 Flash-Lite has lower latency than both 2.0 Flash-Lite and 2.0 Flash across a broad sample of prompts.
- Cost-efficiency: At $0.10 per 1M input tokens and $0.40 per 1M output tokens, it lets you handle large volumes of requests affordably. We have also reduced audio input pricing by 40% since the preview launch.
- Smart: It supports native tools such as URL context.
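As noted above, reasoning is off by default and can be toggled on per request for more demanding workloads. Below is a minimal sketch using the google-genai Python SDK; the prompts are illustrative, and the thinking_budget value is an assumption you should tune for your own use case.

```python
from google import genai
from google.genai import types

# Assumes GEMINI_API_KEY is set in your environment.
client = genai.Client()

# Default behavior: no thinking budget, keeping latency and cost low.
fast = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="Translate 'Good morning' into Spanish.",
)

# For a more demanding request, optionally allocate a thinking budget (in tokens).
thoughtful = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="Outline the trade-offs between write-through and write-back caching for a read-heavy API.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=1024)
    ),
)

print(fast.text)
print(thoughtful.text)
```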
Gemini 2.5 Flash-Lite in action
Since the preview launch of 2.5 Flash-Lite, we have already seen some very successful deployments; here are a few of our favorites:
Satlyt is building a distributed space computing platform that is transforming how satellite data is processed, powering real-time summarization of in-orbit telemetry, autonomous task management, and inter-satellite communication analysis. 2.5 Flash-Lite's speed reduced latency for critical onboard diagnostics by 45% and cut power consumption by 30% compared to their baseline model.

HeyGen uses AI to create avatars for video content and leverages Gemini 2.5 Flash-Lite to automate video planning, analyze content, and translate it into over 180 languages. This allows them to deliver global, personalized experiences to their users.

DocsHound uses Gemini 2.5 Flash-Lite to process long videos and extract thousands of screenshots with low latency, turning product demos into comprehensive documentation and training data for AI agents much faster than traditional methods.

Evertune helps brands understand how they are represented across AI models. Gemini 2.5 Flash-Lite has been a game changer for them, dramatically speeding up analysis and report generation. Its fast performance lets them quickly scan and synthesize large volumes of model output, delivering dynamic, timely insights to their clients.
You can start using 2.5 Flash-Lite by specifying "gemini-2.5-flash-lite" in your code. If you are using the preview version, you can switch to "gemini-2.5-flash-lite", which points to the same underlying model. We plan to remove the preview alias for Flash-Lite on August 25th.
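For reference, here is a minimal sketch of calling the stable model with the google-genai Python SDK; the prompt is illustrative and the client assumes your API key is available as an environment variable.

```python
from google import genai

# Assumes GEMINI_API_KEY is set in your environment.
client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",  # stable model ID; replace any preview alias with this
    contents="Classify the sentiment of: 'The checkout flow was quick and painless.'",
)
print(response.text)
```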
Ready to start building? Try the stable version of Gemini 2.5 Flash-Lite in Google AI Studio and Vertex AI.