Today we are releasing the stable version of Gemini 2.5 Flash-Lite, the fastest and lowest-cost ($0.10/M input, $0.40/M output) model in the Gemini 2.5 model family. We built 2.5 Flash-Lite to push the frontier of intelligence per dollar, with native reasoning capabilities that can be optionally toggled on for more demanding use cases. Building on the momentum of 2.5 Pro and 2.5 Flash, this model completes the set of 2.5 models that are ready for use in large-scale production environments.
Most cost-effective and fastest 2.5 model ever
Gemini 2.5 Flash-Lite balances performance and cost without compromising quality, especially for latency-sensitive tasks such as translation and classification. Key capabilities include:
- Best-in-class speed: Gemini 2.5 Flash-Lite has lower latency than both 2.0 Flash-Lite and 2.0 Flash across a broad sample of prompts.
- Cost efficiency: This is the lowest-cost 2.5 model to date, priced at $0.10 per million input tokens and $0.40 per million output tokens, letting you handle large request volumes affordably. We have also reduced the price of audio input by 40% since the preview launch.
- Small but smart: It demonstrates higher quality than 2.0 Flash-Lite across a wide range of benchmarks, including coding, math, science, reasoning, and multimodal understanding.
- Fully featured: Building with 2.5 Flash-Lite gives you access to a 1 million-token context window, a controllable thinking budget, and support for native tools like Grounding with Google Search, Code Execution, and URL Context.
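To make the pricing above concrete, here is a small sketch of the per-request cost arithmetic. The rates come from this post; the helper function and its name are illustrative, not part of any SDK:

```python
# Estimate Gemini 2.5 Flash-Lite text cost from the published rates.
# Rates are from this announcement; the helper itself is illustrative.
INPUT_PRICE_PER_M = 0.10   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.40  # USD per 1M output tokens

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at 2.5 Flash-Lite text rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a classification call with a 2,000-token prompt and a 50-token reply
# costs $0.00022, so a million such requests would run about $220.
cost = estimate_cost_usd(2_000, 50)
print(f"${cost:.6f} per request")
```

Note that audio input is billed at its own (recently reduced) rate, so this sketch applies to text-only traffic.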
Gemini 2.5 Flash-Lite in action
Since the release of 2.5 Flash-Lite, we have already seen some incredibly successful deployments. Here are some of our favorites.
- Satlyt is transforming the way satellite data is processed, building a distributed space computing platform for real-time summarization of in-orbit telemetry, autonomous task management, and inter-satellite communication analysis. The speed of 2.5 Flash-Lite cut latency for critical onboard diagnostics by 45% and reduced power consumption by 30% compared to their baseline model.
- HeyGen uses AI to create avatars for video content and leverages Gemini 2.5 Flash-Lite to automate video planning, analyze and optimize content, and translate videos into over 180 languages, delivering a global, personalized experience to their users.
- DocsHound turns product demos into documentation, using Gemini 2.5 Flash-Lite to process long videos and extract thousands of screenshots with low latency, transforming footage into comprehensive documentation and training data for AI agents much faster than traditional methods.
- Evertune helps brands understand how they are represented across AI models. Gemini 2.5 Flash-Lite dramatically speeds up their analysis and report generation, quickly scanning and synthesizing large volumes of model output to deliver dynamic, timely insights to their clients.
You can start using 2.5 Flash-Lite by specifying "gemini-2.5-flash-lite" in your code. If you are using the preview version, you can switch to "gemini-2.5-flash-lite", which points to the same underlying model. The Flash-Lite preview aliases will be removed on August 25th.
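Ahead of the August 25th alias removal, a small normalization step can keep existing call sites working. This is a hedged sketch: only "gemini-2.5-flash-lite" is the documented stable ID, and the assumption that preview aliases share that prefix is ours:

```python
# Sketch: normalize 2.5 Flash-Lite preview aliases to the stable model ID
# before the aliases are removed on August 25th.
# "gemini-2.5-flash-lite" is the documented stable ID; the shared-prefix
# pattern for preview aliases is an assumption made for illustration.
STABLE_MODEL_ID = "gemini-2.5-flash-lite"

def normalize_model_id(model_id: str) -> str:
    """Map any 2.5 Flash-Lite preview alias onto the stable model ID."""
    if model_id.startswith("gemini-2.5-flash-lite"):
        return STABLE_MODEL_ID  # assumed: preview aliases extend this prefix
    return model_id  # other model IDs pass through unchanged
```

You would then pass the normalized string as the `model` argument when calling the Gemini API, e.g. via `client.models.generate_content` in the google-genai SDK.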
Ready to start building? Try the stable version of Gemini 2.5 Flash-Lite with Google AI Studio and Vertex AI today.