Today we’re introducing Gemini 3.1 Flash-Lite, the fastest and most cost-effective model in the Gemini 3 series. 3.1 Flash-Lite is built for large-scale developer workloads and delivers high quality for its price and model tier.
Starting today, 3.1 Flash-Lite is available in preview for developers through Google AI Studio’s Gemini API and for enterprises through Vertex AI.
Uncompromising cost efficiency
3.1 Flash-Lite offers enhanced performance at a fraction of the cost of larger models, priced at just $0.25 per million input tokens and $1.50 per million output tokens. Artificial Analysis benchmarks show it outperforms 2.5 Flash, with a 2.5x faster time to first response token and 45% faster output speed, while maintaining the same or better quality. Low latency is essential for high-frequency workflows, which makes 3.1 Flash-Lite well suited to building responsive real-time experiences.
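To make the pricing concrete, here is a minimal sketch of a cost estimate in Python. It uses only the two per-million-token prices quoted above. The function name and the workload figures (one million requests at 500 input and 100 output tokens each) are hypothetical, chosen for illustration, and actual billing may differ from this simple calculation.

```python
# Preview prices quoted in this post, in USD per million tokens.
INPUT_USD_PER_MILLION = 0.25
OUTPUT_USD_PER_MILLION = 1.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the charge in USD for the given token counts (illustrative helper)."""
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


# Hypothetical workload: 1,000,000 requests, each with
# 500 input tokens and 100 output tokens.
requests = 1_000_000
total = estimate_cost_usd(500 * requests, 100 * requests)
print(f"${total:.2f}")  # prints $275.00
```

In this made-up workload, output tokens account for more of the bill ($150) than input tokens ($125) despite being five times fewer, so response length is worth watching when estimating costs at scale.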

