Today, we are releasing two new production-ready Gemini models, Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002, along with:
- A price reduction of more than 50% on 1.5 Pro (both input and output, for prompts under 128K tokens)
- Rate limits that are 2x higher on 1.5 Flash and up to 3x higher on 1.5 Pro
- 2x faster output and 3x lower latency
- Updated default filter settings
These new models build on our latest experimental model releases and include meaningful improvements over the Gemini 1.5 models released at Google I/O in May. Developers can access the latest models for free through Google AI Studio and the Gemini API. For larger organizations and Google Cloud customers, the models are also available on Vertex AI.
Improved overall quality, with notable gains in math, long context, and vision
The Gemini 1.5 series is designed for general performance across a wide range of text, code, and multimodal tasks. For example, you can use the Gemini models to synthesize information from a 1,000-page PDF, answer questions about a repository containing more than 10,000 lines of code, or take an hour-long video and create useful content from it.
With the latest updates, 1.5 Pro and Flash are now better, faster, and more cost-effective to build with in production. On MMLU-Pro, a more challenging version of the popular MMLU benchmark, we see increases of up to 7%. On the MATH and HiddenMath (an internal holdout set of competition math problems) benchmarks, both models showed significant improvements of up to 20%. For vision and code use cases, both models also improved (in the range of ~2-7%) across evaluations measuring visual understanding and Python code generation.
We’ve also improved the overall usefulness of model responses while still maintaining our content safety policies and standards. This means fewer punts and rejections, and more helpful answers across many topics.
In response to developer feedback, both models now have a more concise style, intended to make them easier to use and to lower costs. For use cases such as summarization, question answering, and extraction, the updated models' default output length is 5-20% shorter than that of the previous models. For chat-based products where users may prefer longer responses by default, you can read our prompting strategies guide to learn how to make the models more verbose and conversational.
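One common way to steer verbosity back up is a system instruction attached to the request. The sketch below builds a `generateContent`-style request body; the endpoint shape follows the public Gemini REST API, but the instruction wording is an illustrative assumption, not an official prompt.

```python
# Hypothetical sketch: build a generateContent request body that asks the
# -002 models for longer, more conversational replies. The field names
# ("contents", "systemInstruction", "parts") follow the public Gemini REST
# API; the instruction text itself is an assumption.

def build_request(user_prompt: str, verbose: bool = False) -> dict:
    """Build a Gemini generateContent payload, optionally steering verbosity."""
    body = {
        "contents": [{"role": "user", "parts": [{"text": user_prompt}]}],
    }
    if verbose:
        # Counteract the shorter -002 default style with a system instruction.
        body["systemInstruction"] = {
            "parts": [{"text": "Respond in a detailed, conversational style "
                               "with complete explanations."}]
        }
    return body

payload = build_request("Summarize this report.", verbose=True)
```

Sending this payload to the model endpoint (with your API key) is left out; the point is only where the verbosity steering lives in the request.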
For more information on migrating to the latest versions of Gemini 1.5 Pro and 1.5 Flash, please visit the Gemini API model page.
Gemini 1.5 Pro price decrease
We continue to be amazed by the creative and useful applications of Gemini 1.5 Pro’s 2-million-token long context window and multimodal capabilities. There are still many new use cases to build, from video understanding to processing 1,000-page PDFs. Today, for our most powerful 1.5 series model, Gemini 1.5 Pro, we are announcing a 64% price reduction on input tokens, a 52% price reduction on output tokens, and a 64% price reduction on incremental cached tokens, effective October 1, 2024, on prompts under 128K tokens. Combined with context caching, this continues to drive down the cost of building with Gemini.
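As a quick sanity check on what these percentages mean for a bill, the sketch below applies the announced reductions to an arbitrary old per-million-token price. The dollar figures in the example are placeholders, not official Gemini pricing.

```python
# Illustrative arithmetic only: apply the announced reductions (64% on
# input, 52% on output, 64% on incremental cached tokens) to a hypothetical
# old price. Replace the inputs with real prices from the pricing page.

REDUCTIONS = {"input": 0.64, "output": 0.52, "cached": 0.64}

def new_price(old_price_per_million: float, token_kind: str) -> float:
    """Per-million-token price after the October 1, 2024 reduction."""
    return round(old_price_per_million * (1 - REDUCTIONS[token_kind]), 6)

# A hypothetical $10.00/M input price would drop to $3.60/M.
example = new_price(10.0, "input")
```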
Increased rate limits
To make it even easier for developers to build with Gemini, we’re increasing the paid-tier rate limits for 1.5 Flash from 1,000 RPM to 2,000 RPM, and for 1.5 Pro from 360 RPM to 1,000 RPM. In the coming weeks, we will continue to raise rate limits on the Gemini API so developers can build even more with Gemini.
2x faster output and 3x lower latency
In addition to the major improvements in our latest models, over the past few weeks we have reduced 1.5 Flash latency by 3x and significantly increased its output tokens per second, enabling new use cases with our most powerful models.
Updated filter settings
Since Gemini’s first launch in December 2023, building a safe and reliable model has been a key focus. The latest versions of Gemini (the -002 models) improve the model’s ability to follow user instructions while balancing safety. We continue to offer a suite of safety filters that developers may apply to Google’s models. For the models released today, the filters are not applied by default, so developers can determine the configuration best suited to their use case.
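Since the -002 models apply no filters by default, opting back in means passing explicit safety settings with each request. The category and threshold enum names below follow the public Gemini REST API, but verify them against the current safety-settings documentation before relying on this sketch.

```python
# Sketch: re-enable blocking across the configurable harm categories.
# Enum strings follow the public Gemini REST API safety-settings schema;
# treat them as assumptions to check against the current docs.

HARM_CATEGORIES = [
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
]

def safety_settings(threshold: str = "BLOCK_MEDIUM_AND_ABOVE") -> list[dict]:
    """Apply one blocking threshold to every configurable harm category."""
    return [{"category": c, "threshold": threshold} for c in HARM_CATEGORIES]

settings = safety_settings()
```

The resulting list would be passed as the `safetySettings` field of a request body, alongside `contents`.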
Gemini 1.5 Flash-8B Experimental Update
We are releasing “Gemini-1.5-Flash-8B-Exp-0924”, a further improved version of the Gemini 1.5 Flash-8B model announced in August. This version includes significant performance gains across both text and multimodal use cases. It is available now via Google AI Studio and the Gemini API.
The overwhelmingly positive feedback developers have shared about 1.5 Flash-8B has been incredible, and we will continue to shape our experimental-to-production release pipeline based on developer feedback.
We’re excited about these updates and can’t wait to see what you build with the new Gemini models. And if you are a Gemini Advanced user, you will soon be able to access a chat-optimized version of Gemini 1.5 Pro-002.