New Gemini 2.5 Features
Native audio output and Live API improvements
Today, the Live API is introducing a preview of audio-visual input and native audio output dialogue, so you can directly build conversational experiences with a more natural and expressive Gemini.
It also lets you steer the model's tone, accent, and speaking style. For example, you can tell the model to use a dramatic voice when telling a story. It also supports tool use, so the model can search on your behalf.
You can try out a set of early features including:
Affective Dialogue, in which the model detects emotion in the user's voice and responds appropriately.
Proactive Audio, in which the model ignores background conversations and knows when to respond.
Thinking in the Live API, in which the model uses Gemini's thinking capabilities to support more complex tasks.
We are also releasing new previews of text-to-speech in 2.5 Pro and 2.5 Flash. These have initial support for multiple speakers, enabling text-to-speech with two voices via native audio output.
Like native audio dialogue, text-to-speech is expressive and can capture very subtle nuances, such as whispers. It works in over 24 languages and switches seamlessly between them.
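As a rough illustration of the two-voice text-to-speech described above, the sketch below only assembles a request body and prints it. It makes no network call. The model name, the JSON field names (`responseModalities`, `speechConfig`, `multiSpeakerVoiceConfig`, `speakerVoiceConfigs`, `prebuiltVoiceConfig`), and the voice names `Kore` and `Puck` are assumptions modeled on the public Gemini API documentation, not details from this announcement, so check them against the current API reference. The helper `build_two_speaker_request` is a hypothetical name used only for this example.

```python
import json

# Assumed preview model name; check the current model list before use.
TTS_MODEL = "gemini-2.5-flash-preview-tts"


def build_two_speaker_request(transcript, voices):
    """Build a JSON-serializable body for a two-speaker text-to-speech request.

    `voices` maps each speaker label used in the transcript to a voice name.
    Field names are assumptions modeled on the public Gemini API docs.
    """
    if len(voices) != 2:
        raise ValueError("this sketch expects exactly two speakers")
    speaker_configs = [
        {
            "speaker": speaker,
            "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}},
        }
        for speaker, voice in voices.items()
    ]
    return {
        "contents": [{"parts": [{"text": transcript}]}],
        "generationConfig": {
            # Ask for audio rather than text in the response.
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "multiSpeakerVoiceConfig": {
                    "speakerVoiceConfigs": speaker_configs
                }
            },
        },
    }


# Style cues such as "(whispering)" are written directly in the transcript.
transcript = (
    "TTS the following conversation between Narrator and Guide:\n"
    "Narrator: (whispering) The lights dim.\n"
    "Guide: Welcome, everyone!"
)
request_body = build_two_speaker_request(
    transcript, {"Narrator": "Kore", "Guide": "Puck"}
)
print(f"model: {TTS_MODEL}")
print(json.dumps(request_body, indent=2))
```

The speaker labels in the transcript are meant to match the `speaker` entries in the configuration, which is how each line would be paired with a voice under these assumptions. Sending the body to the API, and decoding the returned audio, is left out on purpose.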