New features in Gemini 2.5
Native audio output and Live API improvements
The Live API now offers a preview of audio-visual input and native audio output dialogue, so you can build more natural and expressive conversational experiences directly with Gemini.
You can also steer the model's tone, accent, and speaking style. For example, you can tell the model to use a dramatic voice when telling a story. The model also supports tool use, so it can search on your behalf.
You can try out an initial set of features, including:
Emotional dialogue, in which the model detects emotion in the user's voice and responds appropriately.
Proactive audio, in which the model ignores background conversations and knows when to respond.
Thinking in the Live API, in which the model draws on Gemini's thinking capabilities to support more complex tasks.
We're also releasing new previews of text-to-speech for 2.5 Pro and 2.5 Flash. For the first time, text-to-speech supports multiple speakers, enabling two-voice output via native audio output.
Like native audio dialogue, text-to-speech is expressive and can capture very subtle nuances, such as whispers. It works with over 24 languages and switches seamlessly between them.
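
To make the two-voice case more concrete, here is a minimal sketch of how a request body for two-speaker text-to-speech might be assembled. It only builds a dictionary and does not call any service. The helper name build_two_speaker_tts_request is hypothetical, and the field names (responseModalities, speechConfig, multiSpeakerVoiceConfig, speakerVoiceConfigs, prebuiltVoiceConfig) and the voice names used below are assumptions based on the publicly documented Gemini API speech configuration; check them against the current API reference before relying on them.

```python
import json


def build_two_speaker_tts_request(turns, voices):
    """Build a request body for two-speaker text-to-speech (illustrative only).

    turns  -- list of (speaker, text) tuples in speaking order
    voices -- dict mapping each speaker name to a prebuilt voice name

    The JSON field names are assumptions about the Gemini API request shape.
    """
    # Collect unique speaker names while preserving first-appearance order.
    speakers = list(dict.fromkeys(speaker for speaker, _ in turns))
    if len(speakers) != 2:
        raise ValueError("expected exactly two speakers")

    missing = [s for s in speakers if s not in voices]
    if missing:
        raise ValueError(f"no voice assigned for: {missing}")

    # The transcript labels each line with its speaker so the model can
    # map lines to the configured voices.
    transcript = "\n".join(f"{speaker}: {text}" for speaker, text in turns)

    return {
        "contents": [{"parts": [{"text": transcript}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "multiSpeakerVoiceConfig": {
                    "speakerVoiceConfigs": [
                        {
                            "speaker": s,
                            "voiceConfig": {
                                "prebuiltVoiceConfig": {"voiceName": voices[s]}
                            },
                        }
                        for s in speakers
                    ]
                }
            },
        },
    }


if __name__ == "__main__":
    body = build_two_speaker_tts_request(
        [("Narrator", "It was a quiet night."), ("Hero", "(whispering) Too quiet.")],
        {"Narrator": "Kore", "Hero": "Puck"},  # assumed voice names
    )
    print(json.dumps(body, indent=2))
```

The helper rejects anything other than exactly two speakers, which matches the two-voice capability described above. Sending the body to a text-to-speech preview model and decoding the returned audio is left out, since those details depend on the SDK or endpoint you use.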

