Gemini 2.5 Native Audio upgrade and text-to-speech model update

Customer testimonials

Google Cloud customers are already using Gemini’s native audio capabilities to drive real business outcomes, from processing mortgages to calling customers.

“Users often forget they’re talking to an AI within a minute of using Sidekick, and in some cases, they even thank the bot after a long chat…The new Live API AI capabilities delivered through Gemini (2.5 Flash Native Audio) allow sellers to win.” – David Wurtz, VP of Products, Shopify

“Since its launch in May, Mia’s capabilities have been significantly enhanced. This powerful combination has enabled us to generate over 14,000 loans for our broker partners.” – Jason Bressler, Chief Technology Officer, United Wholesale Mortgage (UWM)

“By working with the Gemini 2.5 Flash native audio model through Vertex AI, Receptionists can achieve unparalleled conversational intelligence: identify the main speaker in noisy environments, switch languages mid-conversation, and sound incredibly natural and expressive.” – David Yang, Co-Founder, Newo.ai

Live voice translation

Gemini now natively supports live speech-to-speech translation, designed to handle both continuous listening and two-way conversation.

With continuous listening, Gemini automatically translates audio spoken in multiple languages into a single target language. This allows you to put on your headphones and hear the world around you in your own language.

For two-way conversations, Gemini’s live voice translation translates between two languages in real time and automatically switches the output language based on who is speaking. For example, if you speak English and want to chat with someone who speaks Hindi, you’ll hear the English translation of their speech in real time through your headphones, and when you finish speaking, your phone will play the Hindi translation of what you said.
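The switching behavior in the example above can be sketched as a small routing rule. This is only an illustration of the described behavior, not Gemini's implementation or API: the `Route` type, the `route_utterance` function, and the output names `"headphones"` and `"phone_speaker"` are all hypothetical, and the sketch assumes the spoken language has already been detected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Where a translated utterance goes (hypothetical type for illustration)."""
    target_language: str
    output: str  # "headphones" or "phone_speaker" (assumed names)


def route_utterance(spoken_language: str, my_language: str, their_language: str) -> Route:
    """Pick the output language and device from who is speaking.

    Assumes the spoken language was already identified upstream.
    """
    if spoken_language == my_language:
        # I spoke: the other person hears their language from the phone.
        return Route(target_language=their_language, output="phone_speaker")
    if spoken_language == their_language:
        # They spoke: I hear my language privately in my headphones.
        return Route(target_language=my_language, output="headphones")
    raise ValueError(f"{spoken_language!r} is not part of this conversation")


# English speaker talking with a Hindi speaker, as in the example above.
print(route_utterance("en", my_language="en", their_language="hi"))
print(route_utterance("hi", my_language="en", their_language="hi"))
```

The point of the sketch is that neither participant selects a direction by hand: the detected language alone determines both the target language and where the audio is played.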

Gemini’s live voice translation includes several features designed for real-world use:

Language coverage: Combines Gemini models’ world knowledge and multilingual capabilities with native audio capabilities to translate speech in over 70 languages and over 2,000 language pairs.

Style transfer: Captures the nuances of human speech and preserves the speaker’s intonation, pace, and pitch, so translations sound natural.

Multilingual input: Understands multiple languages simultaneously in one session and helps you follow multilingual conversations without having to fiddle with language settings.

Auto-detection: Identifies the language being spoken before it starts translating.

Noise resistance: Filters out ambient noise so you can have a comfortable conversation even in noisy outdoor environments.
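As a rough sanity check on the language coverage figures, the short calculation below counts how many pairs 70 languages can form. The article does not say how pairs are counted, so both conventions are shown, and 70 is used as a lower bound because the article says "over 70".

```python
import math

n_languages = 70  # lower bound taken from "over 70 languages"

# Unordered pairs: English/Hindi and Hindi/English count once.
unordered_pairs = math.comb(n_languages, 2)

# Ordered pairs: each translation direction counts separately.
ordered_pairs = n_languages * (n_languages - 1)

print(unordered_pairs)  # 2415
print(ordered_pairs)    # 4830
```

Under either convention, 70 languages yield more than 2,000 pairs, which is consistent with the stated figure.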

versatileai
