AI may feel ubiquitous, but it mainly operates in a small fraction of the world’s roughly 7,000 languages, leaving much of the world’s population behind. Nvidia aims to correct this obvious blind spot, starting with Europe.
The company has released a set of powerful open-source tools designed to help developers build high-quality speech AI in 25 European languages. These include the major languages, but more importantly they offer a lifeline for speakers whom big technology often overlooks, such as Croatian, Estonian, and Maltese.
The goal is to enable developers to create the voice-driven tools many of us take for granted: multilingual chatbots that actually understand you, responsive customer service bots, and near-instant translation services.
The heart of this initiative is Granary, a huge library of human speech. It includes around a million hours of audio, all curated to help teach AI the nuances of speech recognition and translation.
To put this audio data to work, Nvidia is also releasing two new AI models designed for speech tasks.
Canary-1b-v2 is a larger model built for high accuracy on complex transcription and translation jobs, while Parakeet-tdt-0.6b-v3 is designed for real-time applications where speed is everything.
If you want to dig into the science behind it, the paper on Granary will be presented at a speech conference in the Netherlands this month. The dataset and both models are already available on Hugging Face for developers who want to get their hands dirty.
But the real magic lies in how this data was created. Training AI requires enormous amounts of data, and obtaining it usually means a slow, expensive, and frankly tedious process of human annotation.
To avoid this, researchers from Nvidia’s speech AI team collaborated with Carnegie Mellon University and Fondazione Bruno Kessler to build an automated pipeline. Using Nvidia’s open-source NeMo toolkit, they turned raw, unlabeled audio into the kind of high-quality, structured data that AI models can learn from.
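The article doesn’t detail how the pipeline decides which machine-transcribed audio is good enough to keep. Purely as an illustration, a quality filter of the kind such pipelines commonly apply might look like the sketch below; the field names and thresholds are assumptions, not Nvidia’s actual Granary method.

```python
# Toy quality filter for machine-labelled speech segments.
# Thresholds and the Segment shape are illustrative assumptions only.
from dataclasses import dataclass


@dataclass
class Segment:
    text: str          # machine-generated transcript
    duration_s: float  # audio length in seconds
    confidence: float  # model confidence in the transcript (0-1)


def keep_segment(seg: Segment,
                 min_conf: float = 0.9,
                 min_dur: float = 1.0,
                 max_dur: float = 30.0) -> bool:
    """Keep only confidently transcribed, reasonably sized segments."""
    if not seg.text.strip():
        return False  # empty transcript: nothing to learn from
    if not (min_dur <= seg.duration_s <= max_dur):
        return False  # too short or too long to be a clean training example
    return seg.confidence >= min_conf


segments = [
    Segment("dobar dan, kako ste?", 2.4, 0.97),  # kept
    Segment("", 3.0, 0.99),                      # dropped: empty text
    Segment("tere hommikust", 0.4, 0.95),        # dropped: too short
    Segment("bonġu", 2.0, 0.55),                 # dropped: low confidence
]
kept = [s for s in segments if keep_segment(s)]
print(len(kept))  # → 1
```

Filters like this are what let an automated pipeline trade a little recall for much higher label quality, without any human in the loop.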
This is more than just a technical achievement; it’s a big leap in digital inclusiveness. It means developers in Riga or Zagreb can finally build voice-driven AI tools that properly understand the local language, and do it more efficiently. The research team found that models reached target accuracy with significantly less Granary data than with other common datasets.
The two new models demonstrate this power. Canary is, frankly, a beast, offering translation and transcription quality that rivals models three times its size while running up to ten times faster. Parakeet, meanwhile, can chew through a 24-minute meeting recording in one pass, automatically detecting which language is being spoken. Both models are smart enough to handle punctuation and capitalization, and provide the word-level timestamps needed to build professional-grade applications.
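Word-level timestamps matter because they let an application map each word back to a moment in the audio, for example when generating subtitles. The sketch below shows one way to consume such output; the `[{"word", "start", "end"}]` shape is an assumption for illustration, not a documented output format of these models.

```python
# Illustrative only: turn word-level timestamps (assumed shape) into
# SRT-style subtitle blocks, splitting captions on long pauses.

def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, max_gap: float = 0.8) -> str:
    """Merge consecutive words into captions; start a new one after a pause."""
    blocks, current = [], [words[0]]
    for w in words[1:]:
        if w["start"] - current[-1]["end"] > max_gap:
            blocks.append(current)
            current = [w]
        else:
            current.append(w)
    blocks.append(current)

    out = []
    for i, block in enumerate(blocks, 1):
        start, end = block[0]["start"], block[-1]["end"]
        text = " ".join(w["word"] for w in block)
        out.append(f"{i}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text}\n")
    return "\n".join(out)


words = [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "everyone.", "start": 0.45, "end": 0.9},
    {"word": "Welcome", "start": 2.1, "end": 2.5},  # pause → new caption
    {"word": "back.", "start": 2.55, "end": 2.9},
]
print(words_to_srt(words))
```

The same per-word timing data could just as easily drive karaoke-style highlighting or clickable transcripts; the point is that timestamps at word granularity, rather than per utterance, make these applications possible at all.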
By putting these powerful tools, and the methods behind them, in the hands of the global developer community, Nvidia isn’t just releasing products. It’s kickstarting a new wave of innovation, in the hope of an AI-powered world that speaks your language, no matter where you come from.
(Photo: Aedrian Salazar)
See also: Deepseek returns to Nvidia for R2 model after Huawei AI chip fails
Want to learn more about AI and big data from industry leaders? Check out the AI & Big Data Expo, taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including the Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.
Check out other upcoming Enterprise Technology events and webinars with TechForge here.