Trends and insights with new multilingual and long-form tracks

By versatileai | November 22, 2025 | 5 min read

While everyone (and their grandma 👵) is launching new ASR models, choosing the right ASR model for your use case can feel more difficult than choosing the next Netflix show. As of November 21, 2025, the Hub has 150 Audio-Text-to-Text models and 27,000 ASR models 🤯

Most benchmarks focus on short-form English transcription and overlook other important factors such as (1) multilingual performance and (2) model throughput, which can be decisive for long-form audio such as conferences and podcasts.

Over the past two years, the Open ASR Leaderboard has become the standard for comparing open and closed source models in both accuracy and efficiency. Multilingual and long-form transcription tracks were recently added to the leaderboard 🎉

TL;DR – Open ASR Leaderboard

📝 New preprint on ASR trends from the leaderboard: https://hf.co/papers/2510.06961
🧠 Best accuracy: Conformer encoder + LLM decoder (open source ftw 🥳)
⚡ Fastest: CTC / TDT decoders
🌍 Multilingual: comes at the cost of single-language performance
⌛ Long-form: closed-source systems still lead (for now 😉)
🧑‍💻 Fine-tuning guides (Parakeet, Voxtral, Whisper): to keep improving performance

As of November 21, 2025, the Open ASR Leaderboard compares over 60 open source and closed source models from 18 organizations across 11 datasets.

A recent preprint details the technical setup and highlights several trends in modern ASR. Here are the key takeaways 👇

1. Conformer encoder 🤝 LLM decoder tops the charts 📈

Currently, models that combine a Conformer encoder with a large language model (LLM) decoder lead in English transcription accuracy. For example, NVIDIA’s Canary-Qwen-2.5B, IBM’s Granite-Speech-3.3-8B, and Microsoft’s Phi-4-Multimodal-Instruct achieve the lowest word error rates (WER), showing that integrating LLM reasoning can significantly improve ASR accuracy.
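Since WER is the accuracy metric behind these rankings, here is a minimal sketch of how it is commonly computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` is hypothetical, and the only normalization assumed here is lowercasing and whitespace splitting; real evaluations usually apply a fuller text normalizer before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercasing + whitespace tokenization stands in for a real normalizer.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Leaderboards typically report this value as a percentage, so a WER of 6.43 corresponds to roughly 6.43 errors per 100 reference words.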

💡 Pro Tip: NVIDIA has introduced Fast Conformer, a 2x faster variant of Conformer. It is used in the Canary and Parakeet model suites.

2. Speed-accuracy trade-off ⚖️

Although these LLM decoders are more accurate, they tend to be slower than simpler approaches. The Open ASR Leaderboard measures efficiency with the inverse real-time factor (RTFx), the number of seconds of audio transcribed per second of processing time, where higher is better.

For faster inference, CTC and TDT decoders deliver 10 to 100 times higher throughput at the cost of slightly higher error rates. This makes them well suited to real-time, offline, or batch transcription tasks (meetings, lectures, podcasts, etc.).
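To make the RTFx metric concrete, here is a minimal sketch assuming the usual definition: audio duration divided by wall-clock processing time. The helper names `rtfx` and `measure_rtfx` are hypothetical, and `transcribe` stands in for whatever model call you want to time; this is not the leaderboard's own benchmarking code.

```python
import time
from typing import Callable


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    if audio_seconds <= 0 or processing_seconds <= 0:
        raise ValueError("durations must be positive")
    return audio_seconds / processing_seconds


def measure_rtfx(transcribe: Callable[[], object], audio_seconds: float) -> float:
    """Time a single call to `transcribe` (a hypothetical zero-argument callable)."""
    start = time.perf_counter()
    transcribe()
    elapsed = time.perf_counter() - start
    return rtfx(audio_seconds, elapsed)


if __name__ == "__main__":
    # Ten minutes of audio processed in six seconds.
    print(rtfx(600.0, 6.0))
```

An RTFx of 1 means the model only just keeps up with real time. Measured values depend heavily on hardware and batch size, so they are only comparable when the setup is held constant.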

3. Multilingual 🌍

OpenAI’s Whisper Large v3 remains a powerful multilingual baseline, supporting 99 languages. However, fine-tuned or distilled variants such as Distil-Whisper and CrisperWhisper often outperform the original on English-only tasks, showing how targeted fine-tuning can improve specialization (wondering how to fine-tune? Check out the guides for Whisper, Parakeet, and Voxtral).

That said, a focus on English tends to reduce multilingual coverage 👉 This is a classic case of the trade-off between specialization and generalization. Similarly, self-supervised systems such as Meta’s Massively Multilingual Speech (MMS) and Omnilingual ASR can support over 1,000 languages, but lag behind language-specific encoders in accuracy.

⭐ Only five languages are currently benchmarked, but we plan to expand to more languages and welcome contributions of new datasets and models for multilingual ASR via GitHub pull requests.

🎯 Alongside multilingual benchmarks, several community-driven leaderboards focus on individual languages. For example, the Open Universal Arabic ASR Leaderboard compares models across Modern Standard Arabic and regional dialects, highlighting how speech variation and diglossia pose challenges to current systems. Similarly, the Russian ASR Leaderboard provides a growing hub for evaluating encoder-decoder and CTC models on Russian-specific phonology and morphology. These localized efforts reflect the broader mission of the multilingual leaderboard: to encourage dataset sharing, fine-tuned checkpoints, and transparent model comparisons, especially in languages with fewer established ASR resources.

4. Long-form transcription is a different game ⏳

For long-form audio (podcasts, lectures, conferences, etc.), closed-source systems still outperform open systems. This could be due to domain tuning, custom chunking, or production-level optimization.

Among the open models, OpenAI’s Whisper Large v3 has the best accuracy. But when it comes to throughput, CTC-based Conformers are far ahead 👉 For example, NVIDIA’s Parakeet CTC 1.1B achieves an RTFx of 2793.75 compared to 68.56 for Whisper Large v3, with only a moderate increase in WER (6.68 vs. 6.43, respectively).
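To put those figures in perspective, the short calculation below converts the quoted RTFx values into processing time per hour of audio. It is only a back-of-the-envelope sketch: the helper `processing_seconds` is hypothetical, and it assumes the quoted RTFx values carry over to your own hardware and batch settings, which they may not.

```python
def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Processing time implied by an RTFx value (audio duration / RTFx)."""
    return audio_seconds / rtfx


ONE_HOUR = 3600.0  # seconds of audio

# Figures quoted in the text above.
parakeet_rtfx, parakeet_wer = 2793.75, 6.68
whisper_rtfx, whisper_wer = 68.56, 6.43

speedup = parakeet_rtfx / whisper_rtfx   # how many times faster Parakeet CTC 1.1B is
wer_gap = parakeet_wer - whisper_wer     # absolute WER difference, in points

if __name__ == "__main__":
    print(f"Parakeet CTC 1.1B: {processing_seconds(ONE_HOUR, parakeet_rtfx):.1f} s per hour of audio")
    print(f"Whisper Large v3:  {processing_seconds(ONE_HOUR, whisper_rtfx):.1f} s per hour of audio")
    print(f"Speedup: {speedup:.1f}x, WER gap: {wer_gap:.2f} points")
```

Under these assumptions, an hour of audio takes roughly 1.3 seconds with Parakeet CTC 1.1B versus about 52.5 seconds with Whisper Large v3: a speedup of around 40x for a WER gap of 0.25 points.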

The trade-off? Parakeet is English-only, which once again illustrates the tension between multilingual coverage and specialization 🫠.

⭐ Closed systems still lead the way, but there is great potential for open source innovation here. Long-form ASR is one of the next most exciting frontiers for the community to tackle.

Given how quickly ASR is evolving, we are excited to see what new architectures improve performance and efficiency, and how the Open ASR Leaderboard continues to serve as a transparent, community-driven benchmark in this space and a reference for other leaderboards (Russian, Arabic, and audio deepfake detection).

Stay tuned as we continue to expand the Open ASR Leaderboard with more models, more languages, and more datasets 👀

👉 Want to contribute? Go to our GitHub repository and open a pull request 🚀
