Google has developed an AI model called DolphinGemma to help decipher how dolphins communicate, a step toward the long-held goal of interspecies communication.
The intricate clicks, whistles, and pulses echoing through the underwater world of dolphins have long fascinated scientists. The dream has been to understand and decipher the patterns within these complex vocalizations.
Google, collaborating with engineers at Georgia Tech and leveraging field research from the Wild Dolphin Project (WDP), has presented DolphinGemma to pursue exactly that goal.
The foundational AI model, announced around National Dolphin Day, represents a new tool for understanding cetacean communication. Trained specifically to learn the structure of dolphin sounds, DolphinGemma can even generate novel, dolphin-like audio sequences.
Operational since 1985, the WDP has conducted the world's longest-running continuous underwater dolphin research program, building a deep understanding of context-specific sounds, such as:
- Signature whistles: act as unique identifiers, much like names, and are crucial for interactions such as mothers reuniting with their calves.
- Burst-pulse "squawks": commonly associated with conflict or aggressive encounters.
- Click "buzzes": often detected during courtship or when dolphins chase sharks.
The ultimate goal of WDP is to uncover the inherent structure and potential meaning within these natural sound sequences, searching for the grammatical rules and patterns that would signify a form of language.
This painstaking, long-term analysis has provided the essential grounding and labeled data crucial for training sophisticated AI models such as DolphinGemma.
DolphinGemma: An AI ear for cetacean sounds
Analyzing the enormous volume and complexity of dolphin communication is a daunting task, and one ideally suited to AI.
DolphinGemma, developed by Google, uses specialized audio technology to tackle it. The model employs the SoundStream tokenizer to represent dolphin sounds efficiently, feeding that data into an architecture suited to processing complex sequences.
Based on insights from Google's Gemma family of lightweight, open models (which share technology with the more powerful Gemini models), DolphinGemma functions as an audio-in, audio-out system.
Fed sequences of natural dolphin sounds from WDP's extensive database, DolphinGemma learns to identify recurring patterns and structures. Crucially, it can predict the likely sounds that follow in a sequence, much as human language models predict the next word.
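To make the analogy concrete, here is a minimal, purely illustrative Python sketch of that audio-in, audio-out loop: quantize fixed-size audio frames into discrete tokens, then predict the next token from the context. The tokenizer and predictor below are crude stand-ins invented for illustration, not SoundStream or DolphinGemma themselves.

```python
# Illustrative sketch only: all names and the quantization scheme here
# are hypothetical, not Google's actual API.
import numpy as np

def tokenize_audio(waveform: np.ndarray, frame_size: int = 320) -> list[int]:
    """Stand-in for a SoundStream-style tokenizer: map fixed-size audio
    frames to discrete token IDs (here, crudely, by bucketing frame energy)."""
    tokens = []
    for start in range(0, len(waveform) - frame_size, frame_size):
        frame = waveform[start:start + frame_size]
        energy = float(np.sqrt(np.mean(frame ** 2)))  # RMS energy
        tokens.append(min(int(energy * 256), 255))    # clamp to 256 buckets
    return tokens

def predict_next_token(context: list[int]) -> int:
    """Placeholder for the sequence model: a real system would run a
    trained transformer over the token context and sample the next token."""
    return context[-1]  # trivially repeat the last token

waveform = np.random.randn(16000).astype(np.float32)  # 1 s of stand-in audio
tokens = tokenize_audio(waveform)
print(f"{len(tokens)} tokens; predicted next token: {predict_next_token(tokens)}")
```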
With around 400 million parameters, DolphinGemma is optimized to run efficiently, even on the Google Pixel smartphones WDP uses for field data collection.
As WDP begins deploying the model this field season, it promises to accelerate research significantly. By automatically flagging recurring patterns and reliable sequences that previously demanded immense human effort to discover, it can help researchers uncover hidden structures and potential meanings within the dolphins' natural communication.
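As a rough illustration of this kind of automated pattern flagging, the sketch below counts recurring n-grams across tokenized recordings and surfaces the most frequent ones for human review. It is an assumption-laden toy, not WDP's actual pipeline.

```python
# Toy pattern flagging: count recurring n-grams in tokenized sound
# sequences and surface frequent ones for researchers to inspect.
from collections import Counter

def flag_recurring_patterns(token_seqs: list[list[int]], n: int = 3,
                            min_count: int = 2) -> list[tuple[tuple[int, ...], int]]:
    counts = Counter()
    for seq in token_seqs:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return [(ngram, c) for ngram, c in counts.most_common() if c >= min_count]

recordings = [[4, 7, 7, 2, 4, 7, 7, 9], [1, 4, 7, 7, 2, 3]]  # fake token streams
for pattern, count in flag_recurring_patterns(recordings):
    print(f"pattern {pattern} occurred {count} times")
```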
Two-way interaction with the CHAT system
While DolphinGemma focuses on understanding natural communication, a parallel project explores a different path: active, two-way interaction.
The CHAT (Cetacean Hearing Augmentation Telemetry) system, developed by WDP in collaboration with Georgia Tech, aims to establish a simpler, shared vocabulary rather than directly translating complex dolphin vocalizations.
The concept relies on associating specific objects with novel synthetic whistles (created by CHAT and distinct from natural dolphin sounds). Researchers demonstrate the whistle-object link, hoping that the dolphins' natural curiosity leads them to mimic the sounds in order to request the items.
As more of the dolphins' natural sounds are understood through work with models like DolphinGemma, they too could eventually be incorporated into the CHAT interaction framework.
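A hedged sketch of CHAT's shared-vocabulary idea follows: a lookup table mapping synthetic whistle IDs to objects, plus the alert a researcher might receive when a mimic is recognized. The whistle IDs and object names are invented for illustration, not taken from the real system.

```python
# Hypothetical whistle-to-object vocabulary; IDs and objects are invented.
WHISTLE_TO_OBJECT = {
    "whistle_A": "scarf",
    "whistle_B": "sargassum",
    "whistle_C": "rope",
}

def handle_detected_mimic(whistle_id: str) -> str:
    """Return the alert a researcher would hear when a dolphin mimics
    a known synthetic whistle to 'request' the associated object."""
    obj = WHISTLE_TO_OBJECT.get(whistle_id)
    if obj is None:
        return "Unknown whistle: log it for later analysis."
    return f"Dolphin requested: {obj}. Hand over the object."

print(handle_detected_mimic("whistle_B"))
```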
Google Pixel enables ocean research
Underpinning both the analysis of natural sounds and the interactive CHAT system is a crucial piece of mobile technology: Google Pixel phones serve as the brains for processing high-fidelity audio data in real time, directly in challenging marine environments.
The CHAT system, for example, relies on a Google Pixel phone to:
- detect potential mimics amid background noise;
- identify the specific whistle used; and
- alert researchers (via underwater bone-conduction headphones) to the dolphin's "request."
This allows a researcher to respond quickly with the correct object, reinforcing the learned association. While a Pixel 6 handled this initially, the next-generation CHAT system (planned for summer 2025) will use a Pixel 9, integrating speaker and microphone functions and running both deep learning models and template-matching algorithms simultaneously for improved performance.
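The template-matching step mentioned above can be sketched with normalized cross-correlation: slide a known synthetic-whistle template over incoming audio and report the best-scoring offset. The signal parameters and threshold below are toy values, not those of the real CHAT system.

```python
# Toy template matcher: normalized cross-correlation of a known whistle
# template against noisy incoming audio.
from typing import Optional
import numpy as np

def match_template(audio: np.ndarray, template: np.ndarray,
                   threshold: float = 0.8) -> Optional[int]:
    """Return the sample offset of the best match above threshold, else None."""
    t = (template - template.mean()) / (template.std() + 1e-9)
    best_score, best_pos = -1.0, None
    for start in range(len(audio) - len(template)):
        window = audio[start:start + len(template)]
        w = (window - window.mean()) / (window.std() + 1e-9)
        score = float(np.dot(w, t)) / len(t)  # normalized correlation in [-1, 1]
        if score > best_score:
            best_score, best_pos = score, start
    return best_pos if best_score >= threshold else None

sr = 16000
t = np.linspace(0, 0.2, int(0.2 * sr), endpoint=False)
template = np.sin(2 * np.pi * 4000 * t)            # toy 4 kHz "whistle"
audio = np.random.randn(sr) * 0.1                  # 1 s of background noise
audio[4000:4000 + len(template)] += template       # embed a "mimic"
pos = match_template(audio, template)
print(f"mimic detected at sample {pos}" if pos is not None else "no mimic")
```

Template matching of this kind is cheap and predictable, which is one plausible reason it remains useful alongside deep learning models on power-constrained field hardware.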
Using smartphones like the Pixel dramatically reduces the need for bulky, expensive custom hardware, improving system maintainability, lowering power requirements, and shrinking physical size. Furthermore, DolphinGemma's predictive power, integrated into CHAT, can help identify mimics faster, making interactions more fluid and effective.
Recognizing that breakthroughs often stem from collaboration, Google plans to release DolphinGemma as an open model this summer. While trained on the sounds of Atlantic spotted dolphins, its architecture holds promise for researchers studying other cetaceans, potentially after fine-tuning on the vocal repertoires of different species.
The aim is to equip researchers with powerful tools to analyze their own acoustic datasets, accelerating the collective effort to understand these intelligent marine mammals. The shift from passive listening toward actively deciphering patterns brings the prospect of bridging the communication gap between our species a little closer.
