Google's latest research on AMIE (Articulate Medical Intelligence Explorer) gives its diagnostic AI the ability to interpret visual medical information.
Imagine chatting with an AI about a health concern and, instead of it simply processing your words, it can actually look at the photo of that worrying rash or make sense of your ECG printout. That's what Google is aiming for.
AMIE had already shown promise in text-based medical chat, thanks to earlier work published in Nature. But let's face it: real medicine isn't just about words.
Doctors rely heavily on what they can see: skin conditions, readings from machines, lab reports. As the Google team points out, even simple instant messaging platforms "enable static multimodal information (e.g. images and documents) to enrich discussions."
Text-only AI was missing a huge piece of the puzzle. As the researchers put it, the big question was "whether LLMs can conduct diagnostic clinical conversations that incorporate this more complex type of information."
Google teaches AMIE what to look at, and why
Google's engineers have upgraded AMIE using the Gemini 2.0 Flash model as its core. They combined this with what they describe as a "state-aware reasoning framework." In plain English, this means the AI doesn't just follow a script: it adapts the conversation based on what it has learned so far and what it still needs to understand.
It's close to how human clinicians work: gather clues, form ideas about what might be wrong, then ask for more specific information, including visual evidence, to narrow things down.
"This allows AMIE to request relevant multimodal artifacts when needed, accurately interpret the findings, seamlessly integrate this information into the ongoing dialogue, and use it to improve its diagnoses," Google explains.
Think of the conversation as flowing through phases: first gathering the patient's history, then moving towards diagnosis and management suggestions, then follow-up. The AI constantly evaluates its own understanding and asks for things like a skin photo or lab results if it spots a knowledge gap.
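To make that phase-by-phase behaviour concrete, here is a minimal sketch of what such a state-aware dialogue loop could look like. It is an illustration only: the class and method names (DialoguePhase, identify_knowledge_gaps, request_artifact, and so on) are hypothetical, not Google's actual implementation.

```python
# Minimal sketch of a state-aware consultation loop; all interfaces are hypothetical.
from enum import Enum, auto

class DialoguePhase(Enum):
    HISTORY_TAKING = auto()
    DIAGNOSIS = auto()
    MANAGEMENT = auto()
    FOLLOW_UP = auto()

def run_consultation(llm, patient):
    phase = DialoguePhase.HISTORY_TAKING
    context = []  # accumulated dialogue turns plus any uploaded artifacts

    while phase != DialoguePhase.FOLLOW_UP:
        # Ask the model what it still needs to know in the current phase.
        gaps = llm.identify_knowledge_gaps(context, phase)

        if any(gap.needs_visual_evidence for gap in gaps):
            # e.g. "Could you share a photo of the rash?"
            artifact = patient.provide_artifact(llm.request_artifact(gaps))
            context.append(artifact)
        else:
            question = llm.next_question(context, phase)
            context.append(patient.answer(question))

        # Advance only once the model judges its understanding of this phase sufficient.
        if llm.phase_complete(context, phase):
            phase = DialoguePhase(phase.value + 1)

    return llm.summarise_diagnosis_and_plan(context)
```

The point of the sketch is the control flow: the conversation state, not a fixed script, decides whether the next turn is a question or a request for visual evidence.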
To get this right without endless trial and error on real people, Google built a detailed simulation lab.
Google drew realistic medical images and data from sources such as the PTB-XL ECG database and the SCIN dermatology image set, then used Gemini to add plausible patient backstories. Within this setup, AMIE could "chat" with simulated patients while automated metrics checked how well it performed, measuring things like diagnostic accuracy and whether it made errors (or "hallucinations").
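As a rough illustration of what such a simulation harness involves, the sketch below pairs a real artifact with a generated backstory and then scores the resulting conversation automatically. Everything here (the function names, data fields, and the idea of a separate critic model) is an assumption made for illustration, not the pipeline Google describes.

```python
# Illustrative simulation harness; names and structures are assumptions, not Google's code.
import random

def build_scenario(ecg_records, skin_images, backstory_model):
    """Pair a real artifact (ECG trace or skin photo) with a generated patient backstory."""
    artifact = random.choice(ecg_records + skin_images)
    backstory = backstory_model.generate(
        f"Write a plausible patient history consistent with this finding: {artifact['label']}"
    )
    return {"artifact": artifact, "backstory": backstory, "ground_truth": artifact["label"]}

def auto_evaluate(dialogue_agent, critic_model, scenario):
    """Run one simulated consultation and score it without human reviewers."""
    transcript = dialogue_agent.consult(scenario["backstory"], scenario["artifact"])
    return {
        # Did the correct condition appear in the agent's differential diagnosis?
        "diagnosis_correct": scenario["ground_truth"] in transcript.differential,
        # Ask a separate model to flag findings not supported by the artifact.
        "hallucinated_findings": critic_model.count_unsupported_claims(
            transcript, scenario["artifact"]),
    }
```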
Virtual OSCE: Google puts AMIE through its paces
The real test came in a setup designed to mirror how medical students are assessed: the Objective Structured Clinical Examination (OSCE).
Google ran a remote study involving 105 different medical scenarios. Actors trained to portray patients consistently interacted with either the new multimodal AMIE or real human primary care physicians (PCPs). The chats took place through an interface where the "patients" could upload images, much like a modern messaging app.
The conversations were then reviewed by specialist doctors (in dermatology, cardiology, and internal medicine) and by the patient actors themselves.
They assessed everything from how well the history was taken, the accuracy of the diagnosis, and the quality of the proposed management plan, to communication skills and empathy, and of course how well the AI interpreted the visual information.
Surprising results from the simulated clinic
This is where it gets really interesting. In this head-to-head comparison within a controlled research environment, Google found that AMIE more than held its own.
The AI was rated better than the human PCPs at interpreting the multimodal data shared during the chats. It also scored higher on diagnostic accuracy, producing differential diagnosis lists (ranked lists of possible conditions) judged to be more accurate and complete given the case details.
The specialist physicians reviewing the transcripts tended to rate AMIE's performance higher in most areas. They particularly noted its "quality of image interpretation and inference," the thoroughness of its diagnostic work-ups, the soundness of its management plans, and its ability to flag when a situation needed urgent attention.
Perhaps one of the most surprising findings came from the patient actors themselves: they often rated the AI as more empathetic and trustworthy than the human physicians in these text-based interactions.
And, in an important safety note, the study found no statistically significant difference between AMIE and the human physicians in how often they made errors based on the images (hallucinated findings).
Technology never stands still, so Google also ran some early tests swapping the Gemini 2.0 Flash model for the newer Gemini 2.5 Flash.
Using the simulation framework, the results suggested further gains, particularly in getting the diagnosis right (top-3 accuracy) and in suggesting appropriate management plans.
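For context, top-3 accuracy simply measures whether the correct condition appears among the model's three highest-ranked differential diagnoses. A minimal sketch of that metric, using made-up case data rather than anything from the study:

```python
def top_k_accuracy(cases, k=3):
    """Fraction of cases where the true diagnosis appears in the model's top-k differential."""
    hits = sum(
        1 for case in cases
        if case["ground_truth"] in case["differential"][:k]  # differential is ranked best-first
    )
    return hits / len(cases)

# Illustrative usage with invented cases:
cases = [
    {"ground_truth": "psoriasis",
     "differential": ["eczema", "psoriasis", "tinea"]},
    {"ground_truth": "atrial fibrillation",
     "differential": ["sinus tachycardia", "atrial flutter", "anxiety"]},
]
print(top_k_accuracy(cases, k=3))  # 0.5
```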
Promising, but the team adds a note of realism: these are automated results, and "rigorous assessment with expert physician review is essential to confirm these performance benefits."
An important reality check
To its credit, Google is upfront about the limitations here: "This study explores a research-only system in an OSCE-style evaluation using patient actors, which substantially under-represents the complexity of real-world care."
Simulated scenarios, however well designed, are not the same as dealing with the unique complexities of real patients in a busy clinic. The team also stresses that a chat interface doesn't capture the richness of a real video or in-person consultation.
So what's next? Moving carefully towards the real world. Google has already partnered with Beth Israel Deaconess Medical Center on a research study to see how AMIE performs in a real clinical setting, with patient consent.
The researchers also acknowledge the need to eventually move beyond text and static images towards real-time video and audio, the kind of interaction now common in telehealth.
Giving AI the ability to "see" and interpret the kind of visual evidence doctors use every day offers a glimpse of how AI may one day assist clinicians and patients. However, the path from these promising findings to a safe and reliable tool for everyday healthcare is a long one that requires careful navigation.