Multimodal AI – models that can handle different types of input, such as speech, text, images, and more – is transforming the user experience in the wearables space.
On Ray-Ban Meta glasses, multimodal AI lets the glasses understand what the wearer is looking at. That means anyone wearing Ray-Ban Meta glasses can ask questions about what they're seeing: the glasses can provide information about landmarks, translate the text in view, and much more.
But what does it take to bring AI to a wearable device?
In this episode of the Meta Tech Podcast, we hear from Shane, a research scientist at Meta focused on computer vision and multimodal AI for wearables. Shane and his team are behind cutting-edge AI research like AnyMAL, a unified language model that can reason over an array of input signals including text, audio, video, and even IMU motion sensor data.
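For readers curious about how a model like that is typically wired together, here is a minimal, hypothetical sketch of the common "modality adapter" pattern: each non-text modality gets its own encoder whose output is projected into the language model's embedding space as soft tokens. The class names, toy encoders, and dimensions below are illustrative assumptions, not the actual AnyMAL implementation.

```python
# Hypothetical sketch of a modality-adapter setup (an assumption for
# illustration, not AnyMAL's real code): a frozen per-modality encoder plus a
# small trainable projection into the LLM's embedding space.
import torch
import torch.nn as nn


class ModalityAdapter(nn.Module):
    """Projects a modality encoder's output into the LLM embedding space."""

    def __init__(self, encoder: nn.Module, feature_dim: int, llm_dim: int):
        super().__init__()
        self.encoder = encoder                        # pretrained encoder, kept frozen
        self.proj = nn.Linear(feature_dim, llm_dim)   # lightweight trainable projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                         # don't update the frozen encoder
            feats = self.encoder(x)                   # (batch, num_tokens, feature_dim)
        return self.proj(feats)                       # (batch, num_tokens, llm_dim)


# Toy stand-ins for pretrained encoders; a real system would use image,
# audio, or IMU encoders instead of these random linear layers.
image_encoder = nn.Sequential(
    nn.Flatten(1), nn.Linear(3 * 32 * 32, 512), nn.Unflatten(1, (1, 512))
)
imu_encoder = nn.Sequential(nn.Linear(6, 256), nn.Unflatten(1, (1, 256)))

llm_dim = 1024
image_adapter = ModalityAdapter(image_encoder, 512, llm_dim)
imu_adapter = ModalityAdapter(imu_encoder, 256, llm_dim)

# Fake inputs: one image frame and one 6-axis IMU reading.
image = torch.randn(1, 3, 32, 32)
imu_sample = torch.randn(1, 6)

# Each modality becomes a short sequence of "soft tokens" that can be
# concatenated with text token embeddings before running the language model.
image_tokens = image_adapter(image)     # (1, 1, 1024)
imu_tokens = imu_adapter(imu_sample)    # (1, 1, 1024)
multimodal_prefix = torch.cat([image_tokens, imu_tokens], dim=1)
print(multimodal_prefix.shape)          # torch.Size([1, 2, 1024])
```

The appeal of this pattern for wearables is that only the small projection layers need training, while the heavy encoders and the language model itself can stay frozen.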
Shane sits down with Pascal Hartig to share how his team is building the foundational models behind Ray-Ban Meta glasses. They talk about the unique challenges of AI glasses and pushing the boundaries of AI-driven wearable technology.
Whether you’re an engineer, a tech enthusiast, or simply curious, there’s something in this episode for everyone!
Download or listen to the episode below:
You can also find the episode wherever you get your podcasts, including:
The Meta Tech Podcast is a podcast, brought to you by Meta, where we highlight the work Meta’s engineers are doing at every level – from low-level frameworks to end-user features.
Send us feedback on Instagram, Threads, or X.
And if you’re interested in learning more about career opportunities at Meta, visit the Meta Careers page.
Timestamps

Intro – 0:06
OSS News – 0:56
Introducing Shane – 1:30
The evolving role of a research scientist – 3:03
What is multimodal AI? – 5:45
Applying multimodal AI to Meta products – 7:21
Acoustic modality beyond speech – 9:17
AnyMAL – 12:23
Encoder zoos – 13:53
Zero-shot performance – 16:25
Model – 17:28
LLM parameter size – 19:29
21:53
Moving image processing – 23:44
Scaling to billions of users – 26:01
What are the possibilities for optimization? – 28:12
Built-in feedback – 29:08
Impact of open source