Published December 8, 2023
Aiming for a more multimodal, robust, and versatile AI system
Next week kicks off the world’s largest artificial intelligence (AI) conference, the 37th Annual Conference on Neural Information Processing Systems (NeurIPS). NeurIPS 2023 will be held from December 10th to 16th in New Orleans, USA.
Teams across Google DeepMind are presenting more than 180 papers at the main conference and workshops.
We will demonstrate state-of-the-art AI models for global weather forecasting, material discovery, and watermarking of AI-generated content. You’ll also have the chance to hear from the team behind Gemini, our largest and most capable AI model.
Here are some highlights of our research.
Multimodality: language, video, action
UniSim is a universal simulator of real-world interactions.
Generative AI models can create paintings, compose music, write stories, and more. But however capable these models are in one medium, most struggle to transfer those skills to another. We explore how generative abilities can aid learning across modalities. In a spotlight presentation, we show that diffusion models can classify images without any additional training. Diffusion models like Imagen classify images in a more human-like way than other models, relying on shape rather than texture. We also show that simply predicting captions from images can improve computer vision learning; our approach shows the potential to outperform and scale beyond current methods for visual and language tasks.
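The core idea behind diffusion-based classification can be sketched simply: score each candidate label by how well a label-conditioned denoiser reconstructs a noised copy of the input, then pick the label with the lowest average error. Below is a toy illustration of that scoring loop with a stand-in denoiser; the real method runs a pretrained text-to-image diffusion model such as Imagen, and all names and sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "class-conditional denoiser": it reconstructs an image well
# only when the conditioning label matches the image's true class.
CLASS_MEANS = {"cat": 0.2, "dog": 0.8}

def denoise(noisy_image, label):
    # A real diffusion classifier would run the pretrained denoiser
    # conditioned on a text prompt for `label`; here we fake it by
    # pulling the image toward that class's mean intensity.
    return 0.5 * noisy_image + 0.5 * CLASS_MEANS[label]

def diffusion_classify(image, labels, n_steps=32):
    errors = {}
    for label in labels:
        total = 0.0
        for _ in range(n_steps):
            noise = rng.normal(0, 0.1, size=image.shape)
            # Score: how well does denoising under this label recover the input?
            recon = denoise(image + noise, label)
            total += np.mean((recon - image) ** 2)
        errors[label] = total / n_steps
    # The label with the lowest average reconstruction error wins.
    return min(errors, key=errors.get)

image = np.full((4, 4), 0.8)  # a toy "image" drawn near the "dog" mean
print(diffusion_classify(image, ["cat", "dog"]))  # prints "dog"
```

Because classification emerges from the generative objective itself, no extra classifier head or fine-tuning is needed, which is what allows a pretrained diffusion model to be used zero-shot.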
As multimodal models become more widespread, we may see more useful digital and robotic assistants helping people in their daily lives. In a spotlight poster, we create agents that can interact with the digital world the way humans do: through screenshots and keyboard-and-mouse actions. Separately, we show that by leveraging text-conditioned video generation, models can transfer knowledge by predicting video plans for real robot actions.
One of the next milestones could be generating realistic experiences in response to actions performed by humans, robots, and other interactive agents. We present a demo of UniSim, our universal simulator of real-world interactions. This kind of technology could have applications across industries, from video games and film to real-world agent training.
Building safe and understandable AI
Illustration of AI safety research, created by artist Khyati Trehan as part of Google DeepMind's Visualizing AI project.
When developing and deploying models at scale, privacy must be built in at every step.
In a paper that won a NeurIPS Best Paper Award, our researchers demonstrate how to audit privacy-preserving training with a methodology efficient enough for real-world use. Our team is also researching ways to measure whether language models memorize their training data, to protect private and sensitive material. In another oral presentation, our scientists explore the limits of "student" and "teacher" training, where the two models have different levels of access and different vulnerability to attacks.
Although large-scale language models can produce impressive answers, they are prone to "hallucinations": text that appears correct but is fabricated. Our researchers ask whether locating where a fact is stored in a model (localization) tells us where to edit it. Surprisingly, they found that localization does not predict where editing will successfully change a fact, highlighting the complexity of understanding and controlling the information stored in LLMs. We also propose a new way to evaluate interpretability methods, using Tracr to compile human-readable programs into transformer models. We open-sourced a version of Tracr to serve as a ground truth for evaluating interpretability methods.
Emergent abilities
Illustration representing artificial general intelligence (AGI), created by Novoto Studio as part of Google DeepMind's Visualizing AI project.
As the power of large-scale models improves, our research is pushing the boundaries of new capabilities for developing more general AI systems.
Language models are used for common tasks, but lack the exploration and lookahead needed to solve more complex problems. We introduce Tree of Thoughts, a new framework for language model inference that helps models explore and reason over a wide range of possible solutions. We show that language models can solve complex tasks like the Game of 24 more accurately by organizing reasoning and planning as a tree rather than the commonly used flat chain of thought.
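The search pattern behind Tree of Thoughts can be illustrated on the Game of 24 itself: branch over partial solutions, score each branch, and keep only the most promising ones. In the paper the scoring is done by the language model; the sketch below swaps in a simple numeric heuristic as a stand-in value function, so it shows only the tree-search skeleton, not the paper's implementation.

```python
from itertools import combinations

def expand(state):
    """All successor states from combining two numbers with an arithmetic op."""
    nexts = []
    for (i, a), (j, b) in combinations(enumerate(state), 2):
        rest = [x for k, x in enumerate(state) if k not in (i, j)]
        candidates = {a + b, a * b, a - b, b - a}
        if a != 0:
            candidates.add(b / a)
        if b != 0:
            candidates.add(a / b)
        nexts.extend(rest + [c] for c in candidates)
    return nexts

def solves(a, b):
    # Can the last two numbers be combined into exactly 24?
    results = [a + b, a - b, b - a, a * b]
    if b:
        results.append(a / b)
    if a:
        results.append(b / a)
    return any(abs(r - 24) < 1e-6 for r in results)

def value(state):
    # Stand-in for the LM value model in Tree of Thoughts: rate a partial
    # solution by how close its numbers already are to 24.
    return -min(abs(x - 24) for x in state)

def tree_of_thoughts_24(numbers, beam_width=5):
    frontier = [[float(n) for n in numbers]]
    while frontier:
        nexts = [s for state in frontier for s in expand(state)]
        for s in nexts:
            if len(s) == 2 and solves(*s):
                return True
        # Keep only the most promising partial solutions (tree pruning).
        nexts = [s for s in nexts if len(s) > 2]
        frontier = sorted(nexts, key=value, reverse=True)[:beam_width]
    return False

print(tree_of_thoughts_24([4, 9, 10, 13]))  # prints True: (19 - 13) * 4 = 24
```

A flat chain of thought commits to one line of reasoning; the beam here keeps several partial solutions alive and discards weak ones, which is the essential difference the framework introduces.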
To help people solve problems and find what they're looking for, AI models need to process billions of unique values efficiently. With feature multiplexing, a single representation space serves many different features, letting large embedding models scale to products used by billions of people.
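A minimal sketch of the multiplexing idea: instead of maintaining one embedding table per feature, hash every (feature, value) pair into a single shared table, so all features draw from one representation space. The table sizes, feature names, and hash choice below are illustrative, not the paper's implementation.

```python
import zlib

import numpy as np

rng = np.random.default_rng(0)

# One shared embedding table multiplexed across all features, rather
# than a separate table per feature. Sizes here are toy; production
# systems use tables with vastly more rows.
TABLE_ROWS, DIM = 1024, 8
shared_table = rng.normal(size=(TABLE_ROWS, DIM))

def embed(feature_name, value):
    # Hash the (feature, value) pair into the shared table, so distinct
    # features land in (mostly) distinct rows of one representation space.
    key = f"{feature_name}={value}".encode()
    row = zlib.crc32(key) % TABLE_ROWS
    return shared_table[row]

# Different features share the same table, and lookups are deterministic.
user_vec = embed("user_country", "BR")
query_vec = embed("query_token", "shoes")
print(user_vec.shape, np.allclose(user_vec, embed("user_country", "BR")))
# prints: (8,) True
```

The appeal is capacity sharing: memory is one table rather than one per feature, and occasional hash collisions across features are tolerated in exchange for that scalability.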
Finally, we show how DoReMi uses AI to automate the mixing of training-data types, significantly speeding up language model training and improving performance on new, unseen tasks.
Cultivating a global AI community
We are proud to sponsor NeurIPS and to support workshops led by LatinX in AI, QueerInAI, and Women in ML, fostering research collaboration and the development of a diverse AI and machine learning community. This year, NeurIPS features a creative track that includes the Visualizing AI project, which commissions artists to create more diverse and accessible representations of AI.
If you’re attending NeurIPS, stop by our booth to learn more about our cutting-edge research and meet our team hosting workshops and presenting throughout the conference.