Published May 3, 2024
Developing next-generation AI agents, exploring new modalities, and pioneering foundational learning
Next week, AI researchers from around the world will gather for the 12th International Conference on Learning Representations (ICLR), which will be held from May 7th to 11th in Vienna, Austria.
Raia Hadsell, VP of Research at Google DeepMind, will deliver a keynote reflecting on the field's past 20 years and highlighting how lessons learned are shaping the future of AI for the benefit of humanity.
We will also give live demonstrations of how we turn foundational research into reality, from developing robotics transformers to creating toolkits and open-source models like Gemma.
This year, teams across Google DeepMind will present more than 70 papers. Some research highlights:
Problem-solving agents and human-inspired approaches
Large language models (LLMs) are already revolutionizing advanced AI tools, but their full potential remains untapped. For example, LLM-based AI agents that can take effective actions could transform digital assistants into more helpful and intuitive AI tools.
AI assistants that follow natural-language instructions and carry out web-based tasks on a person's behalf could save a great deal of time. In an oral presentation, we introduce WebAgent, an LLM-driven agent that learns from self-experience to navigate and manage complex tasks on real-world websites.
To further enhance the general utility of LLMs, we focused on developing their problem-solving skills. We show how we achieved this by equipping an LLM-based system with a traditionally human approach: creating and using “tools.” Separately, we introduce a training technique that ensures language models produce more consistent and socially acceptable output, using a sandboxed rehearsal space that represents the values of society.
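The tool-use pattern described above can be sketched in a few lines: the model emits a structured tool request, a dispatcher executes the tool, and the result is returned to the model's context. This is an illustrative sketch only; the names (`call_llm`, `TOOLS`, `agent_step`) are hypothetical, not the system presented in the paper.

```python
import json

def calculator(expression: str) -> str:
    """A simple arithmetic tool the model can invoke."""
    # eval is restricted to bare arithmetic for the sake of the sketch.
    return str(eval(expression, {"__builtins__": {}}))

# Registry of tools the agent is allowed to call.
TOOLS = {"calculator": calculator}

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned tool request."""
    return json.dumps({"tool": "calculator", "input": "37 * 12"})

def agent_step(prompt: str) -> str:
    """One agent step: ask the model, then run whichever tool it requests."""
    reply = json.loads(call_llm(prompt))
    result = TOOLS[reply["tool"]](reply["input"])
    return f"Tool {reply['tool']} returned: {result}"

print(agent_step("What is 37 * 12?"))
```

In a real system, the tool result would be appended to the conversation and the model queried again, looping until it produces a final answer instead of a tool request.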
Pushing the boundaries of vision and coding
Our Dynamic Scene Transformer (DyST) model leverages real-world, single-camera video to extract 3D representations of objects and their movements in a scene.
Until recently, large-scale AI models have primarily focused on text and images, laying the foundation for large-scale pattern recognition and data interpretation. The field is now progressing beyond these static realms to embrace the dynamics of real-world visual environments. As computing advances across the board, it becomes increasingly important that the underlying code is generated and optimized for maximum efficiency.
When you watch a video on a flat screen, you intuitively grasp the three-dimensional nature of the scene. Machines, however, struggle to emulate this ability without explicit supervision. We introduce the Dynamic Scene Transformer (DyST) model, which leverages real-world, single-camera video to extract 3D representations of objects in a scene and their movements. DyST also enables users to generate new versions of the same video, with control over camera angles and content.
Emulating human cognitive strategies also makes for better AI code generators. When programmers write complex code, they typically “decompose” the task into simpler subtasks. ExeDec introduces a novel code-generation approach that leverages this decomposition to improve the programming and generalization performance of AI systems.
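The decomposition idea can be illustrated with a toy example (this is not the actual ExeDec algorithm): rather than synthesizing one monolithic program, each subtask is solved separately and the solutions are composed. The `synthesize_subtask` lookup below stands in for a learned synthesizer.

```python
def synthesize_subtask(spec):
    """Stand-in synthesizer: maps a subtask spec to a small function."""
    library = {
        "double": lambda xs: [2 * x for x in xs],
        "drop_negatives": lambda xs: [x for x in xs if x >= 0],
        "sum": lambda xs: sum(xs),
    }
    return library[spec]

def compose(specs):
    """Chain subtask solutions into a program for the full task."""
    steps = [synthesize_subtask(s) for s in specs]
    def program(data):
        for step in steps:
            data = step(data)
        return data
    return program

# Full task "sum of the doubled non-negative values", decomposed into steps.
program = compose(["drop_negatives", "double", "sum"])
print(program([3, -1, 4]))  # 2*3 + 2*4 = 14
```

Because each step is simpler than the whole task, a synthesizer that has only seen the individual steps can still generalize to new combinations of them.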
In a parallel spotlight paper, we explore new ways of using machine learning not only to generate code but also to optimize it, introducing a dataset for the robust benchmarking of code performance. Code optimization is difficult, requiring complex reasoning, but our dataset enables the exploration of a range of ML techniques. We demonstrate that the resulting learned strategies outperform human-written code optimizations.
Advancing foundational learning
Our research teams are tackling big questions in AI, from exploring the nature of machine cognition to understanding how advanced AI models generalize, while also working to overcome major theoretical challenges.
For both humans and machines, causal reasoning and the ability to predict events are closely related concepts. A spotlight presentation explores how reinforcement learning is affected by prediction-based training objectives, drawing parallels with prediction-related changes in brain activity.
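A classic example of a prediction-based objective in reinforcement learning is temporal-difference (TD) learning, which nudges a value estimate toward the prediction error between expected and observed outcomes. The sketch below is a generic textbook illustration, not the method studied in the paper; the three-state chain environment is purely hypothetical.

```python
# A 3-state chain 0 -> 1 -> 2, with a reward of 1 for entering state 2.
rewards = {0: 0.0, 1: 0.0, 2: 1.0}  # reward received on entering each state
gamma, alpha = 0.9, 0.5             # discount factor and learning rate
V = [0.0, 0.0, 0.0]                 # value estimate for each state

for _ in range(50):                 # repeatedly walk the chain
    for s in (0, 1):
        s_next = s + 1
        # TD error: how wrong the current value prediction was.
        td_error = rewards[s_next] + gamma * V[s_next] - V[s]
        V[s] += alpha * td_error    # move the prediction toward the target

print([round(v, 2) for v in V])     # states nearer the reward predict more
```

The learning signal here is entirely a prediction error, which is the property that invites comparison with prediction-related activity in the brain.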
Are AI agents able to generalize well to new scenarios because, like humans, they learn underlying causal models of their world? This is an important question for advanced AI. In an oral presentation, we demonstrate that such agents do in fact learn approximate causal models of the processes that generated their training data, and we discuss the deeper implications.
Another important issue in AI is trust, which depends in part on how accurately a model can estimate the uncertainty of its outputs: a key element of reliable decision-making. Using a simple and essentially cost-free method, we have made significant progress in uncertainty estimation in Bayesian deep learning.
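To make the idea of uncertainty estimation concrete, here is one generic approach, averaging the predictive distributions of an ensemble and scoring uncertainty by entropy. This illustrates the general principle only; the paper's Bayesian method is different (and, unlike an ensemble, essentially cost-free).

```python
import math

def predictive_entropy(probs):
    """Entropy of a categorical distribution, in nats; higher = less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def ensemble_predict(member_probs):
    """Average each ensemble member's class probabilities into one distribution."""
    n, k = len(member_probs), len(member_probs[0])
    return [sum(m[c] for m in member_probs) / n for c in range(k)]

# Members agree on class 0: the averaged prediction stays sharp.
agree = ensemble_predict([[0.9, 0.1], [0.85, 0.15], [0.95, 0.05]])
# Members disagree: the average flattens toward uniform.
disagree = ensemble_predict([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])

print(predictive_entropy(agree) < predictive_entropy(disagree))  # True
```

A downstream system can then defer or ask for help whenever the entropy of a prediction exceeds a chosen threshold, which is exactly the link between uncertainty and reliable decision-making.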
Finally, we explore the Nash equilibrium (NE) of game theory: a situation in which no player benefits from changing their strategy while all other players keep theirs fixed. Even approximating a Nash equilibrium beyond a simple two-player game is computationally hard, but in an oral presentation we reveal a new state-of-the-art approach to negotiating deals, from poker to auctions.
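The definition above can be verified directly for a small two-player game: a pure-strategy profile is a Nash equilibrium if neither player gains by deviating unilaterally. The payoff matrices below encode the classic Prisoner's Dilemma as a worked example (rows: player 1's action, columns: player 2's; 0 = cooperate, 1 = defect); it is for illustration and unrelated to the paper's method.

```python
# Payoffs for player 1 and player 2, indexed [p1_action][p2_action].
P1 = [[3, 0],
      [5, 1]]
P2 = [[3, 5],
      [0, 1]]

def is_nash(i, j):
    """True if profile (i, j) is a pure-strategy Nash equilibrium."""
    best_for_p1 = all(P1[i][j] >= P1[k][j] for k in range(2))  # no better row
    best_for_p2 = all(P2[i][j] >= P2[i][k] for k in range(2))  # no better column
    return best_for_p1 and best_for_p2

equilibria = [(i, j) for i in range(2) for j in range(2) if is_nash(i, j)]
print(equilibria)  # [(1, 1)]: mutual defection is the only equilibrium
```

Brute-force checks like this are only feasible for tiny games; the computational hardness mentioned above is precisely why new approximation approaches matter for games at the scale of poker or auctions.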
Bringing the AI community together
We are proud to sponsor ICLR and to support initiatives such as Queer in AI and Women in Machine Learning. Partnerships like these not only strengthen research collaborations but also foster a vibrant, diverse community in AI and machine learning.
If you’re at ICLR, be sure to stop by our booth and visit our colleagues at Google Research next door. Discover our pioneering research, meet our teams hosting workshops, and connect with our experts presenting throughout the conference. We look forward to connecting with you!