Published July 19, 2024
Exploring AGI, scaling challenges, and the future of multimodal generative AI
Next week, the artificial intelligence (AI) community will gather for the 2024 International Conference on Machine Learning (ICML). The conference, held in Vienna, Austria, from July 21 to 27, is an international platform for showcasing the latest advances, exchanging ideas, and shaping the future of AI research.
This year, teams across Google DeepMind will present more than 80 research papers. At our booth, we will showcase our multimodal on-device model Gemini Nano and LearnLM, our new family of AI models for education, and demonstrate TacticAI, an AI assistant that helps with soccer tactics.
Here is a look at some of our oral, spotlight, and poster presentations.
Defining the path to AGI
What is artificial general intelligence (AGI)? The phrase describes AI systems that are at least as capable as humans at most tasks. As AI models continue to evolve, defining what AGI actually means becomes increasingly important.
We present a framework for classifying the capabilities and behavior of AGI models. Our paper categorizes systems, ranging from non-AI calculators to advanced AI models, according to their performance, generality, and autonomy.
We also argue that open-endedness is essential for building general-purpose AI that exceeds human capabilities. While many recent advances in AI have been driven by existing internet-scale data, open-ended systems can generate new discoveries that extend human knowledge.
At ICML, we will be demonstrating Genie, a model that can generate a variety of playable environments from text prompts, images, photos, and sketches.
Scaling AI systems efficiently and responsibly
Developing larger, more capable AI models requires more efficient training methods, closer alignment with human preferences, and better privacy protections.
We show how using classification instead of regression makes it easier to scale deep reinforcement learning systems, achieving state-of-the-art performance across a variety of domains. Furthermore, we propose a novel approach for predicting the distribution of outcomes of a reinforcement learning agent's actions, which helps quickly evaluate new scenarios.
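To make the first idea concrete, here is a minimal sketch of one common way to cast value learning as classification: a scalar target is encoded as a "two-hot" distribution over fixed bins and trained with cross-entropy rather than mean-squared error. The bin values and helper functions below are illustrative assumptions, not the paper's actual implementation.

```python
import math

def two_hot(value, bins):
    """Encode a scalar as a probability distribution over fixed bins:
    the mass is split between the two bins that bracket the value."""
    if value <= bins[0]:
        return [1.0] + [0.0] * (len(bins) - 1)
    if value >= bins[-1]:
        return [0.0] * (len(bins) - 1) + [1.0]
    for i in range(len(bins) - 1):
        lo, hi = bins[i], bins[i + 1]
        if lo <= value <= hi:
            w = (value - lo) / (hi - lo)  # fraction of mass on the upper bin
            probs = [0.0] * len(bins)
            probs[i] = 1.0 - w
            probs[i + 1] = w
            return probs

def cross_entropy(target_probs, predicted_probs):
    """Classification loss between the two-hot target and model output."""
    return -sum(t * math.log(p)
                for t, p in zip(target_probs, predicted_probs) if t > 0)

bins = [0.0, 1.0, 2.0, 3.0]
target = two_hot(1.5, bins)  # mass split evenly between the 1.0 and 2.0 bins
```

The encoding is lossless in expectation: taking the bin-weighted average of the two-hot distribution recovers the original scalar, so the network can still be read out as a value estimate while being trained with a classification loss.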
Our researchers also present an alignment-maintaining approach that reduces the need for human oversight, along with a novel game-theoretic approach to fine-tuning large language models (LLMs) that better aligns their output with human preferences.
We critique the common approach of pretraining models on public data and then fine-tuning them with "differentially private" training, arguing that it may not provide the privacy or utility that is often claimed.
VideoPoet is a large language model for zero-shot video generation.
New approaches in generative AI and multimodality
Generative AI technology and multimodal capabilities are expanding the creative potential of digital media.
Here we introduce VideoPoet, which uses an LLM to generate state-of-the-art video and audio from multimodal inputs, including images, text, audio, and other video.
We also share Genie, a generative interactive environment that can create a variety of playable environments for training AI agents from a text prompt, image, photo, or sketch.
Finally, we introduce MagicLens, a novel image retrieval system that uses text instructions to retrieve images related by richer relations than visual similarity.
Supporting the AI community
We are proud to sponsor ICML and to foster a diverse community in AI and machine learning by supporting initiatives led by Disability in AI, Queer in AI, LatinX in AI, and Women in Machine Learning.
If you’re at the conference, visit the Google DeepMind and Google Research booths to meet our teams, watch live demos, and learn more about our research.