RoboCat: A self-improving robotic agent
Published June 20, 2023
RoboCat team
The new foundation agent learns to operate different robotic arms, solves tasks from as few as 100 demonstrations, and improves from self-generated data.
Robots are rapidly becoming part of our daily lives, but they are often programmed to perform only specific tasks well. Recent advances in AI could yield robots that help in many more ways, but progress in building general-purpose robots has been slow, in part because gathering real-world training data takes time.
Our latest paper introduces RoboCat, a self-improving AI agent for robotics. RoboCat learns to perform a variety of tasks across different arms, then self-generates new training data to improve its technique.
Previous research has explored ways to develop robots that can learn to multitask at scale and combine the understanding of language models with the capabilities of real-world helper robots. RoboCat is the first agent to solve and adapt to multiple tasks, and to do so across different, real robots.
RoboCat learns much faster than other state-of-the-art models. Because it draws on a large and diverse dataset, it can pick up a new task with as few as 100 demonstrations. This capability reduces the need for human-supervised training, helping accelerate robotics research, and is an important step toward creating general-purpose robots.
How RoboCat improves itself
RoboCat is based on our multimodal model Gato (Spanish for “cat”), which can process language, images, and actions in both simulated and physical environments. We combined Gato’s architecture with a large training dataset of sequences of images and actions from various robotic arms solving hundreds of different tasks.
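The training data described above can be pictured as episodes of paired observations and actions. Below is a minimal sketch of one such vision-based trajectory; the field names (`image`, `action`, `robot`, `task`) are illustrative assumptions, not Gato’s or RoboCat’s actual data schema.

```python
# Hedged sketch of a single vision-based robot trajectory. The field
# names are assumptions for illustration, not the real data format.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Timestep:
    image: bytes               # encoded camera frame observed at this step
    action: Tuple[float, ...]  # arm commands issued (joints, gripper, ...)


@dataclass
class Trajectory:
    robot: str                 # which arm collected the episode
    task: str                  # e.g. "stack_blocks"
    steps: List[Timestep] = field(default_factory=list)

    def add(self, image: bytes, action: Tuple[float, ...]) -> None:
        # Append one (observation, action) pair to the episode.
        self.steps.append(Timestep(image, action))
```

A model like Gato would tokenise such image and action sequences into a single stream and learn to predict the next action tokens.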
After this first round of training, we launched RoboCat into a “self-improvement” training cycle with a set of tasks it had never seen before. Learning each new task followed five steps:

1. Collect 100 to 1,000 demonstrations of the new task or robot, using a robotic arm controlled by a human.
2. Fine-tune RoboCat on this new task/arm, creating a specialised spin-off agent.
3. The spin-off agent practises on this new task/arm an average of 10,000 times, generating more training data.
4. Incorporate the demonstration data and self-generated data into RoboCat’s existing training dataset.
5. Train a new version of RoboCat on the new training dataset.
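The five steps above can be sketched as a single loop. This is a toy simulation of the cycle’s structure only: the `Agent` class, the success-rate arithmetic, and every helper here are assumptions for illustration, not DeepMind’s training code.

```python
# Toy sketch of RoboCat's five-step self-improvement cycle.
# All classes and numbers are illustrative assumptions.
import random


class Agent:
    """Stand-in for a RoboCat-style policy; skill is a simulated rate."""

    def __init__(self, success_rate: float):
        self.success_rate = success_rate

    def attempt(self, task: str) -> dict:
        # One practice episode: record the task and whether it succeeded.
        return {"task": task, "success": random.random() < self.success_rate}


def self_improvement_round(agent: Agent, dataset: list, new_task: str,
                           n_demos: int = 1000, n_practice: int = 10_000) -> Agent:
    # 1. Collect 100-1,000 human-controlled demonstrations
    #    (simulated here as always-successful episodes).
    demos = [{"task": new_task, "success": True} for _ in range(n_demos)]

    # 2. Fine-tune on the new task/arm to create a spin-off agent
    #    (simulated as a modest boost in success rate).
    spinoff = Agent(min(1.0, agent.success_rate + 0.2))

    # 3. The spin-off practises ~10,000 times, generating more data.
    self_generated = [spinoff.attempt(new_task) for _ in range(n_practice)]

    # 4. Fold demonstrations and self-generated data into the dataset.
    dataset.extend(demos)
    dataset.extend(self_generated)

    # 5. Train a new version of RoboCat on the enlarged dataset
    #    (simulated as another modest boost).
    return Agent(min(1.0, spinoff.success_rate + 0.1))
```

Each round grows the shared dataset, so every new version of RoboCat starts from a broader pool of experience than the last.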
RoboCat’s training cycle is enhanced by the ability to autonomously generate additional training data.
Combining all this training, the latest RoboCat is based on a dataset of millions of trajectories from both real and simulated robotic arms, including self-generated data. We used four different types of robots and many robotic arms to collect vision-based data representing the tasks RoboCat is trained to perform.
RoboCat learns from different types of training data and tasks. Videos include a real robot arm picking up gears, a simulated arm stacking blocks, and RoboCat using a robot arm to pick up cucumbers.
Operating new robotic arms and solving more complex tasks
Thanks to its diverse training, RoboCat learned to operate new robotic arms within a few hours. Although it had been trained on arms with two-fingered grippers, it was able to adapt to a more complex arm with a three-fingered gripper and twice as many controllable inputs.
Left: A new robotic arm that RoboCat learned to control.
Right: Video of RoboCat using this arm to lift gears.
After observing 1,000 human-controlled demonstrations, collected in just a few hours, RoboCat could direct this new arm dexterously enough to pick up gears successfully 86% of the time. With the same number of demonstrations, it could adapt to solve tasks that combine precision and understanding, such as retrieving the correct fruit from a bowl or solving a shape-matching puzzle, which are necessary for more complex control.
Examples of tasks that RoboCat can solve after 500 to 1000 demonstrations.
A self-improving generalist
RoboCat has a virtuous cycle of training: the more new tasks it learns, the better it gets at learning additional new tasks. An early version of RoboCat succeeded only 36% of the time on previously unseen tasks after learning from 500 demonstrations per task. But the latest RoboCat, trained on a greater diversity of tasks, more than doubled this success rate on the same tasks.
The large difference in performance between the initial RoboCat (one round of training) and the final version (extensive and diverse training, including self-improvement), after both were fine-tuned on 500 demonstrations of previously unseen tasks.
These improvements stem from RoboCat’s growing breadth of experience, similar to how people develop a wider range of skills as their learning deepens in a given domain. RoboCat’s ability to independently learn skills and rapidly self-improve, especially when applied to different robotic devices, will help pave the way toward a new generation of more helpful, general-purpose robotic agents.