Published August 31, 2022
Siqi Liu, Leonard Hasenclever, Steven Bohez, Guy Lever, Zhe Wang, SM Ali Eslami, Nicolas Heess
Using human and animal movements to teach robots to dribble a ball, and simulated humanoid characters to carry boxes and play football
Humanoid characters learning to traverse an obstacle course through trial and error, which can lead to idiosyncratic solutions. Heess, et al. “Emergence of locomotion behaviours in rich environments” (2017).
Five years ago, we took on the challenge of teaching fully articulated humanoid characters to traverse obstacle courses. This demonstrated what reinforcement learning (RL) can achieve through trial and error, but it also highlighted two challenges for solving embodied intelligence:
Reusing previously learned behaviors: Agents needed a significant amount of data to “get off the ground.” Without any initial knowledge of what force to apply to each joint, the agent started with random body twitching and quickly fell to the ground. This problem can be alleviated by reusing previously learned behaviors.
Idiosyncratic behaviors: When the agent finally learned to navigate obstacle courses, it did so with unnatural (albeit amusing) movement patterns that would be impractical for applications such as robotics.
Here, we describe a solution to both challenges, called neural probabilistic motor primitives (NPMP), which involves guided learning with movement patterns derived from humans and animals, and explain how this approach is used in our humanoid football paper, published today in Science Robotics.
We also explain how this same approach enables humanoid full-body manipulation from vision, such as a humanoid carrying an object, and real-world robot control, such as a robot dribbling a ball.
Distilling data into controllable motor primitives using NPMP
An NPMP is a general-purpose motor control module that translates short-horizon motor intentions into low-level control signals. It is trained offline or via RL by imitating motion capture (MoCap) data recorded with trackers on humans or animals.
Agents learning to mimic MoCap trajectories (shown in grey).
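Glossing over the details of the published formulation, the imitation stage can be summarized as a variational objective: an encoder q compresses a short window of upcoming reference states into a latent motor intention z_t, while a low-level controller π learns to reproduce the reference actions, and a KL term keeps z_t close to a simple prior p. A schematic version, in our notation rather than the paper's exact objective:

$$
\max_{q,\,\pi}\;\; \mathbb{E}\!\left[\sum_t \log \pi(a_t \mid s_t, z_t)\right] \;-\; \beta\,\mathbb{E}\!\left[\sum_t \mathrm{KL}\!\left(q(z_t \mid z_{t-1}, x_{t:t+k}) \,\Vert\, p(z_t \mid z_{t-1})\right)\right]
$$

Here $x_{t:t+k}$ is the window of future reference frames and $\beta$ trades off reconstruction fidelity against the regularity of the latent space.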
The model has two parts:
An encoder that takes a future trajectory and compresses it into a motor intention.
A low-level controller that produces the next action, given the agent's current state and this motor intention.
The NPMP model first distills reference data into a low-level controller (left). This low-level controller can then be used as a plug-and-play motor control module on a new task (right).
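To make the two-part structure concrete, here is a minimal sketch in PyTorch. All names and sizes (NPMPEncoder, LowLevelController, the hidden widths) are our own illustrative assumptions rather than the published implementation, and the stochastic latent is reduced to a simple Gaussian bottleneck:

```python
import torch
import torch.nn as nn

class NPMPEncoder(nn.Module):
    """Compresses a short window of future reference states into a
    stochastic 'motor intention' z (a Gaussian latent)."""
    def __init__(self, traj_dim: int, latent_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(traj_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),  # mean and log-std of z
        )

    def forward(self, future_traj: torch.Tensor) -> torch.Tensor:
        mean, log_std = self.net(future_traj).chunk(2, dim=-1)
        # Reparameterized sample of the motor intention
        return mean + log_std.exp() * torch.randn_like(mean)

class LowLevelController(nn.Module):
    """Maps the agent's current (proprioceptive) state plus a motor
    intention z to the next low-level action (e.g. joint torques)."""
    def __init__(self, state_dim: int, latent_dim: int,
                 action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, z], dim=-1))
```

During imitation training, the two modules are optimized jointly, so that the controller reproduces the reference actions from whatever intention the encoder emits for the upcoming MoCap frames.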
After training, the low-level controller can be reused to learn new tasks, where a high-level controller is optimized to output motor intentions directly. This enables efficient exploration, since coherent behaviors are produced even with randomly sampled motor intentions, and it constrains the final solution.
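Continuing the same hypothetical sketch, reuse amounts to freezing the low-level controller and training a new high-level policy with RL to emit motor intentions for the task at hand. HighLevelPolicy and all dimensions below are placeholder assumptions, not values from the paper:

```python
import torch
import torch.nn as nn

class HighLevelPolicy(nn.Module):
    """Task policy trained with RL on the new task; it outputs a motor
    intention z instead of raw joint-level actions."""
    def __init__(self, obs_dim: int, latent_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, task_obs: torch.Tensor) -> torch.Tensor:
        return self.net(task_obs)

# Freeze the pretrained low-level controller: only the high-level
# policy's parameters are updated by the RL optimizer.
# (Dimensions are arbitrary placeholders.)
low_level = LowLevelController(state_dim=60, latent_dim=32, action_dim=21)
low_level.requires_grad_(False)

high_level = HighLevelPolicy(obs_dim=100, latent_dim=32)

def act(task_obs: torch.Tensor, proprio_state: torch.Tensor) -> torch.Tensor:
    z = high_level(task_obs)            # short-horizon motor intention
    return low_level(proprio_state, z)  # coherent low-level action
```

Because the frozen controller maps even randomly sampled intentions to coherent movement, early exploration in a new task already resembles plausible motor behavior rather than random twitching.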
Emergent team coordination in humanoid football
Football has been a long-standing challenge for embodied intelligence research, requiring both individual skills and coordinated team play. In our latest work, we used an NPMP prior to guide the learning of movement skills.
The result was a team of players that progressed from learning to chase the ball to, ultimately, learning to coordinate. Previously, in a study with simpler embodiments, we had shown that coordinated behavior can emerge in teams competing with each other. The NPMP allowed us to observe a similar effect, but in a scenario that required significantly more advanced motor control.
Agents first learn an NPMP module (top) by imitating the movements of football players. Using the NPMP, the agents then learn football-specific skills (bottom).
Our agents acquired skills including agile locomotion, passing, and division of labor, as demonstrated by a range of statistics, including metrics used in real-world sports analytics. The players exhibit both agile high-frequency motor control and long-term decision-making that involves anticipating teammates' behavior, leading to coordinated team play.
Agents learning to play competitive football using multi-agent RL.
Whole-body manipulation and cognitive tasks using vision
Learning to interact with objects using the arms is another difficult control challenge. The NPMP can also enable this type of whole-body manipulation. With a small amount of MoCap data of interactions with boxes, we can train an agent to carry a box from one location to another.
With a small amount of MoCap data (top), the NPMP approach can solve the box-carrying task (bottom).
Similarly, we can teach the agent to catch and throw balls.
Simulated humanoid catching and throwing a ball.
Using the NPMP, we can also tackle maze tasks involving locomotion, perception, and memory.
Simulated humanoid collecting blue spheres in a maze.
Safe and efficient control of real-world robots
The NPMP can also help control real robots. Well-regularized behavior is critical for activities like walking on rough terrain or handling fragile objects: jittery motions can damage the robot itself or its surroundings, or at the very least drain its battery. Significant effort is therefore invested in designing learning objectives that make a robot do what we want while behaving in a safe and efficient manner.
As an alternative, we investigated whether priors derived from biological motion can provide well-regularized, natural-looking, and reusable movement skills for legged robots, such as walking, running, and turning, that are suitable for deployment on real-world robots.
Starting with MoCap data from humans and dogs, we adapted the NPMP approach to train skills and controllers in simulation, then deployed them on a real humanoid (OP3) and a quadruped (ANYmal B) robot, respectively. This allowed the robots to be steered by a user via a joystick, or to dribble a ball to a target location.
The ANYmal robot's locomotion skills are learned by imitating dogs' MoCap.
These locomotion skills can then be reused for controllable walking and ball dribbling.
Benefits of using neural probabilistic motor primitives
In summary, we used the NPMP skill model to learn complex tasks with humanoid characters in simulation and with real-world robots. The NPMP packages low-level movement skills in a reusable way, making it easier to learn useful behaviors that would be difficult to discover through unstructured trial and error. Using motion capture as a source of prior information, it biases the learning of motor control toward naturalistic movements.
The NPMP enables embodied agents to learn more quickly using RL; to learn more naturalistic behaviors; to learn safer, more efficient, and more stable movements suitable for real-world robotics; and to combine full-body motor control with longer-horizon cognitive skills such as teamwork and coordination.
Find out more about our work: