Introducing NVIDIA Cosmos Policies for Advanced Robot Control

By versatileai | January 29, 2026

We are continually extending the NVIDIA Cosmos™ World Foundation Model (WFM) to tackle some of the toughest problems in robotics, self-driving vehicle development, and industrial vision AI.

To further support this effort, we present Cosmos Policy, our latest research on advances in robot control and planning using Cosmos WFM.

TL;DR

Cosmos Policy: A new state-of-the-art robot control policy, created by post-training the Cosmos Predict-2 world foundation model for manipulation tasks. It encodes robot actions and future states directly into the model's latent space, achieving SOTA performance on the LIBERO and RoboCasa benchmarks.

📦 Model on Hugging Face | 🔧 Code on GitHub

Cosmos Cookoff: An open hackathon where developers can get hands-on with the Cosmos world foundation models and push the limits of physical AI.

🍳 Join the Cosmos Cook-off

Overview: What is Cosmos Policy?

Cosmos Policy is a robot control and planning policy obtained by fine-tuning Cosmos Predict, a world foundation model trained to predict future frames. Instead of introducing new architectural components or separate action modules, Cosmos Policy directly adapts the pre-trained model through a single stage of post-training on robot demonstration data.

Policies are the decision-making brains of a system that map observations (such as camera images) to physical actions (such as moving a robot arm) to complete a task.
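
To make this concrete, here is a minimal sketch of that abstraction in Python. The interface and names are hypothetical, not the actual Cosmos Policy API: a policy is simply anything that maps an observation to an action inside a control loop.

```python
import numpy as np

class Policy:
    """Minimal policy abstraction: observations in, actions out."""

    def act(self, observation: dict) -> np.ndarray:
        # e.g., observation = {"camera": image, "joint_positions": q}
        raise NotImplementedError

# A control loop then queries the policy at every step ("env" is a
# hypothetical environment with a Gym-like interface):
#
#   obs = env.reset()
#   done = False
#   while not done:
#       action = policy.act(obs)   # e.g., target joint positions
#       obs, done = env.step(action)
```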

What’s the difference?

What’s innovative about Cosmos Policy is how it represents this data. Instead of building separate neural networks for perception and control, we treat the robot’s movements, physical states, and success scores like frames of a video.

These signals are all encoded as additional latent frames and learned using the same diffusion process as video generation, allowing the model to inherit a pre-trained understanding of physics, gravity, and how a scene evolves over time.

Latent refers to the compressed mathematical language that the model uses to understand the data internally (rather than the raw pixels).
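
As a rough illustration of this idea (the shapes and projection layers below are assumptions made for the sketch, not the actual Cosmos implementation), actions and values can be projected into the same latent space as video frames and appended as extra "frames" of the sequence:

```python
import torch

# Illustrative latent shapes: batch, frames, channels, height, width.
B, T, C, H, W = 1, 8, 16, 30, 40
video_latents = torch.randn(B, T, C, H, W)  # e.g., from a video tokenizer

# An action chunk (here, 16 steps of 7-DoF commands) is projected into
# one latent "frame" of the same shape as a video frame.
actions = torch.randn(B, 16, 7)
action_proj = torch.nn.Linear(16 * 7, C * H * W)
action_frame = action_proj(actions.flatten(1)).view(B, 1, C, H, W)

# A scalar expected return is likewise projected into a latent frame.
value = torch.randn(B, 1)
value_proj = torch.nn.Linear(1, C * H * W)
value_frame = value_proj(value).view(B, 1, C, H, W)

# The joint sequence is then denoised by the same diffusion transformer
# that was pre-trained for video prediction.
joint_latents = torch.cat([video_latents, action_frame, value_frame], dim=1)
print(joint_latents.shape)  # torch.Size([1, 10, 16, 30, 40])
```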

As a result, a single model can:

  • Predict action chunks to guide robot movements using hand-eye coordination (i.e., visuomotor control)
  • Predict future robot observations for world modeling
  • Predict expected returns (i.e., value functions) for planning

All three features are learned jointly within one unified model.

Cosmos Policy can be deployed as a direct policy, where only actions are generated during inference, or as a planning policy, where multiple candidate actions are evaluated by predicting the resulting future states and values.
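
Here is a hedged sketch of that planning mode. The method names `sample_actions`, `predict_future`, and `predict_value` are hypothetical stand-ins for the three capabilities listed above: sample several candidate action chunks, predict the outcome of each, and execute the candidate with the highest predicted value.

```python
def plan(model, observation, num_candidates: int = 8):
    """Model-based planning: pick the best of several sampled actions."""
    best_action, best_value = None, -float("inf")
    for _ in range(num_candidates):
        # Direct-policy mode: sample one candidate action chunk.
        action = model.sample_actions(observation)
        # World-model mode: predict the future states that would
        # result from executing this candidate.
        future = model.predict_future(observation, action)
        # Value mode: score the predicted outcome.
        value = model.predict_value(future)
        if value > best_value:
            best_action, best_value = action, value
    return best_action
```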

Base model: Cosmos Predict and why it matters

Recent research in robot manipulation increasingly relies on large-scale pre-trained backbones to improve generalization and data efficiency. Most of these approaches are built on vision language models (VLMs) that are trained on large-scale image and text datasets and fine-tuned to predict robot behavior.

These models learn to understand videos and describe what they see, but they do not learn how to physically perform actions. A VLM can suggest high-level actions, such as turning left or picking up a purple cup, but it does not know exactly how to execute them.

In contrast, a WFM is trained to predict how a scene changes over time and to generate temporal dynamics in video. These capabilities relate directly to robot control, where actions must account for how the environment and the robot’s own state change over time.

Cosmos Predict is trained for physical AI with a diffusion objective over continuous spatiotemporal latents, allowing it to model complex, high-dimensional, multimodal distributions over long time horizons.

This design makes Cosmos Predict a natural basis for visuomotor control.

The model has already learned state transitions through future-frame prediction. Its diffusion formulation supports multimodal outputs, which is important for tasks with multiple valid action sequences. And its transformer-based denoiser scales to long sequences and multiple modalities.

Cosmos Policy is built on Cosmos Predict-2 via post-training and uses the model’s native diffusion process to generate robot actions jointly with future observations and value estimates. This allows the policy to fully inherit the pre-trained model’s understanding of temporal structure and physical interactions while remaining easy to train and deploy.
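
For intuition, the post-training step can be pictured as a standard denoising objective over the joint latent sequence. The sketch below uses a flow-matching-style parameterization as an assumption; the actual Cosmos Predict objective and noise schedule may differ, and `denoiser` is a hypothetical stand-in for the diffusion transformer.

```python
import torch
import torch.nn.functional as F

def diffusion_loss(denoiser, joint_latents):
    """One simplified training step over joint video/action/value latents."""
    noise = torch.randn_like(joint_latents)
    # Random noise level per sample (continuous-time parameterization).
    t = torch.rand(joint_latents.shape[0], device=joint_latents.device)
    t_exp = t.view(-1, *([1] * (joint_latents.dim() - 1)))
    # Interpolate between clean latents and noise (flow-matching style).
    noisy = (1 - t_exp) * joint_latents + t_exp * noise
    # The denoiser predicts the velocity (noise - data) for every latent
    # frame: video, action, and value frames alike.
    target = noise - joint_latents
    pred = denoiser(noisy, t)
    return F.mse_loss(pred, target)
```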

⚡IMPORTANT UPDATE: The latest Cosmos Predict 2.5 is now available. Please check the model card.

Summary of results

Cosmos Policy is evaluated across simulation benchmarks and real-world robot manipulation tasks, and compared to a diffusion-based policy trained from scratch, a video-based robot policy, and a fine-tuned Vision Language Action (VLA) model.

Cosmos Policy is evaluated on two standard benchmarks for multi-task and long-horizon robot manipulation: LIBERO and RoboCasa.

On LIBERO, Cosmos Policy consistently outperforms prior diffusion policies and VLA-based approaches across the suite of tasks, especially tasks that require precise temporal coordination and multi-step execution.

| Model | Spatial SR (%) | Object SR (%) | Goal SR (%) | Long SR (%) | Average SR (%) |
| --- | --- | --- | --- | --- | --- |
| Diffusion Policy | 78.3 | 92.5 | 68.3 | 50.5 | 72.4 |
| Dita | 97.4 | 94.8 | 93.2 | 83.6 | 92.3 |
| π0 | 96.8 | 98.8 | 95.8 | 85.2 | 94.2 |
| UVA | — | — | — | 90.0 | — |
| UniVLA | 96.5 | 96.8 | 95.6 | 92.0 | 95.2 |
| π0.5 | 98.8 | 98.2 | 98.0 | 92.4 | 96.9 |
| Video Policy | — | — | — | 94.0 | — |
| OpenVLA-OFT | 97.6 | 98.4 | 97.9 | 94.5 | 97.1 |
| CogVLA | 98.6 | 98.8 | 96.6 | 95.4 | 97.4 |
| Cosmos Policy (ours) | 98.1 | 100.0 | 98.2 | 97.6 | 98.5 |

On RoboCasa, Cosmos Policy achieves a higher success rate than baselines trained from scratch, demonstrating improved generalization across a variety of household task scenarios.

| Model | Training Demos per Task | Average SR (%) |
| --- | --- | --- |
| GR00T-N1 | 300 | 49.6 |
| UVA | 50 | 50.0 |
| DP-VLA | 3000 | 57.3 |
| GR00T-N1 + DreamGen | 300 (+10,000 synthetic) | 57.6 |
| GR00T-N1 + DUST | 300 | 58.5 |
| UWM | 1000 | 60.8 |
| π0 | 300 | 62.5 |
| GR00T-N1.5 | 300 | 64.1 |
| Video Policy | 300 | 66.0 |
| FLARE | 300 | 66.4 |
| GR00T-N1.5 + HAMLET | 300 | 66.4 |
| Cosmos Policy (ours) | 50 | 67.1 |

For both benchmarks, initializing from Cosmos Predict significantly improves performance over training an equivalent architecture without video pretraining.

Direct policy execution and planning

When deployed directly as a policy, Cosmos Policy already matches or exceeds state-of-the-art performance on most tasks.
Adding model-based planning improved task completion rates by an average of 12.5% on two difficult real-world manipulation tasks.

Real-world manipulation

Cosmos Policy is also evaluated on real-world bimanual tasks using the ALOHA robot platform.
The policy successfully performs long-horizon manipulation tasks directly from visual observations.

Learn more about the architecture and results here.

What’s next: Cosmos Cook-off

Cosmos Policy represents an early step in adapting world foundation models to robot control and planning. We are actively collaborating with early adopters to advance this research for the robotics community.

At the same time, developers will continue to benefit from practical Cosmos Cookbook recipes that show them how to adopt and build on Cosmos Policy.

To support hands-on experimentation with Cosmos WFMs, we are announcing Cosmos Cookoff, an open hackathon focused on building applications and workflows using Cosmos models and Cookbook recipes. Our latest Cookoff is live and invites physical AI developers across robotics, self-driving vehicles, and video analytics to explore, prototype quickly, and learn alongside experts.

🍳 Join the Cosmos Cook-off

📅 Dates: January 29 – February 26
👥 Team format: Teams of up to 4 members
🏆 Prizes: $5,000 in prizes, NVIDIA DGX Spark™, NVIDIA GeForce RTX™ 5090 GPUs, and more!
🧑‍⚖️ Judges: Projects will be judged by experts from Datature, Hugging Face, Nebius, Nexar, and NVIDIA, who bring deep experience in open models, cloud/compute, and real-world edge and vision AI deployments.

📣 Let’s get started
