Close Menu
Versa AI hub
  • AI Ethics
  • AI Legislation
  • Business
  • Cybersecurity
  • Media and Entertainment
  • Content Creation
  • Art Generation
  • Research
  • Tools
  • Resources

Subscribe to Updates

Subscribe to our newsletter and stay updated with the latest news and exclusive offers.

What's Hot

“AI Doctor, am I healthy?” 59% of Brits rely on AI for self-diagnosis

January 9, 2026

NVIDIA Cosmos Reason 2 brings advanced inference to physical AI

January 8, 2026

Pictory AI Text-to-Video Generator: Simplify video creation with automatic AI tools | AI News Details

January 8, 2026
Facebook X (Twitter) Instagram
Versa AI hubVersa AI hub
Friday, January 9
Facebook X (Twitter) Instagram
Login
  • AI Ethics
  • AI Legislation
  • Business
  • Cybersecurity
  • Media and Entertainment
  • Content Creation
  • Art Generation
  • Research
  • Tools
  • Resources
Versa AI hub
Home»Tools»NVIDIA Cosmos Reason 2 brings advanced inference to physical AI
Tools

NVIDIA Cosmos Reason 2 brings advanced inference to physical AI

versatileaiBy versatileaiJanuary 8, 2026No Comments5 Mins Read
Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
#image_title
Share
Facebook Twitter LinkedIn Pinterest Email



Devraj Sinha Avatar

NVIDIA today released Cosmos Reason 2, the latest advancement in open inference vision language models for physical AI. Cosmos Reason 2 outperforms previous versions in accuracy and tops the Physical AI Bench and Physical Reasoning leaderboards as the #1 open model for visual understanding.

NVIDIA Cosmos Reason 2: Physical AI Vision Language Model Inference

Since their introduction, visual language models have rapidly improved in tasks such as object and pattern recognition in images. But they still struggle with tasks that humans consider natural, such as planning several steps ahead, dealing with uncertainty, and adapting to new situations. Cosmos Reason is designed to bridge this gap by giving robots and AI agents stronger common sense and reasoning to solve complex problems step by step.

Cosmos Reason 2 is a state-of-the-art open reasoning vision language model (VLM) that enables robots and AI agents to see, understand, plan, and act on the physical world as humans do. Use common sense, physics, and prior knowledge to recognize how objects move through space and time to handle complex tasks, adapt to new situations, and find ways to solve problems step by step.

✨ Key highlights

Improved spatio-temporal understanding and timestamp accuracy.

Optimize performance with flexible edge-to-cloud deployment options with 2B and 8B parameter model sizes.

Support for an expanded set of spatial understanding and visual recognition features – 2D/3D point localization, bounding box coordinates, trajectory data, OCR support.

256K input tokens, up from 16K in Cosmos Reason 1, for better understanding of long contexts.

Easy-to-use Cosmos Cookbook recipes adapt to multiple use cases.

🤖 Common use cases

Video Analytics AI Agents – These agents can extract valuable insights from large amounts of video data to optimize processes. Cosmos Reason 2 builds on the features of Cosmos Reason 1 and now offers OCR support, 2D/3D point localization, and a suite of mark understanding.

An example of how Cosmos Reason understands text embedded within a video to determine road conditions during a rainstorm.

Developers can quickly start developing video analytics AI agents using the NVIDIA Blueprint for Video Search and Summarization (VSS) that uses Cosmos Reason as a VLM.

Salesforce is transforming workplace safety and compliance by analyzing video footage captured by Cobalt robots with a VSS blueprint that uses Agentforce and Cosmos Reason as the VLM.

Data annotation and critique — Enables developers to automate high-quality annotation and critique of large and diverse training datasets. Cosmos Reason provides time stamps and detailed descriptions of real or synthetically generated training videos.

Examples of data annotation and critique
An example of a sample prompt that generates detailed time-stamped captions for a race car video.

Uber is exploring Cosmos Reason 2 to provide accurate, searchable video captions for self-driving vehicle (AV) training data to help efficiently identify critical driving scenarios. This co-authored recipe for AV video captioning and VQA shows you how to fine-tune and evaluate Cosmos Reason 2-8B with annotated AV video. Visible improvements were achieved across multiple metrics. BLEU score improved by 10.6% (0.113 → 0.125), MCQ-based VQA improved by 0.67 percentage points (80.18% → 80.85%), and LingoQA improved by 13.8% (63.2% → 77.0%). These advantages indicate an effective domain adaptation for AV applications.

Robot planning and reasoning — Serve as the brain for purposeful and systematic decision-making with the robot’s vision-language-action (VLA) model. In addition to determining the next step, Cosmos Reason 2 now also provides orbital coordinates.

Example prompts and JSON output from Cosmos Reason 2. The robot gripper provides the necessary steps and trajectories to move the painter’s tape into the basket.

Encord provides native support for Cosmos Reason 2 in its Data Agent library and AI data platform, allowing developers to leverage Cosmos Reason 2 as a VLA for robotics and other physical AI use cases.

Companies like Hitachi, Milestone, and VAST Data are using Cosmos Reason to power robotics, autonomous driving, and video analytics AI agents for transportation and workplace safety.

Try Cosmos Reason 2 at build.nvidia.com and experience the latest features including sample prompts for generating bounding boxes and robot trajectories. Upload your own videos and images for further analysis.

Download Cosmos Reason 2 models (2B and 8B) with Hugging Face or use Cosmos Reason 2 in the cloud. This model will be available soon on Amazon Web Services, Google Cloud, and Microsoft Azure. First, check out the Cosmos Reason 2 documentation and Cosmos Cookbook.

Other models in the Cosmos family:

🔮 Cosmos Predict 2.5

Cosmos Predict is a generative AI model that predicts the future state of the physical world as a video based on text, image, or video input.

The leader in physics AI benches for quality, accuracy, and overall consistency. Up to 30 seconds of physically and temporally consistent clips per generation. Supports multiple frame rates and resolutions. Pre-trained with 200 million clips. Available as 2B and 14B pre-training models and various 2B post-training models for multi-view, action conditioning, and autonomous vehicle training.

Check model card >>

🔁 Cosmos Transfer 2.5

Cosmos Transfer is our lightest multi-control model built for video to world style transfer.

Scale a single simulation or spatial video across different environments and lighting conditions. Immediate compliance and physical coordination improved. Use with NVIDIA Isaac Sim™ or NVIDIA Omniverse NuRec for simulation to actual transformations.

Check model card >>

🤖 NVIDIA GR00T N1.6

NVIDIA GR00T N1.6 is an open reasoning Vision Language Action (VLA) model built specifically for humanoid robots to unlock whole-body control and improve reasoning and context understanding with NVIDIA Cosmos Reason.

resource

🧑🏻‍🍳 Read Cosmos Cookbook → https://nvda.ws/4qevli8

📚 Explore models and datasets → https://github.com/nvidia-cosmos

⬇️ Try Cosmos models with a hosted catalog → https://nvda.ws/3Yg0Dcx

💻 Join the Cosmos community → https://discord.gg/u23rXTHSC9

🗳️ Contribute to Cosmos Cookbook → https://nvda.ws/4aQcBkk

author avatar
versatileai
See Full Bio
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
Previous ArticlePictory AI Text-to-Video Generator: Simplify video creation with automatic AI tools | AI News Details
Next Article “AI Doctor, am I healthy?” 59% of Brits rely on AI for self-diagnosis
versatileai

Related Posts

Tools

“AI Doctor, am I healthy?” 59% of Brits rely on AI for self-diagnosis

January 9, 2026
Tools

Gemini 2.5 Flash-Lite is now stable and generally available

January 8, 2026
Tools

Agentic AI scaling requires new memory architecture

January 7, 2026
Add A Comment

Comments are closed.

Top Posts

L’Oréal brings AI to everyday digital advertising production

January 6, 20265 Views

Solana’s fast AI benefits and malware losses

January 4, 20265 Views

Byd, hkust, joint laboratory for research into embodied AI technology, intelligent manufacturing

July 11, 20255 Views
Stay In Touch
  • YouTube
  • TikTok
  • Twitter
  • Instagram
  • Threads
Latest Reviews

Subscribe to Updates

Subscribe to our newsletter and stay updated with the latest news and exclusive offers.

Most Popular

L’Oréal brings AI to everyday digital advertising production

January 6, 20265 Views

Solana’s fast AI benefits and malware losses

January 4, 20265 Views

Byd, hkust, joint laboratory for research into embodied AI technology, intelligent manufacturing

July 11, 20255 Views
Don't Miss

“AI Doctor, am I healthy?” 59% of Brits rely on AI for self-diagnosis

January 9, 2026

NVIDIA Cosmos Reason 2 brings advanced inference to physical AI

January 8, 2026

Pictory AI Text-to-Video Generator: Simplify video creation with automatic AI tools | AI News Details

January 8, 2026
Service Area
X (Twitter) Instagram YouTube TikTok Threads RSS
  • About Us
  • Contact Us
  • Privacy Policy
  • Terms and Conditions
  • Disclaimer
© 2026 Versa AI Hub. All Rights Reserved.

Type above and press Enter to search. Press Esc to cancel.

Sign In or Register

Welcome Back!

Login to your account below.

Lost password?