2026-06-01 · HuggingFace

Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action

research · infrastructure


Source: HuggingFace · Date: 2026-06-01 · URL: https://huggingface.co/blog/nvidia/cosmos-3-for-physical-ai

Summary

NVIDIA released Cosmos 3, an open-weights omni-model for physical AI that unifies world generation, physical reasoning, and action generation in one Mixture-of-Transformers model (autoregressive tokens for reasoning and diffusion tokens for generation, interacting through joint attention). It collapses the prior Cosmos line (Predict/Transfer/Reason/Policy) into a single model that handles text, image, video, audio, and action across five task modes: video model, VLM, forward dynamics, inverse dynamics, and policy. Two sizes ship under an open license on HuggingFace: Cosmos 3 Nano (8B, runs on a single RTX PRO 6000) and Cosmos 3 Super (32B, Hopper/Blackwell). The release also includes Diffusers integration, six synthetic robotics/driving/warehouse datasets, and NIM microservices. Target domains: robotics manipulation, autonomous-vehicle long-tail scenarios, warehouse safety simulation.

Implications

Feeds the local-models and AI ecosystem/power-dynamics themes, and is direct disconfirming evidence against the “open weights inherit the safe tier” read.

Open-weight ecosystem widens beyond Google/Alibaba/Zhipu. The “end of open Llama” worry (Meta Muse Spark going proprietary) framed the open frontier as dependent on three vendors plus the community. NVIDIA shipping an open omni-model in a high-stakes domain (physical AI) adds a fourth major contributor, one whose incentives are structurally pro-open.
The gate is domain-conditional, not universal. Cyber (Mythos/Glasswing, GPT-5.5-Cyber) and bio (GPT-Rosalind) models are gated behind vetting because they sit in catastrophic-misuse domains. Physical AI is not gated, because NVIDIA’s business is selling the Blackwell/Hopper silicon that runs Cosmos: open Cosmos drives GPU demand, and gating it would defeat the purpose. What determines open versus gated is whether the capability is primarily a misuse risk (gated) or an adoption driver for the opener’s actual business (opened).
Local-inference fit: Nano at 8B is the consumer-reachable tier. The RTX PRO 6000 target puts it above the M3 Max’s range, but the 8B class is the right size to watch for quantized community ports. Physical-AI/robotics is adjacent to, not central to, the coding-agent stack, so this is tracked as an ecosystem signal rather than for active use.
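To put rough numbers on the local-inference point, here is a minimal back-of-envelope sketch of weight-only memory for the two released sizes. The parameter counts (8B, 32B) come from the summary; the precisions listed are illustrative assumptions, not formats NVIDIA has announced for Cosmos 3, and the helper names are hypothetical.

```python
# Weight-only memory estimate: parameters x bits per weight / 8 bytes.
# Parameter counts come from the summary above; the precisions are
# illustrative assumptions, not formats announced for Cosmos 3.

PARAMS = {"Cosmos 3 Nano": 8e9, "Cosmos 3 Super": 32e9}
BITS_PER_WEIGHT = {"bf16": 16, "int8": 8, "int4": 4}


def weight_gb(num_params: float, bits: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


def footprint_table() -> dict:
    """Map each model name to {precision: weight size in GB}."""
    return {
        name: {prec: weight_gb(n, bits) for prec, bits in BITS_PER_WEIGHT.items()}
        for name, n in PARAMS.items()
    }


if __name__ == "__main__":
    for name, row in footprint_table().items():
        cells = ", ".join(f"{prec}: {gb:.0f} GB" for prec, gb in row.items())
        print(f"{name}: {cells}")
```

On this arithmetic, an 8B model needs about 16 GB for weights at 16-bit precision and about 4 GB at 4-bit, which is why a quantized community port is the plausible route to smaller hardware. The estimate covers weights only; activations, attention caches, and diffusion latents add to it, so real requirements will be higher.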
