Nvidia Cosmos 3 Launches Open Physical AI Omnimodel

Nvidia Cosmos 3 launches as the first fully open physical AI omnimodel, generating video and robot actions to cut robotics training from months to days.

Key Takeaways

  • Cosmos 3 is the first fully open omnimodel, generating text, image, video, ambient sound, and robot action in one model.
  • It ships as a 16B Nano and a 64B Super, each pairing a reasoning tower with a diffusion generation tower, with a 2B Edge variant coming.
  • Nvidia claims Cosmos 3 cuts physical AI training and evaluation cycles from months to days using synthetic data.
  • Open weights seed demand for Nvidia GPUs, the same flywheel CUDA ran two decades ago.
  • The Cosmos Coalition recruits Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI to lock in the open standard.

Nvidia just gave away the brain it spent years building. Cosmos 3, launched at GTC Taipei, is the first fully open foundation model that can reason about the physical world and generate the video and robot actions to act in it. The company that sells the most expensive AI chips on earth decided the model layer should be free.

What Actually Happened

On June 1, 2026, Nvidia released Cosmos 3, which it calls the first fully open omnimodel for physical AI. Unlike a language model that only handles text, Cosmos 3 natively understands and generates across text, image, video, ambient sound, and action. It is built to produce synthetic training data and policy models for robots and autonomous vehicles, the two markets Nvidia believes will define the next decade of compute demand. The weights are open, the benchmarks are public, and the licensing is permissive enough that a startup can build a commercial product on top without negotiating a license.

The architecture is the interesting part. Cosmos 3 uses a mixture-of-transformers design with two towers. The Reasoner tower is a vision-language model that interprets observations autoregressively to understand motion, object interactions, and spatial-temporal relationships. The Generator tower uses a diffusion process to produce physics-aware video and action trajectories. The model reasons about what will happen before it renders it, which is closer to how a human anticipates a falling glass than to how a video model hallucinates pixels. That two-stage design is what lets it output not just convincing footage but also the control signals a robot needs to reproduce the motion.
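To make the reason-then-generate flow concrete, here is a toy sketch in Python. It is not Nvidia's code or API: the names `Plan`, `reason`, and `generate` are hypothetical, the "reasoner" is a hand-written rule rather than a vision-language model, and the "denoiser" is a fixed pull toward the planned trajectory rather than a learned diffusion network. It only illustrates the ordering of the two stages, with the generator conditioned on the reasoner's output.

```python
import random
from dataclasses import dataclass


@dataclass
class Plan:
    """Hypothetical reasoner output that the generator conditions on."""
    description: str
    target_trajectory: list[float]


def reason(observation: dict) -> Plan:
    """Stand-in for the Reasoner tower: read the scene, predict what should happen."""
    start = observation["gripper_x"]
    goal = observation["object_x"]
    steps = observation["steps"]
    # Plan a straight-line motion from the gripper to the object.
    target = [start + (goal - start) * i / (steps - 1) for i in range(steps)]
    return Plan(f"move gripper from {start} to {goal}", target)


def generate(plan: Plan, denoise_steps: int = 50, seed: int = 0) -> list[float]:
    """Stand-in for the Generator tower: start from noise, refine toward the plan."""
    rng = random.Random(seed)
    trajectory = [rng.gauss(0.0, 1.0) for _ in plan.target_trajectory]
    for _ in range(denoise_steps):
        # Each pass removes 20% of the remaining error. A real diffusion
        # model would use a learned denoiser instead of this fixed rule.
        trajectory = [
            x + 0.2 * (t - x) for x, t in zip(trajectory, plan.target_trajectory)
        ]
    return trajectory


if __name__ == "__main__":
    plan = reason({"gripper_x": 0.0, "object_x": 1.0, "steps": 5})
    actions = generate(plan)
    print(plan.description)
    print([round(a, 3) for a in actions])
```

The point of the split is visible even in the toy: the plan is cheap to inspect before anything is rendered, and the generator's job reduces to turning noise into something consistent with that plan.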

Two variants shipped immediately. Cosmos 3 Nano runs at 16 billion parameters, split as an 8B reasoner and an 8B generator, and is tuned for high-quality video and action reasoning in fractions of a second. Cosmos 3 Super runs at 64 billion parameters, split 32B plus 32B, and is built for post-training robotics and AV models that demand the highest physics accuracy. A 2B Cosmos 3 Edge variant for real-time on-device inference is coming. Nvidia claims that the family reduces physical AI training and evaluation cycles from months to days, and that it ranks first among open models across the TAR leaderboards for vision understanding.
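The parameter counts translate into a rough lower bound on the memory needed just to hold the weights. The sketch below works through that arithmetic. The bytes-per-parameter figures are standard for each numeric format, but which precision the released weights use is an assumption here, not something the announcement states, and the Super split into a 32B reasoner and a 32B generator is inferred from the Nano layout. Activations, caches, and runtime overhead are not counted.

```python
# Bytes needed to store one parameter in each numeric format.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}

# Tower sizes from the article; the Super tower labels are an inference.
VARIANTS = {
    "Nano": {"reasoner": 8_000_000_000, "generator": 8_000_000_000},
    "Super": {"reasoner": 32_000_000_000, "generator": 32_000_000_000},
}

# The Edge variant's tower split is not given, so only its total is used.
EDGE_PARAMS = 2_000_000_000


def total_params(towers: dict) -> int:
    """Sum the parameter counts of all towers in a variant."""
    return sum(towers.values())


def weight_gb(params: int, dtype: str = "bf16") -> float:
    """Weights-only footprint in gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, towers in VARIANTS.items():
        print(name, weight_gb(total_params(towers)), "GB in bf16")
    print("Edge", weight_gb(EDGE_PARAMS), "GB in bf16")
```

Under the bf16 assumption, that works out to about 32 GB for Nano, 128 GB for Super, and 4 GB for Edge, which is why the tiers map so naturally onto different classes of hardware.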

The split between a Nano, a Super, and a forthcoming Edge tier is itself a statement of intent. Nvidia is not shipping one research artifact; it is shipping a product ladder that spans cloud training runs, real-time policy generation, and on-device inference at the robot itself. The 64B Super model is meant to live in a data center generating high-fidelity synthetic worlds, while the 2B Edge model is meant to ride inside a moving machine making decisions in milliseconds. By covering the full range in a single open family, Nvidia ensures a team can prototype on Super and deploy on Edge without ever leaving the Cosmos ecosystem, which is exactly how lock-in begins. The naming deliberately echoes the company's own GPU tiers, a sign that Nvidia wants the robotics stack to mirror the one it already dominates.

Why This Matters More Than People Think

The bottleneck in robotics has never been hardware. It has been data. A language model can train on the entire internet, but a robot arm cannot read its way to dexterity, and collecting real-world manipulation data costs millions of dollars and thousands of hours per task. Cosmos 3 attacks that wall by generating physically plausible synthetic data at scale, letting a robotics team simulate a warehouse, a kitchen, or a factory line and train a policy without bending a single piece of real metal. If the generated physics are accurate enough, the cost curve of robotics development bends downward by an order of magnitude.

That is why Nvidia is willing to open-source it. The company makes its money on GPUs, not model licenses, and every team that adopts Cosmos 3 becomes a team that buys Nvidia silicon to train and run it. By giving away the model, Nvidia seeds demand for the hardware underneath it, the same flywheel it ran with CUDA two decades ago. Open weights here are not charity. They are a distribution strategy that turns the entire robotics industry into a captive customer base for accelerated computing. The more the field standardizes on Cosmos, the more every incremental robot becomes an incremental GPU sale.

There is a second-order effect that matters for the labor market. If synthetic data collapses the cost of training robot policies, the gating factor for deploying humanoids and autonomous machines shifts from research budgets to deployment logistics. The robots get cheaper to teach, which means more of them get taught, which means physical automation arrives faster in the warehouses, plants, and logistics hubs that employ tens of millions of people. The model is open, but its consequences are not evenly distributed across the workforce that physical automation will eventually touch.

There is also a capital-markets dimension that gets overlooked. Robotics startups have raised billions on the premise that proprietary data and proprietary models are the durable moat. Cosmos 3 quietly resets that thesis. If the foundation model is free and the synthetic data engine is open, then the differentiation moves to hardware design, deployment execution, and the proprietary fine-tuning data a company collects from real installations. Investors who underwrote closed-model robotics bets now have to ask whether the asset they funded just lost its scarcity, and founders who pitched a model moat have to find a new story before their next round.

The Competitive Landscape

Nvidia is not alone in chasing world models. Google DeepMind has Genie and its Veo video stack, Meta has poured research into V-JEPA and its predictive architectures, Tesla builds proprietary world models to feed Full Self-Driving, Wayve has GAIA for autonomous driving, and Fei-Fei Li's World Labs is building spatial intelligence from scratch. Each of these is, for now, either closed, narrow, or tied to a single application. Nvidia's bet is that being open and general beats being proprietary and specialized, the same wager that let Android out-ship a more polished iOS across the global device market.

To lock in that position, Nvidia launched the Cosmos Coalition alongside the model, recruiting Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI to build on and extend the open world model. This is the playbook of an incumbent that wants to set the standard before a rival does. Whoever owns the default world model for robotics owns the data format, the tooling, and the developer mindshare, and Nvidia is moving to claim all three before Google's vertically integrated stack can answer. A coalition of named robotics and generative-media labs also lends the release credibility that a solo launch would lack.

The historical parallel worth holding in mind is the ImageNet moment of 2012, when a single open benchmark and a winning architecture pulled an entire field forward almost overnight. Open robotics foundation models could do the same for physical AI, compressing a decade of fragmented, lab-by-lab progress into a few years of shared, cumulative improvement. The difference is that this time the company supplying the open model also supplies the picks and shovels, which gives Nvidia a degree of leverage the academic labs of 2012 never had. The 2012 breakthrough enriched a research community; this one is engineered to enrich a single hardware vendor.

Hidden Insight: Nvidia Is Commoditizing the Layer Above Its Moat

The non-obvious move here is that Nvidia is deliberately commoditizing the model layer to protect the layer it actually owns. Every dollar that flows into proprietary robot foundation models is a dollar that could eventually fund a competitor to Nvidia's hardware position. By making the best physical AI model free, Nvidia removes the incentive for robotics companies to build their own closed models, and in doing so it removes the path by which any of them could accumulate the kind of value that threatens the chipmaker. You do not build a moat around the castle you are giving away; you build it around the ground everyone has to stand on to use it.

This is the inverse of the strategy OpenAI and Anthropic pursue, where the model is both the product and the moat. Nvidia treats the model as a loss leader and the silicon as the product, which means it can afford to outspend any pure-play model lab on open robotics research indefinitely. A startup that raised $400 million to build a closed world model now has to compete against a free model from a company with a market capitalization north of $4 trillion and no need to ever charge for the weights. That is not a fair fight, and it was never meant to be. The commoditize-your-complement strategy is old, but rarely has a company had the balance sheet to run it this aggressively.

The bear case, however, is straightforward and worth taking seriously. Synthetic data is only as useful as its fidelity to reality, and the sim-to-real gap has humbled every robotics company that bet too hard on simulation. A diffusion model can generate video that looks physically plausible to a human eye while encoding subtle errors in friction, mass, or contact dynamics that cause a trained policy to fail the moment it touches a real object. Critics argue that TAR leaderboard rankings for vision understanding say very little about whether a robot trained on Cosmos 3 data will actually pick up a deformable bag of groceries without crushing it. If the physics do not transfer, the entire value proposition softens into a very good video generator with a robotics marketing label.
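A back-of-the-envelope example shows how a small friction error becomes a large miss. Suppose a policy learns in simulation how hard to push a block so it slides to a target and stops. For a block decelerated only by sliding friction, the stopping distance is d = v² / (2μg). The sketch below is illustrative only: the friction coefficients are made-up values, not measurements from Cosmos 3 or any real system, and the function names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def push_speed_for_distance(target_m: float, mu: float) -> float:
    """Initial speed that makes a block stop after target_m, given friction mu."""
    return math.sqrt(2.0 * mu * G * target_m)


def stopping_distance(speed: float, mu: float) -> float:
    """Distance a block slides before friction mu brings it to rest."""
    return speed ** 2 / (2.0 * mu * G)


def real_stopping_distance(target_m: float, mu_sim: float, mu_real: float) -> float:
    """Distance actually travelled when a sim-tuned push meets real friction."""
    speed = push_speed_for_distance(target_m, mu_sim)
    return stopping_distance(speed, mu_real)


if __name__ == "__main__":
    # Assumed values: the simulator believes mu = 0.5, the real surface is 0.4.
    actual = real_stopping_distance(target_m=1.0, mu_sim=0.5, mu_real=0.4)
    print(f"target 1.00 m, actual {actual:.2f} m")
```

Because the result scales with the ratio of simulated to real friction, a coefficient that is off by 0.1 in this example sends the block 25 percent past its target. Video of the simulated push would look entirely plausible; the error only shows up on contact with the real surface.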

There is also a strategic risk the market is underpricing. By open-sourcing the standard, Nvidia invites the whole industry to study, fork, and eventually improve on its architecture, including competitors who could use the open weights as a launchpad to build something better and tie it to non-Nvidia silicon. Google, with its own TPUs and its own world-model research, has every reason to take what is open, optimize it for its hardware, and route the resulting demand away from Nvidia. Openness cuts both ways, and the company that sets the standard does not always get to keep it. The same move that builds Nvidia's flywheel hands its sharpest competitor a free starting point.

The deepest insight is that Cosmos 3 reframes what a foundation model is for. In language, the model is the destination, the thing users pay to access. In physical AI, Nvidia is betting the model is merely the on-ramp to a much larger market in robots, sensors, and the compute that powers them. If that framing wins, the next generation of robotics value will not sit in the model weights at all. It will sit in the deployment data, the hardware integration, and the silicon, which is precisely the territory Nvidia has spent a decade fortifying. The company is not selling intelligence; it is selling the infrastructure intelligence runs on, and giving away the intelligence is how it sells more infrastructure.

What to Watch Next

In the next 30 days, watch the download and fine-tuning numbers and which robotics teams publicly commit to Cosmos 3 over their in-house models. Adoption by a marquee humanoid company or a major autonomous-vehicle program would signal that the synthetic-data pitch is landing where it counts. Also watch whether the Cosmos Coalition members ship anything concrete, because a coalition that produces press releases but no models is a marketing exercise, not a standard worth tracking.

Over 90 days, the metric that matters is sim-to-real transfer evidence. Look for papers, demos, or deployment reports showing that a policy trained primarily on Cosmos 3 synthetic data performs in the physical world, not just on a benchmark. The release of the 2B Edge variant will also tell us how serious Nvidia is about on-device inference, the segment where real robots actually run. If Edge ships on schedule and runs on Jetson-class hardware, the flywheel from open model to hardware sales starts spinning in public view.

Over 180 days, the question is whether a credible challenger emerges, most likely from Google DeepMind or a well-funded startup, that can match Cosmos 3 on openness while beating it on physics fidelity. If no one does, Nvidia will have quietly annexed the foundation layer of an entire industry. If someone does, we will learn whether open-sourcing the model was a masterstroke or a gift to the very competitors best positioned to use it. Either way, the price of teaching a machine to move through the world just fell, and that change is permanent.

Nvidia did not give away its robot brain out of generosity. It gave it away so the whole world would have to rent its hardware to use it.


Key Takeaways

  • First fully open omnimodel: Cosmos 3 generates text, image, video, ambient sound, and robot action in one model.
  • 16B Nano and 64B Super: both ship now, each split into a reasoning tower and a diffusion generation tower, with a 2B Edge variant coming.
  • Months to days: the cycle-time reduction Nvidia claims for physical AI training and evaluation using synthetic data.
  • Free model, paid silicon: open weights seed demand for Nvidia GPUs the same way CUDA did two decades ago.
  • The Cosmos Coalition recruits Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI to lock in the standard.

Questions Worth Asking

  1. If the best physical AI model is now free, where does the durable value in robotics actually accumulate, and who captures it?
  2. Does synthetic data from a diffusion model close the sim-to-real gap, or just hide it behind convincing video?
  3. If teaching a robot a task drops from months to days, which jobs in your industry stop being safe from automation in the next three years?