News Summary:
NVIDIA Cosmos 3 is a new leaderboard-topping open physical AI foundation model, built on a breakthrough mixture-of-transformers architecture for physical AI reasoning, world simulation and action generation.
Cosmos 3 is the world’s first fully open omnimodel with native vision reasoning and multimodal generation across text, image, video, ambient sound and action for state-of-the-art synthetic data generation and physical AI policy model development.
NVIDIA launches the NVIDIA Cosmos Coalition with leading AI labs and robotics leaders, including Agile Robots, Black Forest Labs, Generalist, LTX, Runway and Skild AI, to advance the next generation of open world models.

TAIPEI, Taiwan, June 01, 2026 (GLOBE NEWSWIRE) -- NVIDIA GTC Taipei -- NVIDIA today launched NVIDIA Cosmos™ 3, an open world foundation model for physical AI built on a breakthrough mixture-of-transformers architecture that combines vision reasoning, world generation and action prediction in a single system.

Cosmos 3 is the world’s first fully open omnimodel that can natively understand and generate text, images, video, ambient sound and actions with leading physics accuracy, reducing physical AI training and evaluation cycles from months to days.

NVIDIA also launched the NVIDIA Cosmos Coalition, a global collaboration between world model builders and AI developers, including Agile Robots, Black Forest Labs, Generalist, LTX, Runway and Skild AI, working together to advance next-generation world models.

“The big bang of physical AI is just around the corner thanks to breakthroughs in multimodal reasoning language, vision and world models,” said Jensen Huang, founder and CEO of NVIDIA.
“The Cosmos 3 family of open, frontier omnimodels gives developers a generational leap in ability to build robots, autonomous vehicles and vision AI that perceive, reason, plan and act in the physical world.”

A New Architecture for Physical AI

Cosmos 3 tackles a fundamental challenge in physical AI: enabling robots, autonomous vehicles (AVs) or vision agents to generalize in the real world with limited training data and fragmented simulation stacks.

The model’s mixture-of-transformers architecture pairs a reasoning transformer with an expert generation transformer, enabling Cosmos 3 to understand object interactions, motion and spatial-temporal relationships before generating video and action trajectories.

Trained on one of the largest multimodal physical AI datasets, including billions of samples across text, image, video, sound and action trajectories, the model gives developers a powerful pretrained foundation for building physical AI systems with less data and lower training costs.

Developers can use Cosmos 3 as:
A vision language model that understands and reasons across modalities.
A world model or video foundation model that simulates physical environments and predicts future world states for training and evaluation.
The backbone for

Full story available on Benzinga.com
NVIDIA Launches Cosmos 3, the Open Frontier Foundation Model for Physical AI
Source: Benzinga
