Nvidia Unveils Vera Rubin: The Next-Gen AI Supercomputer with 10x Efficiency Over Blackwell

By AI Bot

Nvidia has officially unveiled its next-generation AI computing platform, Vera Rubin, marking a massive leap forward in performance and energy efficiency. CEO Jensen Huang confirmed the system is now in full production and will ship to partners in the second half of 2026.

Key Highlights

  • 10x lower inference token cost compared to Blackwell
  • 4x fewer GPUs needed to train mixture-of-experts (MoE) models
  • 72 Rubin GPUs and 36 Vera CPUs in a single NVL72 rack
  • 260 TB/s total bandwidth with NVLink 6 interconnect
  • 20.7 TB of HBM4 and 54 TB of LPDDR5x memory

Six Chips, One Platform

The Rubin platform is built around six new chips working in concert. At its core is the Rubin GPU with a third-generation Transformer Engine and adaptive compression, delivering 50 petaflops of NVFP4 compute for AI inference. Paired with it is the Vera CPU, featuring 88 custom Olympus cores based on Armv9.2 architecture.

The remaining four chips handle connectivity and infrastructure: NVLink 6 Switch for sixth-generation interconnect with in-network compute, ConnectX-9 SuperNIC for advanced networking, BlueField-4 DPU optimized for agentic AI reasoning workloads, and the Spectrum-6 Ethernet Switch for AI factory networking.

The NVL72 Rack

The flagship configuration, Vera Rubin NVL72, packs 1.3 million components into a single rack. It delivers 3.6 exaflops of NVFP4 inference and 2.5 exaflops of training compute. A cable-free tray design makes assembly and servicing 18x faster than Blackwell systems.
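The quoted rack totals follow directly from the per-GPU figures. A quick sanity check, assuming the rack's HBM4 is split evenly across the 72 GPUs (the even split is our assumption, not a published Nvidia figure):

```python
# Sanity-check the NVL72 rack figures against the per-GPU specs.
GPUS_PER_RACK = 72
NVFP4_PF_PER_GPU = 50  # petaflops of NVFP4 inference per Rubin GPU

# 72 GPUs x 50 PF = 3,600 PF = 3.6 exaflops, matching the quoted rack spec.
rack_inference_ef = GPUS_PER_RACK * NVFP4_PF_PER_GPU / 1000
print(f"Rack NVFP4 inference: {rack_inference_ef} EF")

# 20.7 TB of HBM4 across 72 GPUs, assuming an even split (our assumption).
HBM4_TB_PER_RACK = 20.7
hbm4_gb_per_gpu = HBM4_TB_PER_RACK * 1000 / GPUS_PER_RACK
print(f"HBM4 per GPU (even split assumed): {hbm4_gb_per_gpu:.1f} GB")
```

The even split works out to roughly 287.5 GB of HBM4 per GPU, consistent with the rack-level memory figure above.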

Nvidia also introduced Spectrum-X Photonics, offering 5x better power efficiency and 10x greater reliability for data center networking. The platform includes third-generation Confidential Computing — the first rack-scale implementation of its kind.

Ecosystem and Availability

Major cloud providers including AWS, Google Cloud, Microsoft Azure, Oracle Cloud, CoreWeave, and Lambda are among the first to deploy Vera Rubin. AI labs such as OpenAI, Anthropic, xAI, Mistral, Cohere, and Perplexity have committed to the platform.

Hardware partners Dell, HPE, Lenovo, Supermicro, and Cisco will offer Vera Rubin-based systems, while Red Hat is providing Enterprise Linux and OpenShift integration. A smaller HGX Rubin NVL8 configuration with 8 GPUs is also available for x86 server platforms.

Why It Matters

"Rubin arrives at exactly the right moment, as AI computing demand for both training and inference is going through the roof," said Jensen Huang. With estimated rack pricing between $3.5M and $4M, Vera Rubin targets hyperscalers and sovereign AI deployments that need maximum compute density with dramatically lower per-token costs.

The platform addresses the biggest bottleneck in AI infrastructure — energy efficiency — making large-scale AI deployment roughly 10x more affordable per inference token. For companies training frontier models, the 4x reduction in required GPUs translates to billions in savings.
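To see how the claimed 4x GPU reduction scales into the savings described above, here is an illustrative back-of-envelope calculation. The fleet size and per-GPU cost below are hypothetical placeholders for illustration only; the 4x ratio is the only figure taken from the announcement:

```python
# Illustrative only: fleet size and unit cost are hypothetical,
# not Nvidia figures. Only the 4x reduction is from the announcement.
BLACKWELL_GPUS_FOR_RUN = 100_000  # hypothetical frontier-training fleet
GPU_REDUCTION_FACTOR = 4          # Nvidia's claimed MoE training reduction
COST_PER_GPU_USD = 40_000         # hypothetical per-GPU cost

rubin_gpus = BLACKWELL_GPUS_FOR_RUN // GPU_REDUCTION_FACTOR
savings_usd = (BLACKWELL_GPUS_FOR_RUN - rubin_gpus) * COST_PER_GPU_USD
print(f"GPUs needed: {rubin_gpus:,}")
print(f"Hardware savings: ${savings_usd / 1e9:.1f}B")
```

Even with these placeholder numbers, a 4x reduction on a six-figure GPU fleet lands in the billions-of-dollars range, which is the scale the article cites.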

What's Next

Nvidia's GTC conference, scheduled for March 16-19, 2026, is expected to provide deeper technical details and live demonstrations of Vera Rubin workloads. With production already underway, the first customer deployments should arrive in the second half of 2026.


Source: NVIDIA Newsroom


