NVIDIA GTC 2026: Rubin CPX, Feynman at 1.6nm, and the $20B Groq Gambit
NVIDIA GTC 2026 Preview: Jensen Huang's Biggest Keynote Yet
On March 16, Jensen Huang will take the stage at the SAP Center in San Jose for a two-hour keynote that promises to reshape the AI infrastructure landscape. Speaking at a Korean chicken restaurant in Santa Clara on February 14, after dinner with 30 NVIDIA and SK Hynix engineers, Huang told Korea Economic Daily: "We've prepared a few chips the world has never seen before." With 30,000+ attendees expected and over 1,000 sessions planned across March 16-19, GTC 2026 shapes up as NVIDIA's most ambitious conference yet, as the company unveils a four-generation chip roadmap stretching from Blackwell through Feynman.
The stakes are enormous. NVIDIA reported $68.1 billion in Q4 2025 revenue (73% year-over-year growth), with data center revenue hitting $62.3 billion. Yet shares fell 7.7% post-earnings as investors questioned whether the company could sustain that growth. GTC 2026 is Huang's opportunity to answer with silicon: Rubin for training, Rubin CPX for inference, LPX for ultra-low latency, and the first glimpse of Feynman, potentially the world's first 1.6nm chip.
Rubin Platform: Six Chips Entering Production H2 2026
The Rubin platform, first announced at CES 2026, is now in full production with six purpose-built chips forming an integrated AI supercomputer stack:
- Vera CPU: 88 custom Olympus cores with full Armv9.2 architecture and NVLink-C2C connectivity
- Rubin GPU: 50 PFLOPS of NVFP4 inference and 35 PFLOPS of NVFP4 training (5x and 3.5x over Blackwell, respectively)
- NVLink 6 Switch: in-network compute
- ConnectX-9 SuperNIC: networking
- BlueField-4 DPU: ASTRA security
- Spectrum-6 Ethernet Switch: co-packaged optics
The flagship Vera Rubin NVL72 Rack houses 72 Rubin GPUs and 36 Vera CPUs with 3.6 TB/s NVLink bandwidth per GPU and 260 TB/s total rack bandwidth. A cable-free tray design enables 18x faster assembly than Blackwell systems. Major cloud providers including AWS, Google Cloud, Microsoft, Oracle, CoreWeave, Lambda, Nebius, and Nscale have committed to H2 2026 deployments.
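The rack-level bandwidth figure follows directly from the per-GPU number; a quick back-of-the-envelope check of the figures quoted above (illustrative only):

```python
# Sanity check: per-GPU NVLink bandwidth x GPU count vs. the quoted rack total.
gpus_per_rack = 72
nvlink_tb_per_gpu = 3.6  # TB/s per Rubin GPU, as quoted above

rack_total = gpus_per_rack * nvlink_tb_per_gpu
print(f"{rack_total:.1f} TB/s")  # 259.2 TB/s, i.e. the ~260 TB/s NVIDIA quotes
```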
Industry leaders are already aligning their strategies around Rubin. Sam Altman of OpenAI stated that "intelligence scales with compute... Rubin helps us keep scaling," while Anthropic CEO Dario Amodei noted the "efficiency gains enable longer memory, better reasoning." Microsoft CEO Satya Nadella announced plans to deploy "hundreds of thousands" of Vera Rubin Superchips in Fairwater AI superfactory sites, calling them the "most powerful AI superfactories" ever built.
Rubin CPX: A New GPU Class for Million-Token Inference
Perhaps the most strategically significant announcement is Rubin CPX — an entirely new class of GPU designed specifically for massive-context inference workloads. While standard Rubin uses expensive HBM4 memory optimized for training throughput, CPX pairs 30 PFLOPS of NVFP4 compute with 128GB of cost-efficient GDDR7 memory, delivering 3x faster attention computation than GB300 NVL72.
The Vera Rubin NVL144 CPX rack configuration packs 8 EXAFLOPS of NVFP4 compute (7.5x more than GB300 NVL72), 100TB of fast memory, and 1.7 PB/s of memory bandwidth. NVIDIA projects a 30-50x return on investment, claiming $100 million in capital expenditure can generate up to $5 billion in inference revenue. CPX is expected to ship by end of 2026.
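NVIDIA's ROI framing is simple multiplication; the sketch below just restates the quoted figures, and the revenue number is NVIDIA's own claim, not an independently verified one:

```python
# Restating NVIDIA's CPX economics claim as arithmetic.
capex_usd = 100e6            # $100M capital expenditure (NVIDIA's example)
claimed_revenue_usd = 5e9    # up to $5B in inference revenue (NVIDIA's claim)

roi_multiple = claimed_revenue_usd / capex_usd
print(f"{roi_multiple:.0f}x")  # 50x -- the top of the quoted 30-50x range
```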
The endorsements reveal where the inference market is heading. Cursor CEO Michael Truell said "NVIDIA Rubin CPX will transform software creation," while Magic CEO Eric Steinberger highlighted the ability to support "100-million-token context windows" for comprehensive codebase understanding. Runway CEO Cristobal Valenzuela pointed to CPX enabling "longer context and more flexible, agent-driven creative workflows" for AI video generation.
The $20 Billion Groq Gambit: LPX Inference Architecture
NVIDIA's most unconventional move is integrating Groq's Language Processing Unit (LPU) technology, licensed in a $20 billion agreement in late 2025. The resulting LPX architecture represents a fundamentally different approach to AI inference: instead of using traditional GPU parallel compute with external memory, LPX chips embed hundreds of megabytes of on-chip SRAM to eliminate the external-memory bandwidth bottleneck entirely.
The key innovation is deterministic execution. Rather than scheduling computation at runtime like GPUs, the Groq compiler pre-schedules all computation and data movement at compile time, with plesiochronous clock synchronization keeping chips in lockstep so latency stays predictable. The result is guaranteed response times, critical for real-time applications where latency variability is unacceptable.
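To make the contrast concrete, here is a toy sketch of the idea (this is not Groq's compiler, just an illustration): if every operation's issue cycle is fixed before execution, end-to-end latency is a property of the program rather than of runtime contention.

```python
# Toy illustration of compile-time (static) scheduling vs. runtime scheduling.
# NOT Groq's compiler -- just the core idea: when each op's start cycle is
# assigned before execution, total latency is known in advance, every run.

def compile_schedule(ops, cycles_per_op):
    """'Compile' a program: assign each op a fixed start cycle up front."""
    schedule, cycle = [], 0
    for op in ops:
        schedule.append((cycle, op))
        cycle += cycles_per_op[op]
    return schedule, cycle  # total latency is determined at compile time

ops = ["load_weights", "matmul", "activation", "store"]
cycles = {"load_weights": 4, "matmul": 16, "activation": 2, "store": 3}

schedule, total = compile_schedule(ops, cycles)
for start, op in schedule:
    print(f"cycle {start:3d}: {op}")
print(f"guaranteed completion at cycle {total}")  # deterministic latency
```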
At GTC 2026, NVIDIA is expected to showcase the enhanced LPX rack with 256 LPUs per rack (a 4x increase from the initial 64-LPU configuration), capable of generating 10,000 thought tokens in approximately two seconds. The hybrid architecture enables RealScale networks to bridge with NVLink, allowing Mixture-of-Experts models to use LPUs as gating networks while Rubin GPUs handle expert computation.
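That division of labor maps onto a standard Mixture-of-Experts forward pass. The sketch below is a minimal, hypothetical PyTorch illustration of the split the article describes; the LPU/GPU placement is indicated only in comments, and nothing here is a published NVIDIA API:

```python
# Minimal MoE sketch: a tiny gating network picks experts (the low-latency
# step the article assigns to LPUs), while larger expert MLPs do the heavy
# math (the step assigned to Rubin GPUs). Plain CPU tensors here; the
# device placement is the article's described split, not a real API.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)   # "LPU side": small, latency-critical
        self.experts = nn.ModuleList(                # "GPU side": compute-heavy
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x):                            # x: (tokens, d_model)
        weights = self.gate(x).softmax(dim=-1)       # route each token
        top_w, top_i = weights.topk(self.top_k, dim=-1)
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)  # renormalize
        out = torch.zeros_like(x)
        for k in range(self.top_k):                  # dispatch to chosen experts
            for e, expert in enumerate(self.experts):
                mask = top_i[:, k] == e
                if mask.any():
                    out[mask] += top_w[mask, k:k+1] * expert(x[mask])
        return out

tokens = torch.randn(8, 64)
print(TinyMoE()(tokens).shape)  # torch.Size([8, 64])
```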
OpenAI has reportedly signed on as one of the largest LPX customers, purchasing dedicated inference capacity. This positions NVIDIA's inference strategy as a three-tier system: Rubin for high-throughput training and inference, CPX for long-context prefill, and LPX for ultra-low-latency single-batch workloads.
Feynman: The World's First 1.6nm Chip
The biggest "surprise" Huang hinted at is likely Feynman — NVIDIA's next-next-generation architecture expected to be the world's first chip fabricated on TSMC's A16 process (1.6nm class). Named after physicist Richard Feynman, the architecture introduces two breakthrough technologies:
NanoFlex transistors with backside power delivery (Super Power Rail) — By routing power connections underneath the transistor layer instead of alongside signal wires, TSMC's A16 process dramatically reduces interference and enables tighter transistor packing. This is a fundamental manufacturing breakthrough that every chipmaker has been racing toward.
Silicon photonics integration — Rather than electrical signaling between chips, Feynman is expected to use light-based data transfer. NVIDIA's $4 billion investment in photonics companies Lumentum ($2B) and Coherent ($2B), announced March 2, 2026, signals the infrastructure buildout for this transition. The Spectrum-X Photonics platform already supports 128 ports at 800Gb/s, with 512-port configurations reaching 400Tb/s total throughput.
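The quoted throughput figures line up with simple port math (numbers as quoted above; the rounding to 400Tb/s is NVIDIA's):

```python
# Port math for the quoted Spectrum-X Photonics figures.
port_gbps = 800
print(128 * port_gbps / 1000, "Tb/s")  # 102.4 Tb/s for the 128-port config
print(512 * port_gbps / 1000, "Tb/s")  # 409.6 Tb/s, rounded to ~400 Tb/s
```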
Performance projections suggest 5-20x compute increase over Rubin, with production expected around 2028 and customer shipments in 2029-2030. While technical details will likely be limited at GTC 2026, expect Huang to reveal the first architectural diagrams and performance targets.
NVIDIA Dynamo: Open-Source Inference at Scale
On the software side, NVIDIA Dynamo — the open-source inference framework succeeding Triton Inference Server — is emerging as the standard for deploying reasoning models. Built in Rust for performance with Python extensibility, Dynamo achieves a 30x throughput increase when serving DeepSeek-R1 671B on GB200 NVL72 hardware.
The framework integrates with PyTorch, SGLang, TensorRT-LLM, and vLLM, with enterprise adoption already underway: Gcore has integrated Dynamo for one-click AI inference, and AWS offers native EKS integration. The GitHub repository under ai-dynamo/dynamo provides the complete source code. For developers building AI applications, tools like Cursor and Windsurf can leverage these inference optimizations through their backend infrastructure.
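As a taste of what deployment looks like, here is a minimal client sketch against a Dynamo frontend, assuming one is already running locally and exposing an OpenAI-compatible route; the host, port, and model name are placeholders, so check the ai-dynamo/dynamo documentation for the actual serving setup:

```python
# Hypothetical client call against a locally running Dynamo frontend.
# Assumes an OpenAI-compatible /v1/chat/completions route on localhost:8000;
# endpoint, port, and model id are placeholders -- consult the
# ai-dynamo/dynamo docs for the real deployment flow.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "deepseek-ai/DeepSeek-R1",  # placeholder model id
        "messages": [{"role": "user", "content": "Summarize NVLink 6 in one line."}],
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```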
AMD's Open-Standard Challenge: Helios and MI450
NVIDIA faces a credible competitor in AMD's Helios rack architecture, built on Meta's Open Rack Wide standard. The MI450 Series GPU, fabricated on TSMC 2nm with CDNA 5 architecture, delivers 432GB HBM4 memory and 19.6 TB/s bandwidth per GPU. A Helios rack with 72 MI450 GPUs achieves 1.4 EXAFLOPS FP8 and 2.9 EXAFLOPS FP4 — competitive with Rubin on paper.
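"Competitive on paper" is easy to check at the per-GPU level by dividing the rack figures through; note that FP4 formats differ between vendors, so this is only a rough comparison:

```python
# Rough per-GPU FP4 comparison derived from the rack-level figures above.
helios_fp4_pflops_per_gpu = 2.9e3 / 72  # 2.9 EXAFLOPS rack / 72 MI450 GPUs
rubin_fp4_pflops_per_gpu = 50           # NVFP4 inference figure quoted earlier

print(f"MI450: ~{helios_fp4_pflops_per_gpu:.1f} PFLOPS FP4")  # ~40.3
print(f"Rubin:  {rubin_fp4_pflops_per_gpu} PFLOPS NVFP4")
```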
Oracle Cloud Infrastructure has committed to deploying 50,000 MI450 GPUs starting Q3 2026, and HPE has adopted the Helios rack architecture. AMD's open-standard approach — anyone can build compatible racks without proprietary NVLink — could resonate with enterprises wary of vendor lock-in. Broadcom is simultaneously pursuing custom silicon (ASICs) for hyperscalers, creating a three-way battle for AI infrastructure spending.
What's NOT at GTC 2026: Consumer GPUs
Gamers hoping for RTX 60 series announcements will be disappointed. Reports indicate NVIDIA will not release any new consumer GPUs in 2026, with the RTX 60 series (based on Rubin's GR20x architecture) likely debuting in H2 2027 or even 2028. When it arrives, the RTX 6090 is rumored to deliver 40%+ performance uplift over the RTX 5090 — but that's at least 18 months away.
Looking Ahead: The Inference Era Begins
GTC 2026 marks NVIDIA's definitive pivot from the training era to the inference era. With three distinct inference architectures (Rubin, CPX, LPX) addressing different latency and context-length requirements, NVIDIA is segmenting the market in a way no competitor can match. The Groq acquisition alone signals that even the dominant GPU maker believes the future of AI deployment requires fundamentally different silicon.
For the broader AI ecosystem, the implications are clear: inference costs are about to drop dramatically, context windows will expand to millions of tokens, and real-time AI applications with guaranteed latency will become feasible at scale. Whether you're building multi-agent systems, deploying MCP servers, or optimizing AI workflow automation, the hardware foundation is shifting beneath your feet — and GTC 2026 is where the new ground gets laid.
Key Takeaways
- ✓ NVIDIA unveils four-generation chip roadmap: Blackwell → Rubin (H2 2026) → Rubin Ultra (2027) → Feynman (2028-2029)
- ✓ Rubin CPX creates new GPU category for million-token inference with GDDR7, projecting 30-50x ROI from $100M CAPEX
- ✓ $20B Groq LPU integration enables 256-LPU racks generating 10,000 thought tokens in ~2 seconds, with OpenAI as anchor customer
- ✓ Feynman targets TSMC A16 (1.6nm) with backside power delivery and silicon photonics, potentially 5-20x over Rubin
- ✓ $4B invested in silicon photonics (Lumentum + Coherent) to replace electrical signaling with light at rack scale
- ✓ AMD Helios with MI450 (TSMC 2nm, 432GB HBM4) and Oracle's 50K-GPU commitment create genuine competition
- ✓ No consumer RTX 60 series until H2 2027; GTC 2026 focuses entirely on data center AI infrastructure
Skila AI Editorial Team
The Skila AI editorial team researches and writes original content covering AI tools, model releases, open-source developments, and industry analysis. Our goal is to cut through the noise and give developers, product teams, and AI enthusiasts accurate, timely, and actionable information about the fast-moving AI ecosystem.