2 posts tagged with "GTC 2026"

NVIDIA GTC 2026 conference

NVIDIA Spent $20 Billion Because GPUs Alone Can't Win the Inference Era

19 min read
Dhayabaran V
Barrack AI

On March 16, 2026, Jensen Huang took the stage at GTC in San Jose and unveiled the NVIDIA Groq 3 LPU: a chip that is not a GPU, does not run CUDA natively, and exists for one reason only. Inference.

Three months earlier, on Christmas Eve 2025, NVIDIA paid $20 billion in cash to license Groq's entire patent portfolio, hire roughly 90% of its employees, and acquire all of its assets. It was the largest deal in NVIDIA's history. The company that built the GPU monopoly spent $20 billion on a chip that replaces GPUs for the most latency-sensitive phase of AI inference.

This is not a product announcement recap. Every major outlet has covered the Groq 3 specs. What nobody has published is the synthesis: why the GPU company needed a non-GPU chip, what the data says about GPU architectural limitations during inference decode, and what this means for the thousands of ML teams currently renting GPUs for inference workloads.

Every claim in this post is sourced. NVIDIA's own projections are labeled as such. Independent benchmarks are cited separately.

NVIDIA Rubin at GTC 2026: Full Technical Breakdown for ML Engineers

18 min read
Dhayabaran V
Barrack AI

336 billion transistors. 288 GB of HBM4 per GPU. 22 TB/s memory bandwidth. 50 petaFLOPS of FP4 inference per chip.

Those are the numbers NVIDIA is putting behind Rubin, the successor to Blackwell, announced at CES 2026 and entering production for H2 2026 deployment. GTC 2026 kicks off March 16 in San Jose, where Jensen Huang is expected to go deep on Rubin's architecture, pricing signals, and the software stack updates that make these numbers real.
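Both posts keep circling back to memory bandwidth, and a rough roofline calculation with the peak figures quoted above shows why. This is a back-of-envelope sketch, not a benchmark: it assumes single-stream decode reads every weight once per generated token at roughly 2 FLOPs per weight, and it ignores KV-cache traffic and attention.

```python
# Back-of-envelope roofline check using the Rubin peak figures quoted above.
# These are NVIDIA's published peaks; sustained throughput will be lower.

PEAK_FP4_FLOPS = 50e15   # 50 petaFLOPS of FP4 inference per chip
MEM_BANDWIDTH = 22e12    # 22 TB/s of HBM4 bandwidth

# Arithmetic intensity (FLOPs per byte moved) needed to saturate the compute
# units rather than the memory system.
ridge_point = PEAK_FP4_FLOPS / MEM_BANDWIDTH
print(f"ridge point: {ridge_point:.0f} FLOPs/byte")  # ~2,270 FLOPs/byte

# Assumed decode model: each token reads every weight once and does ~2 FLOPs
# per weight (multiply + add). At FP4, a weight is 0.5 bytes.
decode_intensity = 2 / 0.5  # ~4 FLOPs/byte for batch-1 decode

# On the memory-bound side of the roofline, throughput is bandwidth * intensity.
decode_ceiling = MEM_BANDWIDTH * decode_intensity
print(f"bandwidth-bound decode ceiling: {decode_ceiling / 1e15:.2f} PFLOPS "
      f"({decode_ceiling / PEAK_FP4_FLOPS:.1%} of peak)")
```

Under these assumptions, batch-1 decode sits at a few FLOPs per byte while the chip needs roughly 2,000+ FLOPs per byte to run compute-bound, which is the kind of memory-bound gap the Groq post above is pointing at when it talks about GPU limitations during inference decode.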