In December 2025, Nvidia spent $20 billion acquiring Groq. Four months later, OpenAI announced a $20 billion procurement deal with Cerebras. Same dollar amount, opposite motives. Nvidia is playing defense, buying its way out of an architectural weakness in inference chips. OpenAI is playing offense, building compute infrastructure that doesn't depend on Nvidia. The battlefield is inference, and it's about to become the biggest spending category in AI.
The economics are straightforward. Training a model like GPT-4 happens once. Running it for hundreds of millions of users happens constantly. At CES 2026, Lenovo CEO Yang Yuanqing put it plainly: AI spending will shift from "80% training + 20% inference" to "20% training + 80% inference." Deloitte's analysis backs this up, projecting inference will account for two-thirds of AI compute spending by year's end. That's a complete flip from just two years ago. When you're serving billions of requests daily, the speed of each response matters more than raw training throughput.
Nvidia's H100 and H200 GPUs excel at training because they're built for massive parallel matrix multiplication. But inference is bottlenecked by memory bandwidth, not compute. Every time ChatGPT answers a question, the chip has to stream the model's weights from off-chip memory to the compute cores, and that data movement, not the math, sets the latency floor. Cerebras took a different approach with its WSE-3 chip, packing 900,000 cores and 44GB of SRAM onto a single wafer-scale piece of silicon where memory sits directly next to compute. The result is inference speeds 15 to 20 times faster than Nvidia's H100 delivers.
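The memory-bandwidth argument is easy to check with back-of-envelope arithmetic. The sketch below compares, for a single decoded token, the time spent streaming weights against the time spent on the actual math. The model size and hardware figures are illustrative assumptions (a 70-billion-parameter model in FP16, and roughly H100-class bandwidth and throughput numbers), not quoted specs, but any plausible values tell the same story: generation is memory-bound by orders of magnitude.

```python
# Back-of-envelope roofline for single-stream token generation (batch size 1).
# ASSUMED figures, chosen only for illustration:
#   - 70e9-parameter model served in FP16 (2 bytes per weight)
#   - ~3.35e12 bytes/s of off-chip memory bandwidth (H100-class)
#   - ~989e12 FLOP/s of dense FP16 compute (H100-class)

PARAMS = 70e9
BYTES_PER_WEIGHT = 2
MEM_BANDWIDTH = 3.35e12   # bytes/s
PEAK_FLOPS = 989e12       # FLOP/s

# At batch size 1, each generated token must read every weight once,
# and each weight contributes ~2 FLOPs (one multiply, one add).
bytes_moved = PARAMS * BYTES_PER_WEIGHT
flops_needed = 2 * PARAMS

memory_time = bytes_moved / MEM_BANDWIDTH   # time if bandwidth-limited
compute_time = flops_needed / PEAK_FLOPS    # time if compute-limited

print(f"memory-limited time per token:  {memory_time * 1e3:.1f} ms")
print(f"compute-limited time per token: {compute_time * 1e3:.2f} ms")
print(f"memory/compute ratio:           {memory_time / compute_time:.0f}x")
```

Under these assumptions, moving the weights takes roughly 42 ms per token while the arithmetic takes well under a millisecond: the compute cores sit idle waiting on memory. This is the gap on-wafer SRAM designs like the WSE-3 attack, since keeping weights adjacent to compute removes the off-chip round trip entirely.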
Nvidia's Groq acquisition is essentially an admission that its GPU architecture has a structural gap in inference it can't close internally. Groq's founder Jonathan Ross was one of the original engineers behind Google's TPU, the custom silicon that proved ASICs could outperform general-purpose GPUs on neural network workloads by 15 to 30 times.
OpenAI is making a different bet to secure its compute stack. The Cerebras deal includes stock warrants that could give OpenAI up to 10% ownership of the chipmaker, plus $1 billion in datacenter construction funding. OpenAI is also working with Broadcom on custom ASIC chips, expected in mass production by late 2026. Nvidia bought its way into inference. OpenAI is building its way out of Nvidia.