Chester Lam at hardware analysis publication Chips and Cheese has published a detailed architectural breakdown of the integrated GPU at the heart of Nvidia's GB10 system-on-chip. The iGPU features 48 Streaming Multiprocessors clocked at up to 2.55 GHz, putting its raw SM count on par with a power-constrained RTX 5070. Nvidia's pitch for the chip is explicitly AI-first: by combining significant GPU compute with the CUDA ecosystem, GB10 aims to give developers a familiar, well-optimized target for <a href="/news/2026-03-14-run-openclaw-ai-agent-locally-on-amd-ryzen-ai-max-and-radeon-gpus">local inference workloads</a> — a direct challenge to AMD's Strix Halo, which takes a more general-purpose approach to the integrated high-performance GPU market.

The analysis offers a granular look at GB10's two-level GPU cache hierarchy, which centers on a high-capacity 24 MB L2 that acts as both a last-level cache and the first destination for L1 misses. Benchmarks show the two architectures trade wins depending on working set size: AMD's smaller, faster caches — a 256 KB L1 and 2 MB L2 backed by a 32 MB memory-side Infinity Cache — outperform GB10 when data fits within those tighter structures, while Nvidia's larger L2 wins when AMD is forced to spill to its Infinity Cache. Both platforms rely on LPDDR5X memory, but Strix Halo achieves lower DRAM latency on the GPU side — the mirror image of the CPU side, where GB10 holds the latency advantage. On write bandwidth, GB10's L2 delivers just under 1 TB/s compared to approximately 700 GB/s on Strix Halo.
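The working-set crossover the benchmarks show can be reasoned about with first-order arithmetic. The sketch below is illustrative only, using the per-level capacities cited above; which level actually serves a request also depends on associativity, replacement policy, how caches are shared across SMs or WGPs, and the access pattern:

```python
# Illustrative sketch: which cache level a GPU working set lands in,
# using the per-level capacities cited in the article. Treat this as
# a first-order estimate, not a model of real cache behavior.

KB, MB = 1024, 1024 * 1024

# Per the article: GB10 has a two-level hierarchy topped by a 24 MB L2;
# Strix Halo backs its smaller L1/L2 with a memory-side Infinity Cache.
GB10_LEVELS = [("L1", 128 * KB), ("L2", 24 * MB)]
STRIX_HALO_LEVELS = [("L1", 256 * KB), ("L2", 2 * MB), ("Infinity Cache", 32 * MB)]

def first_fitting_level(levels, working_set_bytes):
    """Return the first cache level the working set fits in entirely,
    or 'DRAM' if it spills past every level."""
    for name, capacity in levels:
        if working_set_bytes <= capacity:
            return name
    return "DRAM"

for ws in (96 * KB, 1 * MB, 16 * MB, 48 * MB):
    print(f"{ws // KB:>6} KB -> GB10: {first_fitting_level(GB10_LEVELS, ws):>4}"
          f" | Strix Halo: {first_fitting_level(STRIX_HALO_LEVELS, ws)}")
```

A 96 KB working set fits AMD's fast L1 but a 16 MB one still fits GB10's L2 while spilling to Strix Halo's slower Infinity Cache — the same trade the benchmarks capture.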

One architectural clarification in the piece is likely to be significant for developers: GB10 uses a consumer-variant Blackwell architecture, not datacenter Blackwell. This means it ships with 128 KB of L1 and Shared Memory per SM rather than the 256 KB found in datacenter parts like the B200. The system-level cache on GB10 also does not function as additive last-level cache capacity on top of the graphics L2 — Nvidia's own documentation describes it as enabling power-efficient data sharing between compute engines rather than boosting raw throughput. On the software side, GB10 does support coarse-grained OpenCL Shared Virtual Memory without requiring full buffer copies, a practical usability improvement over some competing Qualcomm and Arm iGPU implementations.

For developers targeting GB10 for inference workloads, the consumer-Blackwell distinction is the sharpest practical constraint Lam's analysis surfaces. GB10's L1 cache combines low latency with higher capacity than AMD's equivalent first-level caches — a genuine advantage for certain inference access patterns. But code written to exploit datacenter-specific features, including the doubled L1 and Shared Memory per SM found on the B200, will not transfer directly. That gap is the source of developer confusion the Chips and Cheese piece explicitly sets out to address, and it's the detail most worth internalizing before committing an inference pipeline to the platform.
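The portability cost of the halved per-SM capacity can be made concrete with a back-of-the-envelope occupancy calculation. This is a hedged sketch: it considers only the shared-memory limit (real occupancy is also bounded by registers, warp slots, and the per-block shared-memory cap, which sits below the per-SM total), the 128 KB and 256 KB figures are the per-SM totals cited above, and the 64 KB per-block tile budget is a hypothetical example, not a measured workload:

```python
# Back-of-the-envelope sketch: how many thread blocks can stay resident
# per SM when shared memory is the binding constraint. Shared memory
# only; registers, warp slots, and per-block caps also limit occupancy,
# so this is an upper bound.

KB = 1024

def blocks_per_sm(shared_per_sm_bytes, shared_per_block_bytes):
    """Upper bound on resident blocks per SM, shared-memory limit only."""
    if shared_per_block_bytes == 0:
        return float("inf")  # kernel uses no shared memory: no limit here
    return shared_per_sm_bytes // shared_per_block_bytes

# Hypothetical kernel tuned for a datacenter part, staging 64 KB of
# tiles in shared memory per block.
per_block = 64 * KB

b200_resident = blocks_per_sm(256 * KB, per_block)  # datacenter Blackwell
gb10_resident = blocks_per_sm(128 * KB, per_block)  # consumer Blackwell

print(f"B200: {b200_resident} blocks/SM")  # 4
print(f"GB10: {gb10_resident} blocks/SM")  # 2
```

Halving the resident blocks per SM halves the latency-hiding parallelism available to such a kernel, which is why shared-memory tile sizes tuned against datacenter budgets typically need retuning before an inference pipeline is committed to GB10.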