Cooperative Vectors Kill the Bucketing Headache in Shader ML

Luca Quartesan from Evolve Solutions just published a detailed walkthrough of Cooperative Vectors, a GPU extension that fixes a real headache in neural rendering. The problem: when you run neural networks inside shaders, adjacent pixels often need different networks. With Neural Materials, for example, each material may be compressed into its own network, so two pixels side by side could need entirely different weights.

The existing solution, Cooperative Matrix, wasn't built for this. It expects uniform work across all threads. Developers had to bucket pixels by material and run multiple dispatches. Functional, but clunky.
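To make the workaround concrete, here is a minimal CPU-side sketch of the bucketing pattern in Python. It is an illustration only: `bucket_by_material` and the `material_ids` tile are hypothetical names and data, not anything from Quartesan's walkthrough, and a real renderer would do this on the GPU.

```python
from collections import defaultdict


def bucket_by_material(material_ids):
    """Group pixel indices by material id, keeping pixel order inside each bucket."""
    buckets = defaultdict(list)
    for pixel, material in enumerate(material_ids):
        buckets[material].append(pixel)
    return dict(buckets)


# Hypothetical 8-pixel tile in which neighbouring pixels use different materials.
material_ids = [0, 1, 0, 2, 1, 1, 0, 2]
buckets = bucket_by_material(material_ids)

# One "dispatch" per bucket: every pixel in a dispatch shares the same network,
# which is the uniformity that a matrix-matrix interface expects.
dispatches = [(material, pixels) for material, pixels in sorted(buckets.items())]
```

Three materials in the tile mean three separate passes, plus the sorting work needed to build the buckets in the first place.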

Cooperative Vectors (VK_NV_cooperative_vector) changes the interface from matrix-matrix operations to vector-matrix operations. Each thread holds its own "long vector" and can multiply it against whatever matrix it needs. No bucketing. No separate dispatches. Just a branch in your shader.
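The per-thread model can be sketched on the CPU in a few lines of Python. This is not shader code and does not use the extension's API: `shade`, `weights`, and the tiny 2x3 matrices are hypothetical stand-ins chosen to show that each "thread" simply selects its own matrix.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Hypothetical per-material weights: one tiny 2x3 layer per material.
weights = {
    0: [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    1: [[0.0, 0.0, 2.0], [1.0, 1.0, 1.0]],
}


def shade(material_id, features):
    """One 'thread': look up this pixel's matrix and multiply its own vector by it."""
    return mat_vec(weights[material_id], features)


# Two neighbouring pixels with identical inputs but different materials.
pixels = [(0, [1.0, 2.0, 3.0]), (1, [1.0, 2.0, 3.0])]
outputs = [shade(material, features) for material, features in pixels]
```

The material lookup inside `shade` plays the role of the branch: no pixel needs to know which matrix its neighbour picked.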

The hardware combines these individual vector-matrix multiplies across a wave into an efficient matrix-matrix operation behind the scenes.

That's the key trick. You write divergent code. The GPU makes it fast anyway.
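Why the regrouping is valid comes down to plain linear algebra: multiplying one matrix by many vectors gives the same result as stacking those vectors into a matrix and doing a single matrix-matrix multiply. The Python sketch below checks that equivalence for four hypothetical "threads" sharing one matrix. It says nothing about how the hardware actually schedules the work, which the walkthrough summary here does not detail.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def mat_mat(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n), both given as lists of rows."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


W = [[1.0, 2.0], [3.0, 4.0]]

# One input vector per thread in a hypothetical 4-wide wave.
wave_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]

# What the shader author writes: one vector-matrix multiply per thread.
per_thread = [mat_vec(W, v) for v in wave_vectors]

# The equivalent batched form: stack the thread vectors as columns of X (2 x 4),
# multiply once, then read thread t's result back from column t.
X = [list(col) for col in zip(*wave_vectors)]
batched = mat_mat(W, X)
batched_per_thread = [list(col) for col in zip(*batched)]
```

Threads that picked a different matrix would form their own group, so the equivalence holds per matrix rather than for the whole wave at once.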

NVIDIA's research on Neural Texture Compression prompted the extension, since compressing material textures into per-material networks created exactly this divergent evaluation problem.

The feature is exposed in both Vulkan and DirectX; on the DirectX side it builds on HLSL's "long vectors". Matrices can sit in plain row-major or column-major layouts, or in optimized layouts: MulOptimal for inference, OuterProductOptimal for training and accumulation.
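For readers who have not dealt with matrix layouts before, the two plain options differ only in how a 2D matrix is flattened into a linear buffer. The Python sketch below shows both with hypothetical helper names. The optimized layouts are deliberately left out, since their memory arrangement is not described here.

```python
def to_row_major(matrix):
    """Flatten a matrix row by row."""
    return [value for row in matrix for value in row]


def to_col_major(matrix):
    """Flatten a matrix column by column."""
    return [value for col in zip(*matrix) for value in col]


def at_row_major(buffer, num_cols, r, c):
    """Read element (r, c) from a row-major buffer."""
    return buffer[r * num_cols + c]


def at_col_major(buffer, num_rows, r, c):
    """Read element (r, c) from a column-major buffer."""
    return buffer[c * num_rows + r]


M = [[1, 2, 3],
     [4, 5, 6]]  # 2 rows, 3 columns

row_buf = to_row_major(M)  # [1, 2, 3, 4, 5, 6]
col_buf = to_col_major(M)  # [1, 4, 2, 5, 3, 6]
```

Both buffers hold the same six values; only the index arithmetic used to find element (r, c) changes.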

The OuterProductOptimal layout matters because Neural Radiance Caching requires runtime training of small networks, not just inference.
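The link between training and outer products is standard backpropagation math: for a linear layer y = W x, the weight gradient contributed by one sample is the outer product of the output gradient and the input, and the per-sample contributions are summed. The Python sketch below shows that accumulation, assuming a single linear layer with no bias or activation. The sample values and function names are hypothetical, and this is not the extension's API.

```python
def outer(a, b):
    """Outer product of vectors a (length m) and b (length n), as an m x n matrix."""
    return [[x * y for y in b] for x in a]


def accumulate(grad, update):
    """Add update into grad in place, element by element."""
    for grad_row, update_row in zip(grad, update):
        for j, value in enumerate(update_row):
            grad_row[j] += value


# Gradient buffer for a hypothetical 2x3 weight matrix.
grad_W = [[0.0] * 3 for _ in range(2)]

# Each sample is (gradient of the loss w.r.t. the layer output, layer input).
samples = [
    ([1.0, -1.0], [1.0, 2.0, 3.0]),
    ([0.5, 0.0], [2.0, 0.0, 2.0]),
]

for grad_out, layer_in in samples:
    accumulate(grad_W, outer(grad_out, layer_in))
```

On a GPU, many threads would add into the same gradient buffer at once, which is why a layout tuned for this accumulation pattern is useful.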

The hardware acceleration taps into NVIDIA Tensor Cores, Intel XMX, and AMD's WMMA instructions depending on your GPU. All shader stages can access it, though Evolve uses it mostly in compute shaders.

Epic Games has already integrated neural network inference into Unreal Engine 5.3 and later, building the infrastructure for developers to use these extensions. Shipping titles explicitly using Cooperative Vectors remain rare given how new the technology is, but NVIDIA's SDK samples and tech demos are validating the approach on RTX hardware. For complex neural rendering, Cooperative Vectors offers a hardware-native solution rather than a workaround.

The bucketing workarounds are dying. Hardware-accelerated matrix math in divergent shader code is here.