The researcher behind the Substack handle 'unexcitedneurons' has spent time doing something Anthropic won't: counting. Using token throughput data pulled from OpenRouter, they've published a reverse-engineered estimate of how many parameters actually activate when Claude Opus 4.5 and Opus 4.6 generate text.
The method exploits a basic property of Mixture-of-Experts inference. During decoding, generation speed on fixed hardware is bottlenecked by active parameter count, not total model size. Run a few open-weight Chinese MoE models with known architectures on the same infrastructure, measure tokens per second, and you can back out the effective memory bandwidth of the serving cluster. The author does this for three models on Google Vertex AI — DeepSeek V3.1 at 37.5B active parameters, GLM-4.7 at 33.6B, and Kimi K2 Thinking at 32.9B — and lands on a consistent bandwidth window of roughly 4.0–4.5 TB/s.
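The calibration step can be sketched in a few lines. During decoding, every generated token must stream all active parameters from memory, so implied bandwidth is roughly active bytes times tokens per second. The TPS figures below are hypothetical placeholders (the article reports only the active parameter counts and the resulting 4.0–4.5 TB/s window), chosen to show how three known architectures can triangulate a consistent bandwidth estimate:

```python
# Back-of-envelope calibration of serving-cluster memory bandwidth
# from open-weight MoE models with known active parameter counts.
# Decode speed on fixed hardware is bandwidth-bound, so:
#   bandwidth ≈ active_params × bytes_per_param × tokens_per_second

def implied_bandwidth_tbs(active_params_b: float, tps: float,
                          bytes_per_param: float = 1.0) -> float:
    """Memory bandwidth (TB/s) implied by decode speed.

    active_params_b : active parameters, in billions
    tps             : observed decode tokens per second
    bytes_per_param : 1.0 for FP8, 0.5 for FP4
    """
    return active_params_b * 1e9 * bytes_per_param * tps / 1e12

# Active parameter counts from the article; the TPS values are
# illustrative stand-ins, NOT the actual OpenRouter measurements.
calibration = {
    "DeepSeek V3.1":    (37.5, 112.0),  # hypothetical TPS
    "GLM-4.7":          (33.6, 125.0),  # hypothetical TPS
    "Kimi K2 Thinking": (32.9, 130.0),  # hypothetical TPS
}

for name, (active_b, tps) in calibration.items():
    print(f"{name}: ~{implied_bandwidth_tbs(active_b, tps):.1f} TB/s")
```

If three models with very different active counts all land in the same narrow band, that band is a reasonable estimate of the cluster's effective bandwidth.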
Apply that window to Opus 4.6's observed 43 tokens per second on Vertex, and the active parameter count comes out at 93–105B under FP8 quantization. Under a mixed FP8/FP4 setup similar to Kimi K2's quantization-aware training, the range climbs to 127–154B. For total parameters, the most defensible assumption is DeepSeek-style 8-of-256 expert routing, which puts both Opus 4.5 and 4.6 somewhere between 1.5T and 2T.
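Running the same arithmetic in reverse reproduces the article's numbers. A minimal sketch, with the total/active scaling borrowing DeepSeek V3's published ratio (~671B total to ~37B active, about 18x) as a stand-in for the 8-of-256 routing assumption:

```python
# Inverting the bandwidth window: at a fixed decode speed, the
# active parameter count is bounded by bandwidth / (bytes × TPS).

def active_params_b(bandwidth_tbs: float, tps: float,
                    bytes_per_param: float = 1.0) -> float:
    """Active parameters (billions) consistent with a decode speed."""
    return bandwidth_tbs * 1e12 / (bytes_per_param * tps) / 1e9

OPUS_TPS = 43.0              # observed on Vertex, per the article
BW_LOW, BW_HIGH = 4.0, 4.5   # calibrated bandwidth window, TB/s

lo = active_params_b(BW_LOW, OPUS_TPS)    # FP8 (1 byte/param): ~93B
hi = active_params_b(BW_HIGH, OPUS_TPS)   # FP8 (1 byte/param): ~105B
print(f"FP8 active estimate: {lo:.0f}-{hi:.0f}B")

# Scaling to totals via DeepSeek V3's total/active ratio (assumption).
RATIO = 671 / 37             # ≈ 18.1x
print(f"Implied total: {lo * RATIO / 1000:.1f}-{hi * RATIO / 1000:.1f}T")
```

The FP8 case lands on 93–105B active and roughly 1.7–1.9T total, inside the article's 1.5–2T range; the mixed FP8/FP4 figures follow from plugging in a smaller effective bytes-per-parameter.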
That's a large model — roughly three times the size of the leading Chinese MoE systems. But it's nowhere near the 10-trillion-plus figures that circulate on AI forums. The physics of how fast the model responds places hard constraints on how large the active path can be, and this analysis doesn't leave much room for the more dramatic speculation.
The sharpest claim concerns Opus 4.5. At around 40 TPS on Vertex, it sits far closer to Sonnet 4.5 at ~41 TPS than to Opus 4.1 at ~24 TPS. Combined with the API pricing ratios, the researcher argues Opus 4.5 is a distilled Sonnet-class model repositioned under the Opus brand, not a meaningful architectural upgrade. Whether that reads as a pointed critique of Anthropic's product strategy or simply an observation about naming conventions probably depends on what you thought you were paying for.
For teams running agent pipelines at scale, the practical implication is narrower. If these estimates are in the right ballpark, Opus 4.6's advantage over Chinese MoE models comes from training quality and architectural decisions, not from being dramatically larger. The raw parameter gap is real, but it doesn't explain the capability gap on its own.