Anthropic has a billing bug in Claude Code that's burning through Pro Max 5x quotas in under two hours. One hundred million tokens. Gone. The issue, documented in GitHub issue #45756 by user molu0219, shows that cache_read tokens are being counted against rate limits at full weight instead of at the expected 1/10 discounted rate.

That means prompt caching, which is supposed to save users money and extend quota, provides basically zero benefit for rate-limiting purposes. The user's session data tells the story: 103.9M cache_read tokens across just 691 API calls in 1.5 hours of moderate usage. If the discount were applied correctly, that would've been 13.1M effective tokens. Roughly 8.7M per hour. Well within acceptable limits. Instead, the full 103.9M got counted against their quota.
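The arithmetic above can be sketched in a few lines. The 103.9M cache_read figure and the 13.1M effective total come from the bug report; the 2.7M of uncached tokens is an assumption I've back-filled so the numbers reconcile, since the report doesn't break that out:

```python
# Sketch of how a 1/10 cache-read discount should affect rate-limit
# accounting. cache_read and the 13.1M target come from the report;
# the uncached-token figure is an assumption, not reported data.

CACHE_READ_DISCOUNT = 0.1  # expected weight for cache_read tokens

def effective_tokens(cache_read: float, uncached: float,
                     discount: float = CACHE_READ_DISCOUNT) -> float:
    """Tokens that should count against the rate limit."""
    return cache_read * discount + uncached

cache_read = 103.9e6  # cache_read tokens from the session
uncached = 2.7e6      # assumed uncached tokens (back-filled, not reported)

with_discount = effective_tokens(cache_read, uncached)
without_discount = cache_read + uncached  # what the bug actually counts

print(f"with discount:    {with_discount / 1e6:.1f}M")     # 13.1M
print(f"without discount: {without_discount / 1e6:.1f}M")  # 106.6M
```

The gap between those two numbers is the entire bug: an eightfold difference in how fast the same session eats quota.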

Then there are the compounding factors. Claude Code's 1M context window means each API call near the compaction threshold sends close to a million tokens. Background sessions you aren't even actively using still eat into your shared quota pool. And auto-compact events trigger a single massive API call carrying the full pre-compact context, the most expensive kind of request, fired automatically without any user action.
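A quick back-of-the-envelope calculation shows why this exhausts a quota in under two hours. The 100M quota and session totals come from the report; treating them as steady hourly averages is a simplifying assumption:

```python
# Burn-rate sketch using the figures from the bug report. Assumes a
# constant rate of usage, which real sessions won't exactly match.

QUOTA = 100e6              # reported token quota for the 5x plan
counted_tokens = 103.9e6   # cache_read tokens counted at full rate (the bug)
effective_tokens = 13.1e6  # what should count with the 1/10 discount
session_hours = 1.5

full_rate = counted_tokens / session_hours          # ~69.3M tokens/hour
discounted_rate = effective_tokens / session_hours  # ~8.7M tokens/hour

print(f"hours to exhaust quota, bug:     {QUOTA / full_rate:.1f}")        # 1.4
print(f"hours to exhaust quota, correct: {QUOTA / discounted_rate:.1f}")  # 11.5
```

Under the buggy accounting the quota dies in about an hour and a half; with the discount applied, the same workload would last most of a working day.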

In this case, two background sessions (token-analysis with 296 calls and career-ops with 173 calls) were silently consuming quota while the user worked in their main session.

One GitHub commenter called the last two years "a golden era of subsidized GenAI compute" that's now ending as providers tighten usage restrictions. Nobody's happy. Users in the thread report switching to OpenAI's Codex and Amazon's Kiro, citing more generous limits and better quota transparency. Google's Gemini CLI is apparently dealing with similar capacity complaints. Anthropic hasn't yet responded to the bug report, which carries both 'bug' and 'area:cost' labels.