GitHub Copilot switches to usage-based pricing on June 1, 2026. Microsoft frames the move as an "evolution" into an agentic platform. The product has been bleeding money for years.

According to the Wall Street Journal, Microsoft lost an average of over $20 per user per month in early 2023. Some power users cost upwards of $80 monthly. The charge to users: $10 to $19. A giveaway with a login screen.

As Ed Zitron reported in his Where's Your Ed At newsletter, the corporate spin masks a simple truth. Flat-rate AI subscriptions cannot survive contact with actual usage patterns.

The problem runs deeper than one product. Every major AI service has operated on subsidized compute. ChatGPT. Claude. Perplexity. Users pay a fixed monthly fee while burning through tokens at wildly different rates. One person asks simple questions. Another runs multi-hour autonomous coding sessions. Both pay the same.

Zitron calls this the "subprime AI crisis." The analogy fits. The industry assumed it could hook users on cheap access and hike prices later. It also assumed token costs would fall. Neither happened. New reasoning models burn more tokens than their predecessors. Inference gets more expensive over time. Not less.

Margins tell a different story depending on where you sit. Hacker News commenters point out that frontier labs like OpenAI and Anthropic likely keep margins above 80% on raw token services. The real pain hits resellers like Cursor and Perplexity that rely on third-party API access.
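The asymmetry is easy to see with toy numbers. A sketch (every figure here is a hypothetical illustration, not a reported price): the lab sells tokens above its inference cost, while a flat-rate reseller's margin swings with how many tokens each subscriber actually burns.

```python
# Toy arithmetic: why raw-token sellers keep fat margins while
# flat-rate resellers get squeezed. All numbers are hypothetical.

api_price_per_m = 15.00  # what a reseller pays the lab per 1M tokens (assumed)
lab_cost_per_m = 3.00    # the lab's own inference cost per 1M tokens (assumed)
flat_fee = 20.00         # reseller's monthly subscription price (assumed)

# The lab's margin on raw tokens is fixed, regardless of user behavior.
lab_margin = 1 - lab_cost_per_m / api_price_per_m
print(f"lab margin on raw tokens: {lab_margin:.0%}")  # 80%

# The reseller's margin depends entirely on each user's consumption.
for tokens_m in (0.5, 1.5, 3.0):  # light, median, and heavy user
    api_bill = tokens_m * api_price_per_m
    print(f"{tokens_m}M tokens/month -> reseller margin ${flat_fee - api_bill:+.2f}")
```

With these assumptions the light user is profitable, the median user roughly breaks even, and the heavy user wipes out several light users' worth of margin.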

Small Language Models offer an escape hatch. Microsoft's Phi-3. Meta's Llama 3. These run on edge devices or cheap cloud instances, handling straightforward queries for pennies while routing only complex tasks to expensive frontier models. This "model cascading" approach could stabilize per-user costs without gutting the experience.

Whether the industry adopts it fast enough is an open question. The Copilot backlash suggests users won't quietly accept the jump from all-you-can-eat to pay-per-token.