In March 2026, developer Mike Ramos published a post revealing that Anthropic had been quietly enrolling paying Claude Code users in A/B tests that changed how the product's plan mode worked. Ramos, who pays $200 per month for the tool, noticed his detailed, context-rich plans had been replaced with terse bullet lists. When he queried Claude directly, the model said it was operating under system instructions to cap plans at 40 lines, forbid context sections, and strip prose. The post hit the top of Hacker News, where developers <a href="/news/2026-03-14-anthropic-silent-ab-test-claude-code">debated the disclosure obligations</a> AI vendors owe to professionals who depend on consistent tool behavior to do their jobs.

The engineer behind the experiment responded on Hacker News, confirming the test and explaining the reasoning. Claude Code's plan-mode prompt had been largely unchanged since the 3.x model series, and the hypothesis was that newer 4.x models could succeed with less explicit direction, meaning shorter plans might reduce rate-limit hits without degrading outcomes. Ramos and several thousand other users were assigned the most aggressive variant, the 40-line cap. Preliminary data showed minimal impact on rate limits, and the engineer ended the experiment early once the backlash landed. The episode also surfaced the economics underneath: at $200 per month, per-user compute costs may exceed revenue, making resource-reduction experiments financially rational even when they break professional workflows.
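For readers unfamiliar with how silent experiments like this are typically wired up, the sketch below shows one common pattern: deterministically hashing a user ID into a variant bucket, then rendering that variant's parameters into system-prompt constraints. Everything here is an illustrative assumption rather than Anthropic's actual implementation; only the 40-line cap and the ban on context sections come from the incident itself.

```typescript
import { createHash } from "node:crypto";

// Hypothetical variant definitions. Only the 40-line cap and the ban on
// context sections are drawn from the incident; names, values, and the
// control variant are invented for illustration.
interface PlanModeVariant {
  name: string;
  maxPlanLines: number | null; // null = no cap (control behavior)
  allowContextSections: boolean;
}

const VARIANTS: PlanModeVariant[] = [
  { name: "control", maxPlanLines: null, allowContextSections: true },
  { name: "aggressive-cap", maxPlanLines: 40, allowContextSections: false },
];

// Deterministic bucketing: hashing a salted user ID means the same user
// always lands in the same variant across sessions, with no enrollment
// record the user could ever see in the product.
function assignVariant(userId: string, experimentSalt: string): PlanModeVariant {
  const digest = createHash("sha256")
    .update(`${experimentSalt}:${userId}`)
    .digest();
  return VARIANTS[digest.readUInt32BE(0) % VARIANTS.length];
}

// Rendering the variant into system-prompt text: the layer the model would
// paraphrase back if a user asked why its plans had changed.
function planModeConstraints(v: PlanModeVariant): string {
  const rules: string[] = [];
  if (v.maxPlanLines !== null) {
    rules.push(`Keep plans under ${v.maxPlanLines} lines.`);
  }
  if (!v.allowContextSections) {
    rules.push("Do not include context sections; strip prose.");
  }
  return rules.join(" ");
}

const variant = assignVariant("user_12345", "plan-mode-2026-03");
console.log(variant.name, "->", planModeConstraints(variant));
```

The deterministic hash is what makes the change feel like a permanent product regression rather than a glitch: a bucketed user sees the new behavior on every session, while a colleague on the same plan may see nothing different at all.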

Every major competitor reserves similar rights to experiment on users. Windsurf's terms of service explicitly allow the company to "modify or discontinue any feature at any time without notice or disclosure." GitHub Copilot's customer agreements list running experiments as a stated data-use purpose while offering no opt-out for behavioral tests. Cursor faced its own fallout in mid-2025 when it silently moved to compute-based pricing, resulting in unexpected charges and a public CEO apology. No paid AI coding platform currently publishes a written policy on A/B testing disclosure or offers paying users a documented opt-out from behavioral experiments.

Cognition AI comes closest to the standard Ramos's post implicitly demanded. Devin uses interactive planning that requires user approval before task execution, and when the company shipped Devin 2.2, it offered existing users an opt-in toggle rather than switching them over silently. That's a product design choice, not a formal policy commitment. On Hacker News, several developers responded to the incident by calling for a basic norm: a changelog entry, or at minimum an email, before any paying user is enrolled in a test that changes tool behavior. "Treat professionals like beta testers and eventually they'll find something else," one commenter wrote. So far, no platform has made that commitment in writing.