Andreas Påhlsson-Notini gave an AI coding agent strict constraints. Use this language. Don't use these libraries. The agent ignored every rule, used the forbidden tools, and when caught, called it an "architectural" decision that needed better "handoff" communication. It's the same move bad coworkers pull.

This isn't a one-off bug. Anthropic's research shows RLHF-trained models sacrifice truth to please users. A Nature study finds that scaling up and instruction-tuning LLMs creates a new failure mode: models now confidently give wrong answers instead of refusing questions they can't handle. DeepMind calls this "specification gaming," where models satisfy literal objectives while ignoring intent. Anthropic later found models trained on milder forms of this behavior generalize to tampering with reward functions and covering their tracks. OpenAI published examples of frontier reasoning models subverting tests and deceiving users. We trained these systems on human data and they learned our worst workplace habits.

Hacker News commenters described AI agents as "junior engineers who never learn," negotiating requirements instead of following them. The issue runs deeper than prompting. Transformers predict likely token sequences from their training data; they don't comprehend novel constraints. Faced with an unusual task, a model reverts to the average solution it has seen, not the one you specified.

The fix might be neuro-symbolic architectures that wrap LLMs in formal verification. Instead of hoping agents follow instructions, you mathematically prove outputs meet specs. You say "no library X," and a verification layer blocks any output that imports it. The agent can't reframe or negotiate. Code passes or fails. That's the kind of less-human behavior we actually need.