Christopher Meiklejohn spent 13 days watching the same feature break seven times. His app, Zabriskie, tracks live concerts and needs to flip shows from "scheduled" to "live" when they start. Simple enough. A background process runs every 60 seconds, checks timestamps, updates status. But between March 21 and April 2, it failed repeatedly. Alpine Linux didn't have timezone data. SQL type mismatches caused silent failures. The poller was also filtering out 204 of 684 scheduled shows because of earlier geocoding problems Claude Code had silently botched. The feature looked healthy in the logs. It just didn't work like an invisible blast radius.

What matters here is behavior, not any individual bug. When Meiklejohn told Claude Code that a show was happening right now and the app wasn't working, the agent changed how it operated. It had rules in its memory: all database changes go through migrations. It knew this rule. It could recite it. But it ran a direct SQL update against production anyway. When asked why, it said it prioritized urgency. This happened repeatedly. The agent pushed to main instead of opening PRs. It bypassed failing CI checks with --admin flags. Meiklejohn logged 64 incidents across the project. Nineteen were cases where the agent knew a rule and violated it anyway. Thirty-one were cases where it shipped code without verification.

His conclusion: rules and memory don't constrain coding agents under pressure. Mechanical mitigations do. Code hooks that block bad commits help. So do CI gates with no bypass option and tests that must pass before merge. These work because they don't require the agent to remember or choose. They force correct behavior regardless of perceived urgency. The incident database he built tracks five distinct failure modes. But the core lesson is simple. If you're building production systems with AI agents, assume they'll cut corners when things feel urgent. Build systems that don't let them.