Mendral upgraded from Claude Sonnet 4.0 to Claude Opus 4.6 and their daily LLM bill dropped by more than half. The expensive model barely runs. That's the whole trick.
The company analyzes around 4,000 CI failures every week. About 80% are duplicates of known issues. Andrea Luzzardi, who wrote the case study, explained that a Haiku 4.5 agent acts as a "triager," checking each failure against known problems using exact matching and semantic search via pgvector. Only novel issues escalate to Opus. A triager match costs roughly 25x less than a full investigation. When Opus does engage, it plans the investigation and spawns Haiku sub-agents to retrieve data via SQL queries to ClickHouse or the GitHub CLI. The orchestrator never reads logs directly.
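The two-stage triage described above can be sketched in plain Python. This is a hypothetical illustration, not Mendral's code: the `Triager` class, the similarity threshold, and the in-memory issue store are all stand-ins for what would really be a Haiku 4.5 agent querying Postgres with pgvector.

```python
import hashlib
import math

SIMILARITY_THRESHOLD = 0.9  # assumed cutoff for "duplicate of a known issue"


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


class Triager:
    def __init__(self):
        self.known = {}    # fingerprint -> issue id (exact matching)
        self.vectors = []  # (embedding, issue id) pairs (semantic matching)

    def add_known_issue(self, signature, embedding, issue_id):
        fp = hashlib.sha256(signature.encode()).hexdigest()
        self.known[fp] = issue_id
        self.vectors.append((embedding, issue_id))

    def triage(self, signature, embedding):
        # 1. Exact match on a fingerprint of the failure signature.
        fp = hashlib.sha256(signature.encode()).hexdigest()
        if fp in self.known:
            return ("duplicate", self.known[fp])
        # 2. Semantic match against known issues; pgvector would do this
        #    server-side with a nearest-neighbor query instead of a scan.
        best = max(self.vectors, key=lambda v: cosine(embedding, v[0]),
                   default=None)
        if best and cosine(embedding, best[0]) >= SIMILARITY_THRESHOLD:
            return ("duplicate", best[1])
        # 3. Novel failure: escalate to the expensive orchestrator model.
        return ("escalate", None)
```

With roughly 80% of failures matching at step 1 or 2, the expensive investigation path runs only for the remainder, which is where the claimed ~25x cost gap per matched failure comes from.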
The design avoids common agent traps through mechanical mitigations rather than prompt tuning. Raw logs are never pasted into the orchestrator's prompt, because selecting excerpts up front means, as Luzzardi put it, "you've already made a judgment about what's relevant before you know what the problem actually is." Sub-agents are capped at one nesting level to prevent cost blowups. Haiku absorbs the raw data so Opus doesn't have to: Haiku's input-to-output token ratio is 86:1, reading heavily and returning focused extracts, while Opus sits around 50:1.
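The one-level nesting cap is the simplest of these mitigations to make concrete. A minimal sketch, assuming a hypothetical `Agent` class and `spawn` API (not Mendral's actual framework): each sub-agent inherits its parent's depth plus one, and spawning past the cap fails fast, bounding the worst-case fan-out.

```python
MAX_DEPTH = 1  # orchestrator sits at depth 0, retrieval sub-agents at depth 1


class Agent:
    def __init__(self, name, depth=0):
        self.name = name
        self.depth = depth

    def spawn(self, name):
        # Refuse to create grandchildren: a depth-1 sub-agent may read
        # logs or run SQL, but it cannot recursively spawn more agents.
        if self.depth >= MAX_DEPTH:
            raise RuntimeError(
                f"{self.name}: nesting cap reached, cannot spawn {name}")
        return Agent(name, self.depth + 1)


orchestrator = Agent("opus-orchestrator")
worker = orchestrator.spawn("haiku-sql-reader")  # allowed: depth 1
```

Without such a cap, a sub-agent that decides its task is "too big" can delegate again, and cost grows with the tree rather than linearly with the number of leads the orchestrator chose to pursue.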
A Hacker News commenter pointed out that the framing is slightly misleading: the savings come from the hierarchical architecture, not from Opus specifically. Fair point. The pattern generalizes to any high-volume event stream where most items are duplicates: a cheap model filters, an expensive model investigates the remainder, and the same orchestration works at small local scales and large remote ones.