Someone on Reddit figured out how to cut AI agent costs in half, and the trick is straightforward: pre-load codebase facts into a cache pinned to Git blob OIDs. Reddit user OkBreath9382 ran controlled tests (N=5) on a Python repository where an agent had to add structured logging. The treatment group cost $2.13 per session versus $4.35 for the control, a 51% reduction using Claude Opus 4.7 pricing.

The approach works by spending roughly $0.01 on a Claude Haiku call to scan the repo and generate short claims like "HTTP calls are only in src/api/." Each claim gets pinned to specific Git blob object IDs, and a Merkle root is computed over path and OID pairs. When a file changes, its blob OID changes, the Merkle root breaks, and the claim is marked stale. The agent never receives outdated information.
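The pinning scheme above can be sketched in a few lines. This is a minimal illustration, not the poster's actual code: `blob_oid` reproduces Git's blob hashing (SHA-1 over a `blob <size>\0` header plus content), and `merkle_root` is one plausible tree construction over sorted path/OID pairs; the claim structure and file contents are made up for the example.

```python
import hashlib

def blob_oid(content: bytes) -> str:
    """Git blob OID: SHA-1 over a 'blob <size>\\0' header plus the content."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

def merkle_root(path_oids: dict[str, str]) -> str:
    """Merkle root over sorted (path, OID) pairs, hashed pairwise up to one root."""
    leaves = [hashlib.sha256(f"{p}:{o}".encode()).hexdigest()
              for p, o in sorted(path_oids.items())]
    if not leaves:
        return hashlib.sha256(b"").hexdigest()
    while len(leaves) > 1:
        if len(leaves) % 2:                  # duplicate the last leaf on odd levels
            leaves.append(leaves[-1])
        leaves = [hashlib.sha256((leaves[i] + leaves[i + 1]).encode()).hexdigest()
                  for i in range(0, len(leaves), 2)]
    return leaves[0]

# A cached claim pins the files it was derived from (hypothetical content).
files = {"src/api/client.py": blob_oid(b"import requests\n")}
claim = {"text": "HTTP calls are only in src/api/", "root": merkle_root(files)}

# After an edit, the blob OID changes, the root breaks, and the claim is stale.
files["src/api/client.py"] = blob_oid(b"import httpx\n")
stale = merkle_root(files) != claim["root"]
```

Because the leaf is the blob OID rather than a timestamp or mtime, staleness detection is content-addressed: reverting a file to its old contents restores the old root, so the claim validly un-stales.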

The real insight here is that exploration is the most expensive part of running a capable model like Opus. The model burns tokens on Grep and Read calls just to figure out where things are. Loading verified facts into the prompt's cached prefix lets the agent skip the wandering and go straight to work. The experiment measured 61% fewer cache-write tokens, 52% fewer output tokens, and a 16% speed improvement, consistent across all five runs.

A commenter named aleksiy described a related approach called "recursive summarization" that builds a tree of summaries at each folder level. They were solving a different problem (agents being too hyper-focused, not token inefficiency), but the principle rhymes: give the agent a map so it stops drawing one from scratch every session.
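The recursive-summarization idea can be sketched as a bottom-up walk over the repo: summarize each file, then fold child summaries into each folder's summary. The `summarize` function here is a stand-in for a cheap model call (the comment did not specify an implementation), and the tree shape is my own assumption.

```python
from pathlib import Path

def summarize(parts: list[str]) -> str:
    """Stand-in for a cheap model call (e.g. a Haiku prompt); here it just
    joins and truncates so the sketch runs without an API key."""
    return " | ".join(parts)[:200]

def summary_tree(root: Path) -> dict:
    """Build a summary for each folder from its files and child-folder summaries."""
    node: dict = {"path": str(root), "children": []}
    parts: list[str] = []
    for child in sorted(root.iterdir()):
        if child.is_dir():
            sub = summary_tree(child)        # recurse first: summaries flow upward
            node["children"].append(sub)
            parts.append(sub["summary"])
        elif child.suffix == ".py":
            parts.append(f"{child.name}: {child.read_text()[:40]}")
    node["summary"] = summarize(parts)
    return node
```

The root node's summary then serves as the "map": an agent reads it first, descends only into the subtrees that look relevant, and skips the rest.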

Full experiment breakdown is on GitHub.