Sometimes the best prompt engineering is deleting most of your prompt.

A GitHub project called caveman-micro took a viral 552-token prompt designed to make LLMs more concise and distilled it down to just 6 lines (85 tokens). The distilled version actually outperformed the original on both Claude Sonnet and Claude Opus in benchmarks run on real coding tasks like incident diagnosis and config extraction.

The micro prompt tells models to "respond like smart caveman" by cutting filler words, articles, pleasantries, and hedging. That's basically it.
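
The repo's exact six lines aren't quoted here, but based on that description, a prompt in this style looks something like the following. This is an illustrative sketch, not the verbatim caveman-micro text:

```text
You are smart caveman. Respond like smart caveman.
Cut filler words. Cut articles. Cut pleasantries. Cut hedging.
No "I think", no "perhaps", no apologies.
Keep numbers, names, code, and commands exact.
Short sentences. Facts only.
Answer question. Stop.
```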

Creator kuba-guzik found that Claude Sonnet saved 14% on output tokens versus a baseline "Be concise" instruction, while Opus saved 21%. Quality held at 100% across every run. The original caveman skill by Julius Brussee achieved roughly 75% token reduction with full technical accuracy, but this micro version shows you don't need the lengthy original to get there.

Models already know how to be brief. They just need permission. A longer prompt costs more tokens to inject and adds noise for the model to process. Six clear lines do the same job at roughly one-sixth the injection cost.
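
Concretely, here's the arithmetic on the injection side. The token counts come from the post; the call volume and price are hypothetical placeholders, so plug in your own:

```python
# A system prompt is injected on every call, so its token count
# is a fixed per-call tax on input tokens.
ORIGINAL_TOKENS = 552   # the viral caveman prompt (from the post)
MICRO_TOKENS = 85       # caveman-micro (from the post)

saved_per_call = ORIGINAL_TOKENS - MICRO_TOKENS   # 467 tokens saved per call
ratio = MICRO_TOKENS / ORIGINAL_TOKENS            # ~0.15, i.e. roughly 1/6.5

CALLS = 10_000_000       # hypothetical monthly API volume
PRICE_PER_MTOK = 3.00    # hypothetical $ per 1M input tokens

monthly_savings = saved_per_call * CALLS / 1_000_000 * PRICE_PER_MTOK
print(f"{saved_per_call} tokens/call, ~${monthly_savings:,.0f}/month saved")
# -> 467 tokens/call, ~$14,010/month saved
```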

This lines up with research from MD Azizul Hakim, whose paper "Brevity Constraints Reverse Performance Hierarchies in Language Models" found that constraining models to produce brief responses improved accuracy by 26 percentage points.

For anyone building with AI agents, the takeaway is practical. Start with your base prompt: "Be concise. Return JSON." handles about 60% of the potential token savings on its own. The caveman-micro approach adds another 14-21% on top of that baseline. At scale, across millions of API calls, that compounds fast. These prompt-level savings also stack with bigger architectural levers, such as replacing traditional RAG with a virtual filesystem, if you want to push infrastructure costs down further. Drop the 6 lines into your CLAUDE.md or wherever you configure your agent and you're set.
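
If you're wiring this in through the API rather than CLAUDE.md, the prompt goes in the system parameter. A minimal sketch using the Anthropic Python SDK, assuming a placeholder prompt string and model id (swap in the real six lines from the repo and whichever model you target):

```python
import anthropic

# Placeholder: substitute the actual caveman-micro lines from the repo.
CAVEMAN_MICRO = """Respond like smart caveman.
Cut filler words, articles, pleasantries, hedging.
Keep numbers, names, code, and commands exact."""

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",  # assumed model id; use whatever you benchmark
    max_tokens=1024,
    system=CAVEMAN_MICRO,       # injected on every call, so brevity pays here too
    messages=[{"role": "user", "content": "Extract the DB config from this compose file: ..."}],
)
print(response.content[0].text)
```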