Julius Brussee built a Claude Code skill called Caveman that strips LLM output down to its bare essentials. The tool removes articles, pleasantries, and hedging phrases, turning verbose explanations into punchy, direct responses. A 69-token React debugging explanation becomes 19 tokens. Same technical content, just without the "I'd be happy to help you with that" filler. The skill claims 75% cost savings on output tokens and roughly 3x faster responses. You trigger it with "/caveman" or "talk like caveman."

The Hacker News thread raised a real concern, though. For LLMs, tokens aren't just output; they're thinking space.

User TeMPOraL pointed out that "tokens are units of thinking," meaning forcing brevity might make the model worse by constraining its computation. Another user, teekert, reported more misunderstandings with caveman-style prompts, requiring extra turns to clarify what went wrong.

Caveman works as a post-processing layer, not a prompt constraint. It preserves code blocks, technical terms, and error messages while filtering everything else through rule-based text stripping. The model still thinks in full sentences, but you only see the compressed version. For debugging sessions where the answer is usually a single code fix, the trade-off seems worth it. For complex architectural discussions, maybe not.
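The general shape of that kind of filter is easy to sketch: stash fenced code blocks behind placeholders, run regex-based phrase stripping on what's left, then restore the blocks untouched. This is a minimal illustration of the approach, not Caveman's actual rules; the filler list and function name are assumptions.

```python
import re

# Hypothetical filler patterns -- the real skill's rule set is not public here.
FILLER = [
    r"\bI'd be happy to help( you)?( with that)?\b",
    r"\bI think\b",
    r"\b(the|a|an)\b",
]

def strip_filler(text: str) -> str:
    # Protect fenced code blocks so stripping never touches them.
    blocks = []
    def stash(match):
        blocks.append(match.group(0))
        return f"\x00{len(blocks) - 1}\x00"
    text = re.sub(r"```.*?```", stash, text, flags=re.DOTALL)

    # Rule-based stripping on the remaining prose.
    for pattern in FILLER:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    text = re.sub(r"[ \t]{2,}", " ", text)  # collapse leftover spaces

    # Restore the protected code blocks verbatim.
    return re.sub(r"\x00(\d+)\x00", lambda m: blocks[int(m.group(1))], text)
```

The placeholder trick matters more than the phrase list: whatever rules you pick, error messages and code must pass through byte-for-byte, or the "same technical content" guarantee breaks.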