Juan Pablo AJ found a way to make Claude, Codex, and Gemini work together without racking up API charges. His approach is almost disappointingly simple. Instead of wiring models together through APIs, you let one agent invoke another through the CLI using resume mode. Commands like 'codex exec resume --last' and 'gemini -r latest' keep the conversation going instead of starting fresh each time. The result is a lightweight loop where one agent produces work, another critiques it, and they iterate until the output is good enough.
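The loop is simple enough to sketch in a few lines of shell. Here the agent CLIs are stubbed out as functions so the control flow is visible and the script runs on its own; in real use, produce() would call something like 'codex exec resume --last' and critique() something like 'gemini -r latest' (the commands mentioned above). The "APPROVED" convention and the round cap are assumptions for illustration, not features of either tool.

```shell
produce() {
  # stub: a real call would resume the coding agent with the latest critique
  echo "draft incorporating: $1"
}

critique() {
  # stub: a real call would resume the reviewing agent with the draft;
  # this one approves on the second round so the loop terminates
  if [ "$2" -ge 2 ]; then echo "APPROVED"; else echo "use fewer globals"; fi
}

feedback="initial task"
round=1
while [ "$round" -le 5 ]; do          # cap rounds so the loop can't run away
  draft=$(produce "$feedback")
  verdict=$(critique "$draft" "$round")
  if [ "$verdict" = "APPROVED" ]; then break; fi
  feedback=$verdict                   # feed the critique back into production
  round=$((round + 1))
done
echo "final after $round rounds: $draft"
```

The round cap matters: without a termination condition, two models happy to keep talking will keep talking.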

If you want more visibility, there's a tmux-based pattern that lets you watch agents work in separate panes. You see what each one is doing, capture output, and debug when things get messy. The tradeoff is setup complexity, though it's still just tmux, not some new framework to learn.
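A minimal version of that pattern needs only three tmux commands: split-window to give each agent a pane, send-keys to start it, and capture-pane to pull its output back out for logging or debugging. The session and window names below are arbitrary, and the echo commands stand in for real agent invocations.

```shell
tmux new-session -d -s agents -n work   # detached session, pane 0
tmux split-window -h -t agents:work     # pane 1 alongside it

# each pane would run an agent; stubs here so the capture is deterministic
tmux send-keys -t agents:work.0 'echo WRITER: drafted patch' Enter
tmux send-keys -t agents:work.1 'echo REVIEWER: looks fine' Enter
sleep 2                                  # give the panes time to run

# capture-pane -p prints a pane's contents: this is the debugging hook
writer_log=$(tmux capture-pane -p -t agents:work.0)
reviewer_log=$(tmux capture-pane -p -t agents:work.1)
tmux kill-session -t agents

echo "$writer_log" | grep "WRITER"
```

Because capture-pane works on any pane at any time, you can inspect an agent mid-task without interrupting it, which is exactly what API-based orchestration makes hard.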

This idea is catching on. Claude Code already has an experimental 'agent teams' feature that uses tmux integration and local file-based inboxes for inter-agent messaging. Developer cs50victor reverse-engineered that protocol and released claude-code-teams-mcp, making those capabilities available to any MCP client. Wes McKinney, who created Pandas, built roborev, a continuous code review tool that orchestrates multiple coding agents through a local terminal UI. Multiple projects are converging on tmux as a practical orchestration layer.
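The file-based inbox idea is worth pausing on, because it is the simplest possible message bus. The sketch below shows the general shape such a scheme can take; it is not the actual Claude Code protocol (the one cs50victor reverse-engineered), just an illustration. Each agent owns a directory, a message is one JSON file, and writes go through a rename so a reader never sees a half-written file.

```shell
ROOT=$(mktemp -d)

send() {  # send TO FROM BODY
  inbox="$ROOT/$1/inbox"
  mkdir -p "$inbox"
  tmp="$inbox/.msg.tmp"
  printf '{"from":"%s","body":"%s"}\n' "$2" "$3" > "$tmp"
  mv "$tmp" "$inbox/$(date +%s%N).json"   # rename makes the publish atomic
}

receive() {  # receive AGENT: print and consume all pending messages
  inbox="$ROOT/$1/inbox"
  for f in "$inbox"/*.json; do
    [ -e "$f" ] || continue   # glob matched nothing: inbox is empty
    cat "$f"
    rm "$f"                   # consume so the message is delivered once
  done
}

send reviewer writer "please check commit abc123"
receive reviewer
```

No daemon, no sockets, no queue server: two agents on the same machine can coordinate through nothing but the filesystem, which is why it ports so cleanly to an MCP server.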

But Juan Pablo AJ raises a point worth sitting with. LLMs are good at producing plausible text, and when they talk to each other, they produce a lot of it. In early tests, multi-agent loops did catch bugs that a single pass missed, but they also tended toward over-engineering: the longer the chain of interactions, the more the output drifted toward verbose, cautious solutions rather than simple ones. The real question isn't whether agents can reach consensus (they can); it's whether the final result is actually better, or just a more polished version of the same output. That skepticism is healthy. Multi-agent loops are worth testing, but they're not automatically better than a single model doing the work.