GitHub user moonrunnerkc has released Copilot Swarm Orchestrator, an open-source TypeScript tool that wraps the GitHub Copilot CLI and requires every agent in a swarm to prove its work before code is allowed to merge. The project was originally submitted to the GitHub Copilot CLI Challenge in early 2026.

The verification model is blunt: agents don't get credit for claiming they completed a task. Each job runs on an isolated git branch, and the orchestrator parses transcript evidence — commit SHAs, test output, build markers, file changes — to confirm what actually happened. Steps that can't produce verifiable artifacts are blocked outright.
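The evidence-parsing approach can be sketched roughly as follows. This is an illustrative reconstruction, not the project's actual code: the function names, regexes, and evidence shape are all assumptions about how transcript parsing of commit SHAs, test output, and file changes might look.

```typescript
// Hypothetical sketch of evidence-based step verification.
interface Evidence {
  commitShas: string[];
  testsPassed: boolean;
  filesChanged: string[];
}

function extractEvidence(transcript: string): Evidence {
  // Commit SHAs: 40-character hex strings anywhere in the transcript.
  const commitShas = transcript.match(/\b[0-9a-f]{40}\b/g) ?? [];
  // A test-runner pass marker, e.g. "12 passing" (assumed format).
  const testsPassed = /\b\d+ passing\b/.test(transcript);
  // File changes in short git-status form, e.g. "M src/index.ts".
  const filesChanged = (transcript.match(/^[MA]\s+\S+$/gm) ?? [])
    .map((line) => line.split(/\s+/)[1]);
  return { commitShas, testsPassed, filesChanged };
}

function isStepVerified(transcript: string): boolean {
  const e = extractEvidence(transcript);
  // A step is blocked unless it produced at least one verifiable artifact.
  return e.commitShas.length > 0 || e.testsPassed || e.filesChanged.length > 0;
}
```

The key property is that an agent's claim ("I finished the refactor") carries no weight: only artifacts that appear in the transcript count.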

After agents finish, generated code moves through a six-gate quality pipeline. The gates check for scaffold leftovers, duplicate blocks, hardcoded configuration, README claim drift, test isolation violations, and runtime correctness. Failures are classified by type — build, test, missing-artifact, dependency, or timeout — and the orchestrator attempts targeted repairs over up to three retries. A Critic governance agent scores each step and can pause execution to request human review.
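The classify-then-repair loop might look something like the sketch below. The failure types come from the article; the log patterns, function shapes, and repair hook are illustrative assumptions, not the orchestrator's real implementation.

```typescript
// Hypothetical failure classifier and bounded retry loop.
type FailureType = "build" | "test" | "missing-artifact" | "dependency" | "timeout";

function classifyFailure(log: string): FailureType {
  if (/timed?\s?out/i.test(log)) return "timeout";
  if (/cannot find module|unresolved dependency/i.test(log)) return "dependency";
  if (/\d+ failing/.test(log)) return "test";
  if (/error TS\d+|build failed/i.test(log)) return "build";
  return "missing-artifact"; // step produced no recognizable evidence
}

// Runs a step, classifies any failure, applies a targeted repair,
// and retries up to maxRetries times before giving up.
function runWithRepairs(
  step: () => string, // returns a failure log, or "" on success
  repair: (f: FailureType) => void,
  maxRetries = 3,
): boolean {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const log = step();
    if (log === "") return true;
    if (attempt === maxRetries) break;
    repair(classifyFailure(log)); // targeted fix before the retry
  }
  return false;
}
```

Classifying before repairing is what makes the retries "targeted": a dependency failure gets a different fix than a timeout, instead of a blind re-run.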

Cost tracking runs in two phases. Before execution starts, the tool estimates GitHub Copilot premium request consumption using model multipliers: claude-sonnet-4 and gpt-4o are rated 1x, o4-mini 5x, and o3 20x. Historical failure rates from an internal knowledge base are folded into the retry probability estimate. Overage is billed at $0.04 per request. After each run, per-step attribution data — estimated versus actual requests, retry counts, prompt tokens — is written to a cost-attribution.json file for auditing.
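A back-of-envelope version of the pre-run estimate is easy to sketch. The multipliers and the $0.04 overage rate are from the article; the function shape and the way retries are folded in (expected attempts = 1 + historical failure rate) are illustrative assumptions.

```typescript
// Hypothetical pre-run cost estimator using the stated multipliers.
const MODEL_MULTIPLIERS: Record<string, number> = {
  "claude-sonnet-4": 1,
  "gpt-4o": 1,
  "o4-mini": 5,
  "o3": 20,
};
const OVERAGE_PER_REQUEST = 0.04; // USD, per premium request over quota

interface StepEstimate {
  model: string;
  requests: number;    // base premium requests for the step
  failureRate: number; // historical failure rate from the knowledge base
}

function estimateCost(steps: StepEstimate[]): { premiumRequests: number; overageUsd: number } {
  let premiumRequests = 0;
  for (const s of steps) {
    const multiplier = MODEL_MULTIPLIERS[s.model] ?? 1;
    // Expected retries folded into the estimate via the failure rate.
    const expectedAttempts = 1 + s.failureRate;
    premiumRequests += s.requests * multiplier * expectedAttempts;
  }
  return { premiumRequests, overageUsd: premiumRequests * OVERAGE_PER_REQUEST };
}
```

The 20x multiplier makes the tradeoff concrete: two o3 requests are estimated at 40 premium requests, versus two for claude-sonnet-4.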

The scheduler is greedy: steps launch as soon as their dependencies resolve rather than waiting for a full wave to clear. Parallel execution happens either through direct Copilot CLI subprocess dispatch or through the native /fleet command. The project ships with 649 passing tests, a TUI for live swarm monitoring, and a web dashboard for reviewing completed runs.
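The difference between greedy and wave scheduling can be shown with a small start-time simulation. This is a minimal sketch under assumed names and per-step durations, not the orchestrator's actual API.

```typescript
// Hypothetical model: compute each step's start time under greedy
// scheduling, where a step starts the moment its slowest dependency
// finishes, independent of unrelated steps still running.
interface Step {
  id: string;
  deps: string[];
  duration: number;
}

function greedyStartTimes(steps: Step[]): Map<string, number> {
  const finish = new Map<string, number>();
  const start = new Map<string, number>();
  const remaining = [...steps];
  while (remaining.length > 0) {
    const i = remaining.findIndex((s) => s.deps.every((d) => finish.has(d)));
    if (i === -1) throw new Error("dependency cycle");
    const s = remaining.splice(i, 1)[0];
    const t = Math.max(0, ...s.deps.map((d) => finish.get(d)!));
    start.set(s.id, t);
    finish.set(s.id, t + s.duration);
  }
  return start;
}
```

With steps a (duration 1), b (duration 10), and c depending only on a, a wave scheduler would hold c until the whole first wave clears at t=10; greedy scheduling starts c at t=1, as soon as a finishes.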