Twill.ai races coding agents on your task, delivers the best PR

Twill.ai, part of YC's Summer 2025 batch, lets you delegate coding tasks to cloud agents and get back finished pull requests. The workflow goes: research the codebase, plan the implementation, write code in a sandboxed environment, run an AI code review, then hand you a PR. You can assign the same task to agents often used in ctx, which unifies Claude Code and Cursor in a containerized workspace. Or run the same agent multiple times and pick the best result.

Picture a tricky pagination bug. Assign it to all three agents. Each spins up in its own sandbox and works independently, a model Druids coordinates. Maybe Claude handles the edge cases but Codex writes cleaner tests. You pick what works. That parallel comparison is Twill's real selling point.

Underneath, Twill built AgentBox, an open-source SDK that runs coding agents as server processes rather than simple CLI wrappers. It communicates via WebSocket and HTTP, preserving the agents' interactive capabilities. The SDK supports multiple sandbox providers: E2B, Modal, Daytona, and local Docker. You can SSH into any sandbox to debug with your own IDE.

The "prompt to PR" pitch sounds attractive, but the Hacker News crowd raised valid questions. The AgentBox SDK's first commit landed just days before launch. Not exactly reassuring for something you'd trust with your production codebase. Pricing is another concern. Why pay Twill when you could subscribe to Claude Code directly? The parallel-agent workflow needs to justify the added cost over individual subscriptions. For enterprise use, the sandboxing needs serious network egress controls to prevent data leaks. The smarter path might be starting with narrower tasks like codebase Q&A or CI debugging before promising full feature development.

Still, the core idea is sound. Running multiple agents against the same task and comparing outputs is something teams will want, especially as coding agents get better and cheaper. The question is whether Twill can build enough value on top of raw agent access to justify its existence as a middle layer.