An open-source coding agent called Dirac just topped TerminalBench 2.0, beating both Google's own baseline and the leading closed-source competitor. Built by Max Trivedi at Dirac Delta Labs, it scored 65.2% using Google's gemini-3-flash-preview. Google's official baseline sat at 47.6%, and Junie CLI, the top closed-source agent, managed 64.3%.

Trivedi's core insight is simple: reasoning degrades as context length grows, so Dirac aggressively curates what the model sees. Instead of traditional search-and-replace, it applies hash-anchored parallel edits, using stable line hashes and the Myers diff algorithm. AST-native code manipulation handles structural changes, and multi-file batching lets it edit several files in one LLM roundtrip.

The cost numbers are just as striking. Dirac averaged $0.18 per task, compared with $0.49 for Cline and $0.60 for Roo, and its average reduction in API costs comes to 64.8%. Dirac also went 8 for 8 on accuracy in tests against major repos including VSCode, Django, and Transformers.

The takeaway: how you feed the model matters more than which one you pick. Dirac's code is open source on GitHub, much like Mario Zechner's pi.
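To make the idea of hash-anchored edits concrete, here is a minimal sketch in Python. It is not Dirac's implementation: the function names (`line_hash`, `anchor_lines`, `apply_edits`), the 8-character SHA-256 prefix, and the rejection rule are all assumptions chosen for illustration, and the Myers diff step is left out entirely. The sketch only shows why addressing a line by a hash of its content can be safer than search-and-replace: several edits are applied in one pass, and a stale or ambiguous anchor fails loudly instead of changing the wrong line.

```python
import hashlib


def line_hash(text: str) -> str:
    """Short content hash used as the anchor for one line (hypothetical scheme)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]


def anchor_lines(source: str) -> list[tuple[str, str]]:
    """Pair every line of the source with its anchor hash."""
    return [(line_hash(line), line) for line in source.splitlines()]


def apply_edits(source: str, edits: dict[str, str]) -> str:
    """Apply several line replacements in one pass, each keyed by an anchor hash.

    An edit is rejected if its anchor is missing or matches more than one line,
    so an outdated or ambiguous anchor never silently rewrites the wrong line.
    """
    anchored = anchor_lines(source)
    counts: dict[str, int] = {}
    for h, _ in anchored:
        counts[h] = counts.get(h, 0) + 1
    for h in edits:
        if counts.get(h, 0) != 1:
            raise ValueError(f"anchor {h} matches {counts.get(h, 0)} lines")
    return "\n".join(edits.get(h, line) for h, line in anchored)


# Two independent edits applied in a single call.
source = "def add(a, b):\n    return a - b\n\nprint(add(2, 3))"
edits = {
    line_hash("    return a - b"): "    return a + b",
    line_hash("print(add(2, 3))"): "print(add(2, 5))",
}
patched = apply_edits(source, edits)
print(patched)
```

In this toy version the model would only need to emit anchor hashes and replacement lines, not the surrounding code, which is one plausible way such a scheme could keep the context small.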
Dirac OSS Agent Crushes Google's Baseline on TerminalBench
Dirac is an open-source AI coding agent that achieved a 65.2% score on the TerminalBench 2.0 leaderboard using gemini-3-flash-preview, outperforming Google's official baseline (47.6%) and the top closed-source agent Junie CLI (64.3%). It focuses on token efficiency and context curation, cutting API costs by 64.8% on average while improving output quality through hash-anchored parallel edits, AST manipulation, and multi-file batching.
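The per-task costs quoted above can be turned into relative savings with a few lines of Python. This is only a worked check on the three published figures; the `reduction` helper is a hypothetical name. The two pairwise results do not average to 64.8%, so that figure presumably covers a wider set of agents or tasks than the two named here, which is an assumption rather than something the source states.

```python
# Per-task costs in USD, as quoted in the article.
costs = {"Dirac": 0.18, "Cline": 0.49, "Roo": 0.60}


def reduction(baseline: float, new: float) -> float:
    """Relative cost reduction of `new` versus `baseline`, in percent."""
    return (baseline - new) / baseline * 100


for agent in ("Cline", "Roo"):
    pct = reduction(costs[agent], costs["Dirac"])
    print(f"Dirac vs {agent}: {pct:.1f}% cheaper per task")
# prints:
# Dirac vs Cline: 63.3% cheaper per task
# Dirac vs Roo: 70.0% cheaper per task
```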