General Intelligence Capital, an AI-native investment firm founded by Chris Worsey, has open-sourced ATLAS — an autonomous multi-agent trading framework that takes Andrej Karpathy's autoresearch concept and applies it directly to financial markets. The mechanism is straightforward: instead of neural network weights, ATLAS optimizes agent prompts. Rolling Sharpe ratio serves as the loss function, each five-day trading window doubles as a training run, and the worst-performing agent gets its instructions rewritten at every cycle. Changes that improve performance get committed; failures get reverted via git.
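In outline, one optimization cycle could look like the Python sketch below. Everything here is illustrative rather than taken from the released code: `evaluate` stands in for a five-day paper-trading window, `mutate` for an LLM prompt rewrite, and the git commit/revert appears only as comments.

```python
import math
import statistics

def rolling_sharpe(daily_returns):
    # Annualized Sharpe over the window; this plays the role of the loss function.
    mean = statistics.mean(daily_returns)
    std = statistics.pstdev(daily_returns)
    return (mean / std) * math.sqrt(252) if std > 0 else 0.0

def autoresearch_step(agents, evaluate, mutate):
    # Score every agent's prompt on the latest five-day window.
    scores = {name: rolling_sharpe(evaluate(p)) for name, p in agents.items()}
    # The worst performer gets its instructions rewritten.
    worst = min(scores, key=scores.get)
    candidate = mutate(agents[worst])
    # Keep the rewrite only if the window Sharpe improves...
    if rolling_sharpe(evaluate(candidate)) > scores[worst]:
        agents[worst] = candidate   # ...in the real system: `git commit`
        return worst, True
    return worst, False             # otherwise: `git revert`

# Toy, deterministic stand-ins so the loop runs end to end.
windows = {
    "macro-v1": [0.001, -0.002, 0.0005, -0.001, 0.0003],  # losing week
    "macro-v2": [0.002, 0.001, 0.0015, 0.001, 0.002],     # winning week
}
agents = {"macro": "macro-v1", "rates": "macro-v2"}
name, committed = autoresearch_step(
    agents,
    evaluate=lambda prompt: windows[prompt],
    mutate=lambda prompt: "macro-v2",  # pretend the LLM improved the prompt
)
```

The key design point survives the simplification: the market, not a human reviewer, decides whether a prompt edit is kept.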

The architecture runs 25 specialized agents across four hierarchical layers. Ten macro agents set the market regime — risk-on or risk-off — tracking central banks, geopolitics, dollar dynamics, and volatility. Seven sector desk agents identify specific opportunities within that regime, passing their picks upward to a third layer of four agents modeled on the investment philosophies of Druckenmiller, Aschenbrenner, Baker, and Ackman. At the top sits a decision layer: a Chief Risk Officer agent stress-tests each idea before a CIO agent makes the final call, with a Darwinian scoring mechanism that amplifies reliable agents and progressively muffles chronic underperformers.
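The layered flow can be sketched as a simple pipeline. This is a structural illustration only: each callable below would wrap a Claude prompt in the real system, and the function names, stub behaviors, and majority-vote regime rule are all assumptions, not details from the release.

```python
from dataclasses import dataclass

@dataclass
class Idea:
    ticker: str
    thesis: str
    conviction: float

def run_cycle(macro_agents, sector_desks, philosophy_agents, cro, cio):
    # Layer 1: macro agents vote on the regime (majority rule is an assumption).
    votes = [agent() for agent in macro_agents]
    regime = "risk-on" if votes.count("risk-on") > len(votes) / 2 else "risk-off"
    # Layer 2: sector desks surface ideas consistent with that regime.
    ideas = [idea for desk in sector_desks for idea in desk(regime)]
    # Layer 3: philosophy agents keep only ideas at least one of them endorses.
    endorsed = [i for i in ideas if any(p(i) for p in philosophy_agents)]
    # Layer 4: the CRO stress-tests survivors, then the CIO sizes the book.
    vetted = [i for i in endorsed if cro(i)]
    return cio(vetted)

# Deterministic toy wiring: ten macro votes, one desk, one philosophy filter.
book = run_cycle(
    macro_agents=[lambda: "risk-on"] * 6 + [lambda: "risk-off"] * 4,
    sector_desks=[lambda regime: [Idea("AVGO", "AI capex", 0.8)]
                  if regime == "risk-on" else []],
    philosophy_agents=[lambda i: i.conviction > 0.5],
    cro=lambda i: True,  # stress test passes in this toy
    cio=lambda ideas: {i.ticker: i.conviction for i in ideas},
)
```

Each layer narrows the funnel, so a bad idea has to slip past four independent filters before it reaches the book.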

Across 378 total trading days spanning backtest and live deployment, the system made 53 prompt-modification attempts: 16 survived and 37 were reverted, a 30% acceptance rate. Over the 173-day live window (the remaining 205 days constitute the backtest period), the firm reports a +22% return, including a call on Broadcom (AVGO) at $152 that returned +128%. Those figures are self-reported and unaudited. At the individual-agent level, the financials desk's Sharpe ratio moved from -4.14 to 0.45 after autoresearch ran its course; the emerging-markets and semiconductor desks also improved. The standout result came from the top of the stack: the CIO orchestration agent, the system's own portfolio manager, was pushed to minimum weight by the Darwinian scoring process. The team interprets this as the system identifying that synthesis quality across agents matters more than any individual agent's intelligence.
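The write-up doesn't give the exact scoring rule, but the reported behavior (reliable agents amplified, a chronically negative agent pinned at minimum weight) is consistent with a multiplicative-weights update with a floor. A hedged sketch, with the learning rate, floor, Sharpe inputs, and agent names all invented for illustration:

```python
import math

def darwin_update(weights, sharpes, lr=0.5, floor=0.02):
    # Multiplicative-weights sketch (assumed form, not the released rule):
    # positive window Sharpe grows an agent's weight, negative shrinks it,
    # and a floor keeps "muffled" agents alive enough to recover later.
    raw = {a: max(w * math.exp(lr * sharpes[a]), floor) for a, w in weights.items()}
    total = sum(raw.values())
    return {a: v / total for a, v in raw.items()}

# A persistently negative agent drifts down to the floor over repeated cycles,
# mirroring the reported fate of the CIO agent. All Sharpe values are invented.
weights = {"cio": 1 / 3, "financials": 1 / 3, "semis": 1 / 3}
for _ in range(20):
    weights = darwin_update(weights, {"cio": -1.0, "financials": 0.5, "semis": 0.3})
```

Under any rule of this shape, the floor matters: without it, one bad stretch would zero an agent out permanently instead of merely muffling it.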

The open-source release covers the framework architecture, autoresearch loop design, and backtest results. The trained prompts — 378 days of market-driven evolutionary selection — stay proprietary. The system runs on Claude Sonnet via the Anthropic API and a $20-per-month Azure VM, no GPU infrastructure required. The firm's implicit argument is that a well-chosen loss function and the discipline to revert when the market disagrees can drive meaningful agentic self-improvement without the compute bill to match.