Pi-Autoresearch: Open-Source Autonomous Experiment Loop for LLM Training, Test Speed, and Lighthouse Scores

Pi-Autoresearch, an open-source project published in March 2026 by developer davebcn87, brings Andrej Karpathy's autoresearch concept to the "pi" agent platform as a first-class extension and skill. The core idea, borrowed directly from Karpathy's original karpathy/autoresearch repository, is deceptively simple: give an AI agent a measurable target, let it propose and test changes autonomously, keep what improves the metric, revert what doesn't, and repeat indefinitely. Karpathy's original was scoped specifically to LLM training on a single GPU with 5-minute experiment budgets; pi-autoresearch generalizes the pattern to any quantifiable optimization target, citing test suite speed, JavaScript bundle size, build times, and Lighthouse performance scores as supported domains.
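The keep-or-revert pattern can be sketched as a greedy loop. This is a minimal Python illustration, not pi-autoresearch's implementation: `autoresearch_loop`, `propose`, and `measure` are hypothetical names, the metric is assumed to be lower-is-better (as with test duration or bundle size), and "revert" is modelled as discarding the candidate rather than resetting a git working tree.

```python
from __future__ import annotations

from typing import Callable, TypeVar

S = TypeVar("S")  # whatever the agent edits, e.g. a config or a source tree (assumption)


def autoresearch_loop(
    initial: S,
    propose: Callable[[S], S],
    measure: Callable[[S], float],
    max_iters: int,
) -> tuple[S, float, list[tuple[int, float, bool]]]:
    """Greedy keep-or-revert loop; assumes a lower metric is better."""
    best, best_score = initial, measure(initial)  # baseline run
    history: list[tuple[int, float, bool]] = []
    for i in range(max_iters):
        candidate = propose(best)   # the agent proposes one change to the current best
        score = measure(candidate)  # run the benchmark on the candidate
        kept = score < best_score   # keep only strict improvements
        if kept:
            best, best_score = candidate, score
        # otherwise the candidate is simply dropped: the "revert" step
        history.append((i, score, kept))
    return best, best_score, history


if __name__ == "__main__":
    # Toy target: minimise (x - 3)^2 by proposing x + 1 each round.
    best, score, history = autoresearch_loop(0, lambda x: x + 1, lambda x: (x - 3) ** 2, 5)
    print(best, score, [kept for _, _, kept in history])
```

Per the description above, the real loop runs indefinitely with an LLM generating proposals; `max_iters` only bounds this toy run so it terminates.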

The project is split into two layers. The first is a domain-agnostic extension that contributes tools (init_experiment, run_experiment, log_experiment), a live status widget, and a full dashboard accessible via a slash command. The second is a domain-specific skill (autoresearch-create) that gathers the user's optimization goal and generates two session files: autoresearch.jsonl, an append-only log of every experiment result, and autoresearch.md, which records the session objective and history in human-readable form. Together, these files let a fresh agent instance with no memory resume an interrupted session exactly where it left off, a practical workaround for the context window limits that would otherwise make indefinite autonomous loops fragile. An optional autoresearch.checks.sh script adds correctness backpressure, blocking a "keep" commit when tests, type checks, or lint fail even though the benchmark passed.
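To make the resume mechanism concrete, here is a minimal Python sketch of an append-only JSONL log that a fresh process can reload, plus a keep/discard decision gated by an optional checks script. The record fields (`run`, `metric`, `status`, `description`), the status strings, and every function name are hypothetical illustrations, not pi-autoresearch's actual schema or tool signatures; the sketch also assumes a lower-is-better metric and runs the checks script with `bash`.

```python
from __future__ import annotations

import json
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Optional

# Hypothetical record shape: {"run": int, "metric": float, "status": str, "description": str}


def append_result(log_path: Path, record: dict) -> None:
    """Append one experiment as a single JSON line; existing lines are never rewritten."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_results(log_path: Path) -> list[dict]:
    """Rebuild the full history from disk, e.g. when a fresh agent resumes a session."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_kept(records: list[dict]) -> Optional[float]:
    """Lowest metric among kept runs (lower is better); None if nothing is kept yet."""
    kept = [r["metric"] for r in records if r.get("status") == "keep"]
    return min(kept) if kept else None


def run_checks(script: Path) -> bool:
    """Run the optional checks script; a missing script imposes no gate."""
    if not script.exists():
        return True
    return subprocess.run(["bash", str(script)], capture_output=True).returncode == 0


def decide_status(metric: float, best: Optional[float], checks: Callable[[], bool]) -> str:
    """Checks run only after the benchmark improves; a failing check blocks the keep."""
    if best is not None and metric >= best:
        return "discard"  # no improvement: revert without running checks
    return "keep" if checks() else "checks_failed"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "autoresearch.jsonl"
        checks = Path(tmp) / "autoresearch.checks.sh"  # absent here, so no gate applies
        for run, metric in enumerate([12.0, 10.5, 11.0]):
            history = load_results(log)  # all state comes from the file, not from memory
            status = decide_status(metric, best_kept(history), lambda: run_checks(checks))
            append_result(log, {"run": run, "metric": metric, "status": status,
                                "description": f"attempt {run}"})
        for r in load_results(log):
            print(r["run"], r["metric"], r["status"])
```

Because every iteration reloads history from the file, killing the process between runs loses nothing that was already appended, which is the property the session files are described as providing.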

The project has already attracted cross-platform porting activity. A developer identified as ozeron announced a Claude Code plugin variant, github.com/ozeron/autoresearch, in the Hacker News discussion of the project; it reimplements pi-autoresearch's architecture as a Model Context Protocol server that exposes the same core tools to Claude Code users. The port adds Claude Code-specific lifecycle hooks: a stop-guard that prevents the agent from terminating while experiments are active, and a session-start hook that injects a session summary on resume. It also keeps the JSONL state format compatible with the original pi implementation, which suggests potential for shared tooling across platforms.
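Since the port keeps the JSONL format compatible, a resume-time summary can be built from the log alone. The sketch below is a hypothetical Python illustration of what a session-start hook might print; it is not taken from ozeron's plugin, and the field names (`metric`, `status`, `description`) and the `summarize`/`parse_jsonl` helpers are assumptions. Fields are read with `.get()` so records carrying extra or missing keys from another implementation do not break it, and a truncated final line, such as an interrupted write might leave, is skipped.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterable


def parse_jsonl(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL, tolerating blank lines and a truncated trailing record."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # e.g. a half-written line from an interrupted session
    return records


def summarize(records: list[dict], recent: int = 3) -> str:
    """Build a short plain-text summary suitable for injecting on resume."""
    if not records:
        return "No experiments logged yet."
    kept = [r for r in records if r.get("status") == "keep" and "metric" in r]
    lines = [f"{len(records)} experiments logged, {len(kept)} kept."]
    if kept:
        best = min(kept, key=lambda r: r["metric"])  # assumes lower is better
        lines.append(f"Best kept metric: {best['metric']} ({best.get('description', 'no description')})")
    lines.append("Most recent:")
    for r in records[-recent:]:
        lines.append(f"- {r.get('status', '?')}: {r.get('metric')} {r.get('description', '')}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    path = Path("autoresearch.jsonl")
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    print(summarize(parse_jsonl(text.splitlines())))
```

Keeping the summarizer a pure function over parsed records means the same code could, in principle, serve either platform's log.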

Pi itself uses a GitHub-based package manager model and separates platform-level extensions from domain-specific skills. The repository does not specify an underlying model backend; the platform occupies the same space as agentic coding tools such as Claude Code, Cursor, and Windsurf. Because ozeron preserved JSONL compatibility, experiment logs from pi sessions can in principle be read by the Claude Code port, and vice versa. Whether that cross-platform interoperability gets built out is the more interesting question going forward.