A developer known as hgarud has forked Andrej Karpathy's autoresearch project, replacing its TSV-based experiment logging with an evolutionary database to make the autonomous ML research loop more efficient. The original autoresearch framework gives an LLM agent — Claude, Codex, or similar — a single Python training file (train.py) and a fixed 5-minute GPU time budget per experiment, yielding roughly 100 overnight runs without human involvement. The fork, available at github.com/hgarud/autoresearch, adds evo_db.py, a MAP-Elites quality-diversity database manager modeled directly on OpenEvolve, itself an open-source implementation of Google DeepMind's AlphaEvolve published in June 2025.
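
The loop described above can be sketched minimally as follows; `run_research_loop`, `propose_edit`, and `evaluate` are hypothetical names for illustration, not the project's actual API:

```python
def run_research_loop(propose_edit, evaluate, n_runs=100, per_run_s=300):
    """Sketch of the autoresearch outer loop: each iteration the agent
    proposes an edit to train.py, the run is capped at a fixed 5-minute
    GPU budget, and the measured fitness feeds the next proposal."""
    history = []  # the original design's flat experiment log
    for _ in range(n_runs):
        edit = propose_edit(history)                   # LLM call in the real system
        fitness = evaluate(edit, timeout_s=per_run_s)  # short, budgeted GPU run
        history.append((edit, fitness))
    return history
```

With 100 runs at 5 minutes each, the loop consumes roughly 8 GPU-hours, which is how the framework fits an overnight budget.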

Rather than keeping a flat log of past experiments, the evolutionary database organizes discovered solutions (neural network architectures and hyperparameter configurations) across an N-dimensional feature grid of "islands," each maintaining a diverse population scored by the val_bpb (validation bits per byte) fitness metric, where lower is better. When the agent needs a starting point, the database samples from a random island and returns candidate solutions paired with a strategy hint (exploit, explore, or random) drawn from a tunable probability distribution. This diversity-preserving memory steers the search toward both high-fitness and novel regions of the solution space, replacing the passive linear history of the original design. A lineage visualization tool (visualize_lineage_tree.py) lets researchers inspect how solutions evolved across generations.
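
Under some assumptions about the interface (the class and method names below are illustrative, not the actual API of the fork's evo_db.py), the add-and-sample behavior of such an island database might look like:

```python
import random

class EvoDB:
    """Minimal sketch of a MAP-Elites-style island database. Solutions
    are scored by val_bpb, where lower is better, so each island keeps
    its population sorted ascending and culls the worst beyond capacity."""

    def __init__(self, n_islands=4, capacity=8, strategy_probs=None):
        self.islands = [[] for _ in range(n_islands)]
        self.capacity = capacity
        # Tunable exploit / explore / random distribution for strategy hints.
        self.strategy_probs = strategy_probs or {
            "exploit": 0.6, "explore": 0.3, "random": 0.1}

    def add(self, island_idx, solution, val_bpb):
        pop = self.islands[island_idx]
        pop.append((val_bpb, solution))
        pop.sort(key=lambda x: x[0])   # best (lowest bpb) first
        del pop[self.capacity:]        # cull beyond the island's capacity

    def sample(self):
        """Pick a random non-empty island; return a candidate solution
        paired with a strategy hint drawn from strategy_probs."""
        island = random.choice([p for p in self.islands if p])
        strategy = random.choices(
            list(self.strategy_probs),
            weights=list(self.strategy_probs.values()))[0]
        if strategy == "exploit":
            candidate = island[0][1]              # island's best solution
        else:
            candidate = random.choice(island)[1]  # any elite from the island
        return candidate, strategy
```

The capacity cull is what distinguishes this from a flat log: weak solutions are forgotten per island, while diversity across islands is preserved.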

AlphaEvolve showed that evolutionary LLM coding agents could beat 56-year-old results in matrix multiplication and optimize Google's own infrastructure. OpenEvolve brought that approach into the open. The autoresearch fork applies the same evolutionary logic specifically to neural network research, where each fitness evaluation costs real GPU-hours, making the database hyperparameters themselves (number of islands, population size, exploit-versus-explore balance) an open research question the author explicitly flags. The system runs on a single NVIDIA GPU (tested on an H100) and requires a Claude or Codex API key.
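
The open hyperparameters flagged above can be collected in one place; the field names and defaults below are assumptions for illustration, not the fork's actual configuration:

```python
from dataclasses import dataclass

@dataclass
class EvoDBConfig:
    """Illustrative bundle of the database hyperparameters the author
    flags as an open research question (names and values are
    hypothetical, not taken from the fork)."""
    n_islands: int = 4         # cells of the feature grid
    population_size: int = 8   # elites kept per island
    p_exploit: float = 0.6     # sample the island's best solution
    p_explore: float = 0.3     # sample a random elite from an island
    p_random: float = 0.1      # ignore the database entirely

    def __post_init__(self):
        total = self.p_exploit + self.p_explore + self.p_random
        assert abs(total - 1.0) < 1e-9, "strategy probabilities must sum to 1"
```

Because every fitness evaluation costs a real GPU run, sweeping these values is itself an expensive experiment, which is why the author leaves them open.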

The fork adds three core files beyond the original and inherits Karpathy's framing of program.md, the human-authored instruction document governing agent behavior, as the locus of scientific intent. Layering an evolutionary optimizer on top of an already-autonomous agent loop puts the human researcher one step further removed from individual experiments. The author has not yet posted benchmark results comparing the evolutionary approach against the original flat-log baseline, which is the obvious next test.