Remoroo automates overnight ML experiments, commits what works

Kevin Frans used to work at Cohere. Now he's built something that does ML research while you're not looking.

Remoroo runs autonomous machine learning experiments overnight. Not bug fixes. Not quick patches. It takes on the tedious cycle of tweaking hyperparameters, waiting for training runs, and manually reverting failed changes, the work that eats researchers' days.

Here's how it works. You write a specification file. Set a time budget per experiment. Point Remoroo at your local codebase. It plans changes, edits code, trains models, and evaluates results against fixed metrics. Wins get committed. Losses get tossed.
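To make the commit-or-toss loop concrete, here is a minimal sketch in Python. It is not Remoroo's code or API: the function `keep_or_revert`, the `Result` record, and the `toy_loss` stand-in for a training run are all hypothetical names invented for illustration, and the sketch assumes a single metric where lower is better.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


@dataclass
class Result:
    change: Config    # the proposed edit (hypothetical representation)
    val_loss: float   # metric measured after applying the edit
    kept: bool        # True if the edit beat the best result so far


def keep_or_revert(
    baseline: Config,
    changes: List[Config],
    evaluate: Callable[[Config], float],
) -> Tuple[Config, float, List[Result]]:
    """Greedy loop: apply each change, keep it only if the metric improves."""
    best_config = dict(baseline)
    best_loss = evaluate(best_config)
    log: List[Result] = []
    for change in changes:
        trial = {**best_config, **change}   # apply the edit on top of the best state
        loss = evaluate(trial)              # stands in for a time-boxed training run
        kept = loss < best_loss
        if kept:
            best_config, best_loss = trial, loss   # the "commit" step
        # otherwise the trial is dropped and best_config is untouched (the "revert")
        log.append(Result(change, loss, kept))
    return best_config, best_loss, log


def toy_loss(cfg: Config) -> float:
    # Assumption: a synthetic loss surface with its minimum at lr=0.01, dropout=0.1.
    return (cfg["lr"] - 0.01) ** 2 * 1e4 + (cfg["dropout"] - 0.1) ** 2 + 1.0


baseline = {"lr": 0.03, "dropout": 0.5}
changes = [{"lr": 0.02}, {"lr": 0.05}, {"dropout": 0.1}, {"lr": 0.01}]
best_config, best_loss, log = keep_or_revert(baseline, changes, toy_loss)

for r in log:
    print(r.change, round(r.val_loss, 2), "kept" if r.kept else "tossed")
```

In this toy run the second change makes the loss worse and is discarded, while the other three are kept. A real system would presumably replace the dictionary merge with code edits and the "commit" with a git commit, but the accept-only-if-better rule is the same.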

In one benchmark run published on the project's site, the tool completed 30 experiments with a 20-minute budget each. Validation loss dropped from 2.24 to 1.55, a reduction of about 31%. It kept 8 changes and threw out 22.
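The arithmetic behind those figures can be checked in a few lines of Python. The loss values, experiment counts, and 20-minute budget come from the benchmark above. The total-hours figure is an assumption: it supposes the experiments ran one after another and each used its full budget, which the benchmark does not state.

```python
baseline_loss = 2.24
final_loss = 1.55
kept, discarded = 8, 22
minutes_per_experiment = 20

# Relative reduction in validation loss.
reduction = (baseline_loss - final_loss) / baseline_loss

# Share of experiments whose changes were kept.
total = kept + discarded
keep_rate = kept / total

# Upper bound on training time, assuming sequential runs at full budget.
budget_hours = total * minutes_per_experiment / 60

print(f"loss reduction: {reduction:.1%}")
print(f"experiments: {total}, keep rate: {keep_rate:.1%}")
print(f"time budget: {budget_hours:.0f} hours")
```

The reduction works out to just under 31%, roughly one change in four survived, and 30 runs at 20 minutes each add up to 10 hours of budget, which fits the overnight framing.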

When something breaks, Remoroo doesn't retry the same prompt. It uses case-based recovery, pulling context from the failure and moving on. The output is a verified, reproducible patch with git integration and artifact replay.
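The article does not say how Remoroo implements case-based recovery, so the following is only a generic illustration of the idea: match a failure against a library of known cases and return the remedy attached to the first match, rather than retrying blindly. The case list, the `recover` function, and the remedies are all invented for this sketch.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical case library: (pattern seen in the error text, suggested remedy).
CASES: List[Tuple[str, str]] = [
    (r"out of memory", "halve the batch size"),
    (r"loss is nan", "lower the learning rate"),
    (r"ModuleNotFoundError", "install the missing dependency"),
]


def recover(error_text: str) -> Optional[str]:
    """Return the remedy for the first matching case, or None if nothing matches."""
    for pattern, remedy in CASES:
        if re.search(pattern, error_text, flags=re.IGNORECASE):
            return remedy
    return None


print(recover("RuntimeError: CUDA out of memory while allocating tensor"))
print(recover("Training aborted: loss is NaN at step 120"))
print(recover("Unrecognized failure"))
```

An unmatched failure returns `None` here, which a caller could treat as a signal to skip the experiment and move on.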

The tool is distributed on PyPI and installs with pip. Billing uses credits measured in Haiku-hour units, scaled by model tier. A free tier includes monthly run credits.

Frans built it to solve a real problem: the hours researchers burn on manual tuning runs that produce nothing worth keeping. You wake up to proof, not guesses.