Andrej Karpathy Open-Sources Autoresearch: AI Agents That Run 100 Experiments While You Sleep

Andrej Karpathy, the former head of AI at Tesla and a founding member of OpenAI, has open-sourced Autoresearch, a minimalist tool that lets AI agents autonomously run LLM training experiments on a single GPU. The project rapidly gained traction on GitHub following its release on March 7, 2026.

How It Works

The concept is deceptively simple. Autoresearch pairs a human-written instruction file (program.md) with a single training script (train.py) of roughly 630 lines of code. An AI agent, such as Claude or GPT, reads the instructions, modifies the training code, runs a fixed 5-minute experiment, and scores the result by validation bits per byte (val_bpb), where lower is better. It then decides whether to keep or discard the change and starts the next iteration.
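A minimal sketch of that keep-or-discard loop follows, assuming val_bpb is the standard bits-per-byte metric: summed cross-entropy over the validation set (in nats) divided by ln 2 times the number of validation bytes. The functions `bits_per_byte`, `greedy_search`, `propose`, and `evaluate` are hypothetical stand-ins, not code from the Autoresearch repository: `propose` plays the role of the agent editing train.py, and `evaluate` plays the role of a full training run.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical config type: a named set of knobs the "agent" can change.
Config = Dict[str, float]


def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy (nats, over all validation tokens)
    into bits per byte of the underlying text. Lower is better."""
    return total_loss_nats / (math.log(2) * total_bytes)


def greedy_search(
    baseline: Config,
    propose: Callable[[Config], Config],
    evaluate: Callable[[Config], float],
    n_experiments: int,
) -> Tuple[Config, float, List[float]]:
    """Keep a proposed change only if it lowers val_bpb; otherwise discard it.
    Returns the best config, its score, and the best score after each step."""
    best_cfg = dict(baseline)
    best_bpb = evaluate(best_cfg)
    history = [best_bpb]
    for _ in range(n_experiments):
        candidate = propose(best_cfg)  # hypothetical: the agent edits train.py
        bpb = evaluate(candidate)      # hypothetical: one fixed-budget training run
        if bpb < best_bpb:             # "keep": the change becomes the new baseline
            best_cfg, best_bpb = candidate, bpb
        # otherwise "discard": best_cfg is left untouched
        history.append(best_bpb)
    return best_cfg, best_bpb, history


if __name__ == "__main__":
    rng = random.Random(0)

    def propose(cfg: Config) -> Config:
        # Toy agent: nudge the learning rate down or up by a fixed factor.
        return {"lr": cfg["lr"] * rng.choice([0.8, 1.25])}

    def evaluate(cfg: Config) -> float:
        # Toy stand-in for val_bpb, minimised at lr = 3e-3.
        return 1.0 + (math.log10(cfg["lr"]) - math.log10(3e-3)) ** 2

    cfg, bpb, hist = greedy_search({"lr": 1e-2}, propose, evaluate, n_experiments=20)
    print(cfg, round(bpb, 4))
```

Because a change survives only when it lowers the score, the best val_bpb can never get worse from one iteration to the next. The strict greedy rule here is an illustrative assumption; in Autoresearch the agent makes the keep-or-discard call itself, guided by program.md.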

Each experiment runs on a fixed 5-minute training budget regardless of hardware, which yields approximately 12 experiments per hour and around 100 over a night. The agent works on a git feature branch, accumulating commits as it discovers better configurations for the neural network architecture, optimizer settings, and hyperparameters.
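The fixed budget and the throughput arithmetic can be sketched as follows. This assumes the budget is measured in wall-clock seconds of training and ignores startup and evaluation overhead; `TIME_BUDGET_S`, `experiments_in`, and `train_for_budget` are hypothetical names, not taken from train.py.

```python
import time
from typing import Callable

TIME_BUDGET_S = 5 * 60  # assumed: fixed training budget per experiment, in seconds


def experiments_in(hours: float, budget_s: float = TIME_BUDGET_S) -> int:
    """Number of fixed-budget runs that fit in a time window (overhead ignored)."""
    return int(hours * 3600 // budget_s)


def train_for_budget(
    step: Callable[[], None],
    budget_s: float = TIME_BUDGET_S,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call `step` (one optimisation step) until the wall-clock budget is spent.
    Returns the number of steps taken: a faster GPU fits more steps into the
    same budget, so runs are equal in time rather than in step count."""
    start = clock()
    steps = 0
    while clock() - start < budget_s:
        step()
        steps += 1
    return steps


print(experiments_in(1))  # 12 per hour
print(experiments_in(8))  # 96, i.e. roughly 100 over an 8-hour night
```

Taking `clock` as a parameter is a design choice for the sketch: it lets the loop be exercised with a fake clock instead of waiting five real minutes.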

Key Highlights

Single GPU only: no distributed training or complex infrastructure required; tested on a single NVIDIA H100
Self-contained: The entire project has minimal dependencies beyond PyTorch, with no external configs
Human-in-the-loop design: The researcher writes high-level instructions in Markdown; the agent handles the implementation details
MIT licensed: Fully open for commercial and research use

Why It Matters

Machine learning research has long been bottlenecked by the tedious cycle of hypothesis, implementation, training, and evaluation. Autoresearch automates the implementation, training, and evaluation steps, letting researchers focus on the creative work of formulating hypotheses and interpreting results.

The tool reflects a growing trend toward agentic AI research: using AI agents not just as coding assistants but as autonomous experimenters that can try far more variations of a model than a researcher could test by hand.

Community Reaction

The project has generated significant excitement in the AI community. Shopify CEO Tobi Lütke publicly shared his results after running Autoresearch overnight, reporting a 19% improvement in validation scores on a custom model. Developers have praised its minimalist design and the accessibility it brings to autonomous ML experimentation.

Autoresearch currently focuses on small-scale LLM training experiments using the nanochat architecture. The open-source community is already exploring extensions for multi-GPU setups and broader model architectures. The project is available on GitHub under the MIT license.

Source: karpathy/autoresearch on GitHub