AutoAgent: Open-Source AI Agents That Build Themselves

A new open-source library called AutoAgent is turning heads in the AI community. Its creator, Kevin Gu, a Harvard graduate and former Jump Trading researcher, demonstrated that AI agents can engineer better versions of themselves: the self-optimized agents outperformed every human-designed entry on two major benchmarks.

Key Highlights

AutoAgent achieved 96.5% on SpreadsheetBench and 55.1% on TerminalBench, both reported as #1 scores at the time of the announcement
Every other leaderboard entry was manually engineered by humans; AutoAgent was not
The library is fully open source under the MIT license
Gu describes it as "like autoresearch, but for agent engineering"

How It Works

AutoAgent introduces a meta-agent that autonomously improves a task agent through a hill-climbing optimization loop. Instead of a developer manually tweaking prompts and tools, the process works like this:

A human writes a directive in a program.md file describing the goal
The meta-agent modifies the agent harness: system prompts, tools, configuration, and orchestration
It runs benchmarks, checks the score, keeps improvements, discards regressions, and repeats
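
The loop described above can be sketched in a few lines of Python. This is a minimal illustration of hill climbing, not AutoAgent's actual code: the Harness fields, the toy scoring function, and the random proposer are all hypothetical stand-ins for the real harness, the real benchmarks, and the meta-agent's edits.

```python
import random
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Harness:
    """Hypothetical stand-in for the parts of a harness a meta-agent may edit."""
    system_prompt: str
    max_steps: int


def hill_climb(
    initial: Harness,
    propose: Callable[[Harness], Harness],
    evaluate: Callable[[Harness], float],
    iterations: int,
) -> Tuple[Harness, float, List[float]]:
    """Keep a candidate only if it scores strictly higher than the current best."""
    best, best_score = initial, evaluate(initial)
    history = [best_score]
    for _ in range(iterations):
        candidate = propose(best)       # the meta-agent edits the harness
        score = evaluate(candidate)     # the benchmark is run on the edit
        if score > best_score:          # improvements are kept
            best, best_score = candidate, score
        history.append(best_score)      # regressions are silently discarded
    return best, best_score, history


def toy_evaluate(h: Harness) -> float:
    """Toy benchmark (assumption): rewards a step budget near 12 and a prompt
    that mentions verification."""
    bonus = 5.0 if "verify" in h.system_prompt else 0.0
    return bonus - abs(h.max_steps - 12)


def make_toy_proposer(seed: int) -> Callable[[Harness], Harness]:
    """Random edits standing in for the meta-agent's proposals."""
    rng = random.Random(seed)

    def propose(h: Harness) -> Harness:
        if rng.random() < 0.5:
            return replace(h, max_steps=max(1, h.max_steps + rng.choice([-1, 1])))
        return replace(h, system_prompt=h.system_prompt + " Always verify your output.")

    return propose


if __name__ == "__main__":
    start = Harness(system_prompt="You are a spreadsheet agent.", max_steps=5)
    best, score, history = hill_climb(start, make_toy_proposer(0), toy_evaluate, 200)
    print(f"start={history[0]} best={score} max_steps={best.max_steps}")
```

Because a candidate replaces the current best only when its score is strictly higher, the recorded best score never decreases. The trade-off is the usual one for hill climbing: the search can stall on a local optimum if no single edit improves the score.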

The entire cycle can run unattended overnight. Each run executes inside a Docker-isolated container, which keeps agent-generated code away from the host system while the meta-agent iterates through thousands of parallel simulations.

Architecture

The project is built around three core components:

agent.py — a single-file harness containing configuration, tool definitions, agent registry, and Harbor adapter
program.md — human-edited instructions that steer the meta-agent
tasks/ — evaluation benchmarks in Harbor format for cross-dataset evaluation
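
To make the idea of a single-file harness concrete, here is a rough sketch of how configuration, tool definitions, and an agent registry could sit together in one module. Every name in it (AgentConfig, TOOLS, AGENTS, the tool decorator, the placeholder model string) is an assumption made for illustration and is not taken from AutoAgent's agent.py. The Harbor adapter is left out because its interface is not described here.

```python
from dataclasses import dataclass
from typing import Callable, Dict


# --- configuration (hypothetical fields, not AutoAgent's real schema) ---
@dataclass
class AgentConfig:
    model: str = "example-model"  # placeholder, not a real model identifier
    max_steps: int = 20
    system_prompt: str = "You are a careful task agent."


# --- tool definitions: a name-to-function table filled by a decorator ---
TOOLS: Dict[str, Callable[[str], str]] = {}


def tool(name: str) -> Callable[[Callable[[str], str]], Callable[[str], str]]:
    """Register a function under `name` so an agent can call it by string."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = fn
        return fn
    return register


@tool("word_count")
def word_count(text: str) -> str:
    """Example tool: count whitespace-separated words."""
    return str(len(text.split()))


# --- agent registry: named configurations a meta-agent could add to or edit ---
AGENTS: Dict[str, AgentConfig] = {"default": AgentConfig()}


def run_tool(name: str, argument: str) -> str:
    """Dispatch a tool call by name, failing loudly on unknown tools."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](argument)


if __name__ == "__main__":
    print(run_tool("word_count", "keep what scores higher"))
```

Keeping everything in one file gives an automated editor a single target to modify and a single diff to evaluate per iteration, which is one plausible reason for the design.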

Why It Matters

The core insight behind AutoAgent is that agents are often better at "seeing like an agent" and designing their own action spaces than human developers are. This shifts the developer role from manual prompt engineering to defining evaluation criteria and letting the AI figure out the optimal approach.

Some AI researchers have suggested that this approach could change how AI agents are built, moving the field from artisanal prompt crafting to automated optimization at scale.

Community Reaction

The announcement generated significant buzz on X, with some developers questioning whether this represents a step toward AGI. Others have drawn parallels to Andrej Karpathy's autoresearch project, noting that AutoAgent applies the same self-improvement philosophy specifically to agent engineering.

Getting Started

AutoAgent requires Docker, Python 3.10 or higher, and the uv package manager. It supports multiple model providers and is available now on GitHub under the MIT license.
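
Before cloning the repository, the three prerequisites can be verified with a short script. The sketch below is not part of AutoAgent. It assumes only that the Docker and uv executables are named docker and uv on the PATH, and it does not check whether the Docker daemon is actually running.

```python
import shutil
import sys
from typing import Callable, List, Optional, Sequence

REQUIRED_PYTHON = (3, 10)
REQUIRED_TOOLS = ("docker", "uv")  # assumed executable names


def check_prerequisites(
    version_info: Sequence[int] = sys.version_info,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all clear."""
    problems = []
    if tuple(version_info[:2]) < REQUIRED_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python 3.10 or higher is required, found {found}")
    for name in REQUIRED_TOOLS:
        if which(name) is None:  # shutil.which returns None when not on PATH
            problems.append(f"{name} was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("missing:", issue)
    sys.exit(1 if issues else 0)
```

The version and lookup function are passed in as parameters so the check can be exercised without the real tools installed.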

What's Next

As AI agent development accelerates across the industry, AutoAgent could become a foundational tool for teams looking to optimize agent performance without manual iteration. The project is actively maintained, and the community is already exploring applications beyond spreadsheet and terminal tasks.

Source: AutoAgent on GitHub