Loop Engineering: Build AI Agents That Run Themselves

The End of Prompt Engineering

In 2023, GitHub counted roughly 300 million AI-assisted code commits. By early 2026, that number had climbed to 1.4 billion, nearly five times as many in about three years. At NVIDIA's GTC Taipei keynote, Jensen Huang singled out that statistic as evidence of a deeper shift: developers are no longer writing prompts. They are writing loops.

This is loop engineering: the practice of designing systems that prompt AI agents autonomously, instead of typing each instruction yourself. Where prompt engineering optimized individual interactions, loop engineering designs the scaffolding around the agent so it can reason, act, self-correct, and terminate without a human in the middle.

For most of 2024 and 2025, getting value from a coding agent meant writing a prompt, reading the response, writing the next prompt. Loop engineering flips that dynamic: you stop being the person who prompts the agent and start being the person who designs the system that prompts it.

The Four Loop Types

Not every autonomous agent needs the same loop shape. There are four primary patterns in production today:

Heartbeat loops run on a continuous short interval — seconds to minutes — for always-on monitoring tasks: watching logs, alerting on anomalies, keeping a connection alive. They are cheap, stateless, and never sleep.

Cron loops fire on a schedule. Daily code reviews, weekly dependency audits, monthly SEO reports. The trigger is time, the scope is bounded, and the output is a deliverable at each tick.

Hook loops are event-driven. A PR is pushed, a CI step fails, a file changes, an API webhook fires. The loop wakes up, does its work, and returns to sleep. Hook loops are the most reactive and the most composable with existing CI/CD pipelines.

Goal loops are the most powerful and the most dangerous. They iterate until a success condition is met, then terminate. You give the agent a goal — "ship this feature with passing tests" — and the loop reasons, plans, acts, observes, and retries until it can verify the goal is complete. Duration can stretch from minutes to hours.
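One way to make the four shapes concrete is to model them as a tagged union, so that a scheduler can tell which trigger and which stopping rule applies to each loop. This is only a sketch: the type name LoopSpec, its fields, and the example schedule and event strings are illustrative assumptions, not part of any agent framework.

```typescript
// Hypothetical types for illustration; not taken from any framework.
type LoopSpec =
  | { kind: "heartbeat"; intervalMs: number }
  | { kind: "cron"; schedule: string } // e.g. "0 9 * * 1" (09:00 every Monday)
  | { kind: "hook"; event: string } // e.g. "pull_request.opened" (assumed event name)
  | { kind: "goal"; successCondition: string; maxIterations: number };

// Summarize what wakes the loop and what ends a run.
function describeLoop(spec: LoopSpec): string {
  switch (spec.kind) {
    case "heartbeat":
      return `runs every ${spec.intervalMs} ms until stopped`;
    case "cron":
      return `runs on schedule "${spec.schedule}", one deliverable per tick`;
    case "hook":
      return `runs when "${spec.event}" fires, then sleeps`;
    case "goal":
      return `iterates until "${spec.successCondition}" is verified or ${spec.maxIterations} iterations pass`;
  }
}
```

Note that only the goal variant carries an iteration cap in this sketch: it is the one shape whose duration is not bounded by its trigger.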

The Five-Stage Agent Cycle

Inside every loop, the agent runs the same five-stage internal cycle:

Perceive — intake the current goal, tool results, and any errors from the previous iteration
Reason — analyze what is needed and what tools are available
Plan — select the next action or set of parallel actions
Act — execute tools, write code, query databases, or call APIs
Observe — receive results, update internal state, and evaluate progress against the goal

This cycle runs repeatedly until a stopping condition is met: goal verified, iteration cap hit, budget exceeded, or a circuit breaker fires.
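The cycle and its stopping conditions can be sketched as a plain loop. Everything here is an assumption made for illustration: the Stages interface, the Observation shape, and the toy stage functions stand in for real model and tool calls, which would normally be asynchronous. The circuit-breaker condition is left out to keep the sketch short.

```typescript
// Hypothetical names throughout; the stage functions stand in for model and tool calls.
type StopReason = "goal_verified" | "iteration_cap" | "budget_exceeded";

interface Observation {
  output: string;
  tokensUsed: number;
  goalMet: boolean;
}

interface Stages {
  reason(goal: string, history: Observation[]): string; // Perceive + Reason
  plan(analysis: string): string; // Plan
  act(action: string): Observation; // Act, returning the raw result
}

function runCycle(
  goal: string,
  stages: Stages,
  maxIterations: number,
  tokenBudget: number
): { stop: StopReason; iterations: number } {
  const history: Observation[] = [];
  let tokens = 0;
  for (let i = 1; i <= maxIterations; i++) {
    const analysis = stages.reason(goal, history); // sees the goal and all prior results
    const action = stages.plan(analysis);
    const observation = stages.act(action);
    history.push(observation); // Observe: update state
    tokens += observation.tokensUsed;
    if (observation.goalMet) return { stop: "goal_verified", iterations: i };
    if (tokens >= tokenBudget) return { stop: "budget_exceeded", iterations: i };
  }
  return { stop: "iteration_cap", iterations: maxIterations };
}

// Toy stages: the "agent" succeeds on its third attempt.
let attempts = 0;
const toyStages: Stages = {
  reason: (goal, history) => `${goal} (attempt ${history.length + 1})`,
  plan: (analysis) => `do: ${analysis}`,
  act: () => {
    attempts++;
    return { output: "ok", tokensUsed: 100, goalMet: attempts === 3 };
  },
};
const outcome = runCycle("make tests pass", toyStages, 10, 10000);
// outcome is { stop: "goal_verified", iterations: 3 }
```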

The Five Essential Components

Building a reliable loop requires five structural elements working together:

1. Worktrees

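Isolated git environments where the agent can make changes without touching the main branch. If the agent breaks something, removing the worktree and deleting its branch discards the damage. Use git worktree remove --force for this, because git refuses to remove a worktree that still has uncommitted changes. Without worktrees, a goal loop that runs for an hour of failed attempts leaves the repository in an unknown state.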
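As a rough sketch, the worktree lifecycle for one agent run comes down to three git commands. The helper below only builds the argument lists rather than executing them; the function name, the directory layout (../agent-worktrees/), and the agent/ branch prefix are assumptions chosen for illustration.

```typescript
// Hypothetical helper: returns argv arrays for the git commands instead of running them.
interface WorktreePlan {
  create: string[];
  discard: string[][];
}

function planWorktree(runId: string, baseBranch: string): WorktreePlan {
  const path = `../agent-worktrees/${runId}`; // assumed layout, outside the main checkout
  const branch = `agent/${runId}`; // assumed naming convention
  return {
    // Create a new branch from baseBranch and check it out in its own directory.
    create: ["git", "worktree", "add", "-b", branch, path, baseBranch],
    discard: [
      // --force is needed because the agent may leave uncommitted changes behind.
      ["git", "worktree", "remove", "--force", path],
      // Removing the worktree keeps the branch, so delete it separately.
      ["git", "branch", "-D", branch],
    ],
  };
}

const plan = planWorktree("run-42", "main");
// plan.create.join(" ") is "git worktree add -b agent/run-42 ../agent-worktrees/run-42 main"
```

Returning the commands as data keeps the sketch testable without a repository; a real loop would pass each array to a process runner.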

2. Skills

Reusable, version-controlled instruction files (typically SKILL.md) that define how the agent should handle a specific task type. Skills replace prompt repetition: instead of re-explaining "how to write a blog post" in every trigger message, you point the agent at the skill file. Skills also create team memory — improvements in one agent's behavior propagate to all agents using the same skill.
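A minimal sketch of "pointing the agent at the skill file" might look like the following. The registry shape, the file paths, and the version numbers are all hypothetical; the point is only that the trigger message names the skill instead of repeating its contents.

```typescript
// Hypothetical registry; paths, versions, and field names are assumptions.
interface Skill {
  path: string;
  version: string;
}

const skills: Record<string, Skill> = {
  "blog-post": { path: "skills/blog-post/SKILL.md", version: "1.2.0" },
  "code-review": { path: "skills/code-review/SKILL.md", version: "0.4.1" },
};

// Build a short trigger message that references the skill rather than restating it.
function buildTrigger(taskType: string, request: string): string {
  const skill = skills[taskType];
  if (!skill) throw new Error(`No skill registered for task type "${taskType}"`);
  return `Follow ${skill.path} (v${skill.version}). Task: ${request}`;
}

const trigger = buildTrigger("blog-post", "Write about worktrees");
// trigger is "Follow skills/blog-post/SKILL.md (v1.2.0). Task: Write about worktrees"
```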

3. MCP Connectors

Model Context Protocol integrations give the agent access to external tools: databases, file systems, APIs, search engines, CI systems. MCP connectors are the agent's hands — without them, the loop can only reason; it cannot act on the outside world.

4. Subagents

Specialized sub-loops with their own isolated context windows. A parent loop delegates subtasks to subagents to avoid context overflow in long-running sessions. Each subagent can use a different model tier — cheap models for classification, frontier models for final review — enabling significant cost optimization.
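The delegation pattern can be sketched as follows. The RunModel callback is a hypothetical, synchronous stand-in for a real model call, and the tier names are assumptions. What the sketch tries to show is that each subagent starts from a fresh context and only its summary flows back to the parent.

```typescript
// Hypothetical types; RunModel stands in for a real (normally asynchronous) model call.
type ModelTier = "small" | "mid" | "frontier";

interface Subtask {
  description: string;
  tier: ModelTier;
}

type RunModel = (tier: ModelTier, context: string[]) => string;

function delegate(parentContext: string[], subtasks: Subtask[], runModel: RunModel): string[] {
  const summaries = subtasks.map((task) => {
    // Fresh window: the parent's history is deliberately not copied in.
    const isolatedContext = [task.description];
    return runModel(task.tier, isolatedContext);
  });
  // Only the short summaries are appended to the parent's context.
  return [...parentContext, ...summaries];
}

// Toy model call that reports which tier handled which task.
const fakeModel: RunModel = (tier, context) => `[${tier}] done: ${context[0]}`;

const parentAfter = delegate(
  ["goal: ship feature"],
  [
    { description: "classify issue", tier: "small" },
    { description: "review final diff", tier: "frontier" },
  ],
  fakeModel
);
// parentAfter has 3 entries: the original goal plus one summary per subtask
```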

5. State Tracking

File-based or database checkpoints that record what has been completed. Without state tracking, a loop that crashes mid-run restarts from scratch, wasting tokens and potentially repeating irreversible side effects. State tracking enables safe resume.
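The resume behavior can be illustrated with a small checkpointing sketch. The CheckpointStore interface and the in-memory implementation are assumptions made so the example runs on its own; a production store would persist to a file or database as described above.

```typescript
// Hypothetical interface; a real store would write to a file or database.
interface CheckpointStore {
  load(): string[];
  save(completed: string[]): void;
}

function inMemoryStore(): CheckpointStore {
  let saved: string[] = [];
  return {
    load: () => [...saved],
    save: (completed) => {
      saved = [...completed];
    },
  };
}

// Run each step at most once across restarts, checkpointing after every step.
function runSteps(steps: { id: string; run: () => void }[], store: CheckpointStore): string[] {
  const completed = store.load();
  const executed: string[] = [];
  for (const step of steps) {
    if (completed.includes(step.id)) continue; // finished in an earlier run
    step.run();
    completed.push(step.id);
    store.save(completed);
    executed.push(step.id);
  }
  return executed;
}

// Simulate a crash in the second step, then resume.
const store = inMemoryStore();
let emailsSent = 0;
let failOnce = true;
const steps = [
  { id: "send-email", run: () => { emailsSent++; } },
  { id: "deploy", run: () => { if (failOnce) { failOnce = false; throw new Error("crash"); } } },
];
try {
  runSteps(steps, store);
} catch {
  // simulated crash mid-run
}
const resumed = runSteps(steps, store);
// resumed is ["deploy"]; emailsSent is still 1, so the email was not sent twice
```

A crash between step.run() and store.save() would still repeat that one step on resume, so irreversible steps benefit from being idempotent as well.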

Cost Reality and Model Routing

A single agentic loop consumes roughly four times the tokens of a standard chat interaction, and multi-agent systems consume roughly fifteen times that chat baseline. At scale, this is the primary engineering constraint.

The solution is model routing: directing each task tier to the appropriate model size.

Task Tier	Example	Cost per 1M tokens
Classification	"Is this a bug or a feature?"	$0.10–$0.30
Drafting	"Write the implementation"	$1–$3
Final review	"Verify correctness and security"	$10–$15

Combined with prompt caching, teams applying model routing report 60 to 80 percent reduction in total inference cost compared to routing every step through a frontier model.
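To see how routing alone can land in that range, here is a worked example. The prices are assumed values picked from within the ranges in the table above, and the workload split (tokens per tier) is entirely hypothetical; prompt caching is not modeled.

```typescript
type Tier = "classification" | "drafting" | "review";

// Assumed prices in USD per 1M tokens, chosen from within the table's ranges.
const pricePerMillion: Record<Tier, number> = {
  classification: 0.2,
  drafting: 2,
  review: 12,
};

// Hypothetical workload: millions of tokens spent at each tier.
const workload: Record<Tier, number> = {
  classification: 5,
  drafting: 3,
  review: 2,
};

// Total cost of a workload given a pricing rule per tier.
function cost(tokensM: Record<Tier, number>, price: (tier: Tier) => number): number {
  return (Object.keys(tokensM) as Tier[]).reduce(
    (sum, tier) => sum + tokensM[tier] * price(tier),
    0
  );
}

// Routed: 5 * 0.2 + 3 * 2 + 2 * 12 = 31 USD
const routed = cost(workload, (tier) => pricePerMillion[tier]);
// Everything on the frontier model: 10 * 12 = 120 USD
const allFrontier = cost(workload, () => pricePerMillion.review);
// 1 - 31 / 120, roughly 0.74
const savings = 1 - routed / allFrontier;
```

Under these assumptions routing saves about 74 percent. The result is sensitive to the workload split: the more tokens that genuinely need the frontier tier, the smaller the saving.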

A Minimal Goal Loop in TypeScript

Here is a skeleton for a goal loop with state tracking and a hard iteration cap:

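import { runAgent } from "./agent";
import { readState, writeState } from "./state";
import { verifyGoal } from "./verifier";

// Hard cap: the loop stops here even if the goal is never verified.
const MAX_ITERATIONS = 10;

async function goalLoop(goal: string): Promise<void> {
  // Resume from the last checkpoint if one exists; otherwise start fresh.
  const state = readState() ?? { iteration: 0, history: [] };

  while (state.iteration < MAX_ITERATIONS) {
    // One agent pass, given the goal and everything tried so far.
    const result = await runAgent(goal, state.history);
    state.history.push(result);
    state.iteration++;
    // Checkpoint before verifying so a crash never loses a completed iteration.
    writeState(state);

    // The verifier is separate from the agent, so it can be upgraded independently.
    if (await verifyGoal(goal, result)) {
      console.log(`Goal achieved in ${state.iteration} iterations.`);
      return;
    }
  }

  console.error("Goal loop hit iteration cap without verifying success.");
}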

Key design decisions: the state is written after every iteration (crash-safe), the verifier is a separate function (can be upgraded independently), and the cap is explicit (no runaway spend).

Production Guardrails

Loop engineering without guardrails is expensive at best and destructive at worst. Non-optional controls for any production loop:

Hard iteration cap — the loop must stop even if the goal is not met
Token and cost budget — reject the run if projected spend exceeds a threshold
No-progress detection — if the last N iterations produced identical results, terminate
Circuit breakers on tool retries — exponential backoff plus a max retry limit per tool call
Human checkpoints — for irreversible actions (deploys, emails, database writes), pause and require confirmation before proceeding
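Several of these controls reduce to small pure functions that the loop consults before each step. The helpers below are a sketch under assumed names, thresholds, and action labels, not a reference implementation.

```typescript
// Hypothetical guardrail helpers; names, defaults, and action labels are assumptions.

// Token and cost budget: refuse to start when projected spend exceeds the cap.
function withinBudget(projectedUsd: number, capUsd: number): boolean {
  return projectedUsd <= capUsd;
}

// No-progress detection: true when the last n results are identical.
function isStalled(results: string[], n: number): boolean {
  if (n < 2 || results.length < n) return false;
  const tail = results.slice(-n);
  return tail.every((r) => r === tail[0]);
}

// Exponential backoff: delay before retry number `attempt` (0-based),
// or null once the retry limit is reached and the circuit should open.
function backoffDelayMs(attempt: number, baseMs = 500, maxRetries = 5): number | null {
  if (attempt >= maxRetries) return null;
  return baseMs * 2 ** attempt;
}

// Human checkpoints: actions that must pause for confirmation.
const irreversibleActions = ["deploy", "send_email", "db_write"];

function needsApproval(action: string): boolean {
  return irreversibleActions.includes(action);
}

// With the defaults, retries wait 500, 1000, 2000, 4000, 8000 ms, then stop.
const delays = [0, 1, 2, 3, 4, 5].map((attempt) => backoffDelayMs(attempt));
```

The hard iteration cap is not repeated here because the goal loop skeleton above already enforces it.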

Where to Start

The evolution of agentic patterns runs from the academic ReAct and Reflexion patterns (published in 2022 and 2023 respectively) and AutoGPT-style agents in 2023, to practitioner-grade goal loops and slash-command loops (/goal, /loop) in 2025–2026.

For new teams, start with ReAct — it handles 80 percent of production use cases and is the most debuggable. Add Reflexion-style self-correction when accuracy matters more than speed. Graduate to full goal loops for open-ended engineering tasks where you can define a verifiable exit condition.

The most important mindset shift is not technical: it is recognizing that the prompt is now an artifact in a pipeline, not a message in a chat. Design the pipeline. The agent will handle the prompts.

Conclusion

Loop engineering marks a genuine architectural shift in how developers integrate AI. Where prompt engineering asked "how do I write a better instruction?", loop engineering asks "how do I design a system that runs and self-corrects without me?" The teams answering that second question are the ones watching their commit counts compound — and their engineer-hours scale.