Blog · May 22, 2026 · 6 min read

Codex /goal: The Ralph Loop Comes Built-In

OpenAI Codex CLI just shipped /goal, baking the Ralph Loop pattern into autonomous coding agents. Here is how plan, act, test, review, iterate now works out of the box.

For a year, a small cult of developers wrote shell scripts that wrapped Claude Code and Codex in while loops. They called the pattern "Ralph" — point an agent at a goal, let it act, test, review, and iterate until done. Cursed but effective.

This week, both Codex CLI and Claude Code shipped the same idea natively. The command is called /goal, and it turns coding agents from chat partners into self-evaluating workers that stop when the task is verifiably complete.

If you are still typing one prompt at a time and watching tokens scroll by, this is the shift you have been missing.

From dialogue to outcome

The default mental model for AI coding has been conversational: you write a prompt, the agent writes some code, you read the diff, you correct, you repeat. Andrej Karpathy coined "vibe coding" for the loosest version of this, where you accept changes without reading the diff closely: fast, intuitive, and brittle past a certain complexity.

Goal-oriented agents flip the loop. You declare the outcome and the success criteria. The agent owns the iteration:

  1. Plan — break the goal into verifiable subtasks
  2. Act — execute the next subtask (edit code, run a script, call a tool)
  3. Test — run checks against the success criteria
  4. Review — diff its own output against the goal
  5. Iterate — pick the next subtask or stop if done

That is the Ralph Loop in five lines. Codex's /goal and Claude Code's equivalent now run this loop without a babysitter.
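
To make the five steps concrete, here is a minimal sketch of the loop in Python. It is not how Codex or Claude Code implement /goal; the names ralph_loop, plan, act, and check are hypothetical stand-ins for whatever the agent does internally, and the default cap of 40 simply mirrors the max_iterations value used later in this post.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    done: bool
    iterations: int
    log: List[str] = field(default_factory=list)


def ralph_loop(
    plan: Callable[[], List[str]],
    act: Callable[[str], None],
    check: Callable[[], bool],
    max_iterations: int = 40,
) -> LoopResult:
    """Plan, act, test, review, iterate until the check passes or the cap is hit."""
    subtasks = list(plan())                 # 1. Plan
    log: List[str] = []
    for iteration in range(1, max_iterations + 1):
        if not subtasks:
            subtasks = list(plan())         # out of subtasks: plan again
        if not subtasks:
            break                           # nothing left to try
        task = subtasks.pop(0)
        act(task)                           # 2. Act
        log.append(task)
        if check():                         # 3. Test and 4. Review against the goal
            return LoopResult(done=True, iterations=iteration, log=log)
        # 5. Iterate: fall through to the next subtask
    return LoopResult(done=False, iterations=len(log), log=log)


if __name__ == "__main__":
    # Toy stand-in for an agent: the "goal" is met after three fixes.
    state = {"fixes": 0}
    result = ralph_loop(
        plan=lambda: ["fix the failing test"],
        act=lambda task: state.update(fixes=state["fixes"] + 1),
        check=lambda: state["fixes"] >= 3,
    )
    print(result.done, result.iterations)
```

The important design choice is that check is the only thing allowed to end the loop early. Everything else is bounded by max_iterations.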

Setting it up in Codex CLI

The command is available in Codex CLI 0.128 and later. To turn it on, edit your ~/.codex/config.toml:

[features]
goals = true
 
[goal]
max_iterations = 40
auto_mode = true
require_tests = true

Restart the CLI, then drop a goal:

codex
> /goal Add JWT refresh-token rotation to the auth service. Existing tests in tests/auth must pass. Add a new test that verifies a rotated token invalidates the previous one.

Codex plans the work, edits files, runs the test suite, reads the failures, edits again, and stops when the success criteria are green. Shift+Tab enters auto mode so the agent never pauses for confirmation between steps.

Claude Code uses an almost identical interface. Both tools converged on the same shape within a few weeks — a strong signal that goal engineering is the new default, not a niche pattern.

Writing a goal that actually completes

The single biggest failure mode of autonomous loops is an ambiguous goal. If "done" is fuzzy, the agent will either grind on until it hits its iteration cap or stop too early.

A goal that works has three parts:

Objective: <one sentence outcome>
Constraints: <files, libraries, performance bounds>
Success: <verifiable checks>
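
If you write goals often, it can help to treat the three parts as a small data structure that refuses to render without a success check. The sketch below is one way to do that; GoalSpec and render are hypothetical helper names, not part of any CLI, and the output is just text you would paste after /goal.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GoalSpec:
    objective: str
    constraints: List[str]
    success: List[str]

    def render(self) -> str:
        """Return the goal text; reject goals that have no checkable 'done'."""
        if not self.objective.strip():
            raise ValueError("objective must be a one-sentence outcome")
        checks = [s.strip() for s in self.success if s.strip()]
        if not checks:
            raise ValueError("at least one verifiable success check is required")
        lines = [f"Objective: {self.objective.strip()}"]
        if self.constraints:
            lines.append("Constraints: " + "; ".join(self.constraints))
        lines.append("Success: " + "; ".join(checks))
        return "\n".join(lines)


if __name__ == "__main__":
    goal = GoalSpec(
        objective="Add JWT refresh-token rotation to the auth service.",
        constraints=["existing tests in tests/auth must pass"],
        success=["a new test verifies a rotated token invalidates the previous one"],
    )
    print(goal.render())
```

The validation is deliberately blunt: an empty Success list is the one thing it will not let through.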

Compare these two:

# BAD
/goal Make the search faster.
 
# GOOD
/goal Reduce P95 latency of GET /search from 480ms to under 200ms.
Constraints: do not change the public API, do not add a new dependency.
Success: a new benchmark in tests/perf/search_bench.ts passes, asserting P95 under 200ms over 1000 requests against the seeded fixture.

The good version is a closed system. The agent can write the benchmark, run it, see the number, and stop only when it is green. That is what makes the loop terminate.
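
The assertion at the heart of that benchmark is small. The real file in the goal would be TypeScript; the Python sketch below only illustrates the check itself. It assumes the nearest-rank definition of a percentile, which is one of several common definitions, and p95 and meets_target are hypothetical helper names.

```python
from typing import Sequence


def p95(samples_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of latency samples in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid float rounding
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_target(samples_ms: Sequence[float], target_ms: float = 200.0) -> bool:
    """True when P95 is strictly under the target, as in 'under 200ms'."""
    return p95(samples_ms) < target_ms


if __name__ == "__main__":
    # 1000 made-up samples: most requests are fast, 4% are slow.
    samples = [120.0] * 960 + [450.0] * 40
    print(p95(samples), meets_target(samples))
```

A single boolean like meets_target is what lets the agent tell red from green without a human reading a chart.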

Where the Ralph Loop wins

Three classes of work map cleanly onto goal-oriented agents:

  • Mechanical refactors. Rename a module, migrate from one ORM to another, swap an HTTP client across thirty files. The success criterion is "tests still pass," which the loop can verify cheaply.
  • Bug fixes with a reproducible failure. Hand the agent a failing test or a stack trace. The loop runs until red turns green.
  • Performance tuning. Define a benchmark, set a target, let the agent try, measure, and iterate. Humans are bad at this kind of brute-force search. Agents are not.

The common thread: a cheap, deterministic check separates "done" from "not done." Without that, no autonomous loop will save you time.
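
In practice, "cheap, deterministic check" usually means a command whose exit code you trust. Here is a rough sketch of that idea using Python's standard subprocess module; is_done is a hypothetical helper, and the ten-minute timeout is an arbitrary assumption.

```python
import subprocess
import sys
from typing import Sequence


def is_done(check_cmd: Sequence[str], timeout_s: float = 600.0) -> bool:
    """Treat exit code 0 from the check command as 'done', anything else as 'not done'."""
    try:
        completed = subprocess.run(check_cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False  # a check that hangs does not count as a pass
    return completed.returncode == 0


if __name__ == "__main__":
    # Stand-in for a real test command: a Python one-liner that exits 0.
    print(is_done([sys.executable, "-c", "raise SystemExit(0)"]))
```

If you cannot reduce "done" to something shaped like this, that is a sign the task belongs in the next section.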

Where it falls over

Goal-oriented agents fail predictably on three patterns:

  • Design decisions. "Build a notification system" has too many right answers. The loop will pick one, often the wrong one, and barrel forward.
  • Cross-system changes without local tests. If verifying success requires hitting a staging environment or a paid API, the agent cannot tighten the loop.
  • Flaky tests. A test that fails 1 in 5 runs will trick the agent into thinking it broke something it did not. It will then "fix" working code.

The fix for all three: keep humans in the planning step and let the agent own the execution. Write the spec, write the acceptance test, confirm it passes or fails deterministically, then hand off the goal.
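
For the flaky-test case specifically, one cheap guard is to rerun a check on unchanged code before trusting a failure. The sketch below shows the idea; classify_check is a hypothetical helper and the default of five reruns is an assumption, not a recommendation from either tool.

```python
from typing import Callable


def classify_check(run_check: Callable[[], bool], reruns: int = 5) -> str:
    """Rerun a check on unchanged code and label it 'pass', 'fail', or 'flaky'."""
    if reruns < 2:
        raise ValueError("need at least two runs to detect flakiness")
    results = [run_check() for _ in range(reruns)]
    if all(results):
        return "pass"
    if not any(results):
        return "fail"
    return "flaky"  # mixed results with no code change in between


if __name__ == "__main__":
    outcomes = iter([True, False, True, True, True])
    print(classify_check(lambda: next(outcomes)))
```

Reruns are not a cure. Assuming independent runs, a test that fails 1 in 5 times still passes five reruns in a row about a third of the time (0.8^5 ≈ 0.33), so a "pass" here lowers the odds of flakiness rather than ruling it out.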

Safety and stopping conditions

A self-evaluating loop with auto mode is, by construction, an agent that can do a lot of damage between the moment you start it and the moment you come back. Configure the boundaries before you press Enter:

[goal.safety]
allowed_paths = ["src/**", "tests/**"]
denied_commands = ["rm -rf", "git push", "npm publish"]
require_clean_git = true
max_runtime_minutes = 30
require_human_approval_for = ["migrations", "package.json"]

Treat the goal config like a CI policy. The agent is now a contributor that makes commits without you watching, so give it the same blast radius you would give a junior engineer in their first week.
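
To reason about what a policy like that allows, it helps to see the semantics spelled out. The sketch below is purely illustrative: it reuses the allowed_paths and denied_commands values from the config above, but path_allowed and command_allowed are hypothetical functions, and nothing here describes how Codex actually enforces its settings.

```python
import posixpath
from fnmatch import fnmatchcase
from typing import Sequence

ALLOWED_PATHS = ("src/**", "tests/**")
DENIED_COMMANDS = ("rm -rf", "git push", "npm publish")


def path_allowed(path: str, patterns: Sequence[str] = ALLOWED_PATHS) -> bool:
    """True if the normalized relative path matches at least one allowed glob."""
    # Normalize first so "src/../.env" cannot sneak through as a "src/" path.
    normalized = posixpath.normpath(path.replace("\\", "/"))
    return any(fnmatchcase(normalized, pattern) for pattern in patterns)


def command_allowed(command: str, denied: Sequence[str] = DENIED_COMMANDS) -> bool:
    """True if the command contains none of the denied substrings."""
    normalized = " ".join(command.split())  # collapse repeated whitespace
    return not any(entry in normalized for entry in denied)


if __name__ == "__main__":
    print(path_allowed("src/auth/token.py"), path_allowed("src/../.env"))
    print(command_allowed("npm test"), command_allowed("git  push origin main"))
```

Substring matching like this is easy to sidestep (rm -r -f is not rm -rf), which is a good reason to treat deny lists as a seatbelt and rely on a clean git state and a restricted working directory for real containment.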

What changes for engineering teams

The economics shift in a real way. A /goal that runs for 40 minutes uses far more tokens than a single chat prompt, and even against a $0.10-per-million-token model that adds up: launch ten of them in parallel and it will show up on your monthly bill. MENA teams already using Claude Code or Codex on production codebases should budget for this. Goal-oriented work is cheaper per outcome than chat-style prompting, but the absolute spend per developer goes up because more work gets attempted.
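
A back-of-the-envelope estimate makes the budgeting concrete. The only number taken from this post is the $0.10 per million tokens; the token count per run, runs per day, and working days below are placeholder assumptions you should replace with figures from your own usage logs.

```python
def run_cost_usd(tokens_per_run: int, usd_per_million_tokens: float = 0.10) -> float:
    """Cost of one goal run at a flat per-token price."""
    return tokens_per_run / 1_000_000 * usd_per_million_tokens


def monthly_cost_usd(
    tokens_per_run: int,
    runs_per_day: int,
    working_days: int = 20,
    usd_per_million_tokens: float = 0.10,
) -> float:
    """Monthly spend for one developer, assuming every run uses the same token count."""
    return run_cost_usd(tokens_per_run, usd_per_million_tokens) * runs_per_day * working_days


if __name__ == "__main__":
    # Hypothetical: 20 million tokens per run, ten runs a day, 20 working days.
    per_run = run_cost_usd(20_000_000)
    per_month = monthly_cost_usd(20_000_000, runs_per_day=10)
    print(f"${per_run:.2f} per run, ${per_month:.2f} per developer per month")
```

A flat price is a simplification, since real pricing usually differs for input, output, and cached tokens, but even this rough model shows that volume, not the per-token rate, drives the bill.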

The other shift is review. If a developer ships ten goal-driven PRs in a day, reviewers become the bottleneck. The companies adapting fastest are pairing /goal with stricter PR templates, auto-generated diff summaries, and required acceptance tests inside the PR body. The agent does the writing; humans police the spec and the diff.

How to start tomorrow

Pick one task in your current sprint that has a clear verifiable outcome. Ideally a bug with a failing test, or a refactor that the existing tests already cover. Write the goal in the three-part shape above. Run it once with /goal and auto_mode = false so you can watch what the loop does. Then, on the next task, flip auto mode on and walk away.

The first time you come back to a green test suite and a clean diff that you did not type, the chat-based mental model dissolves on its own. You will not go back.

The Ralph Loop is no longer a cursed shell script. It is the default. The question is no longer whether to use it, but how well you can write a goal.