Tutorial · May 19, 2026 · 30 min read

Build AI Browser Agents with Stagehand by Browserbase in 2026

Learn how to build production-grade AI browser agents using Stagehand and Browserbase. This complete tutorial covers natural-language web automation, structured data extraction, observability, and deploying agents to the cloud with TypeScript.

The browser is the new API. Stagehand turns Playwright into an AI-native framework: instead of brittle CSS selectors, you write natural-language instructions like act("click the sign-in button") and extract("the price as a number"). In this tutorial, we will build a production agent that scrapes, fills forms, and runs reliably in the cloud on Browserbase.

What You Will Build

A TypeScript agent that:

  1. Launches a real cloud browser via Browserbase.
  2. Navigates to a product page and extracts structured data with Zod.
  3. Performs a multi-step task (search, filter, paginate) using natural-language actions.
  4. Observes possible actions before deciding what to do.
  5. Logs every step with full session replay for debugging.

By the end, you will know when to use act, extract, observe, and raw Playwright, and how to keep an agent reliable across thousands of runs.


Prerequisites

Before starting, ensure you have:

  • Node.js 20+ and pnpm or npm
  • A Browserbase account (free tier works) with API key and project ID
  • An OpenAI or Anthropic API key (Stagehand supports both)
  • Basic familiarity with TypeScript and async/await
  • A code editor (VS Code recommended)

You do not need to know Playwright in depth. Stagehand wraps it, and the AI primitives carry most of the work.


Why Stagehand and Not Plain Playwright?

Plain Playwright works great until the DOM moves. A class renames itself, an A/B test reorders buttons, and your script breaks at 3 a.m.

Stagehand replaces the fragile parts with three AI primitives:

| Primitive | What it does | When to use it |
| --- | --- | --- |
| act(instruction) | Performs an action described in plain English | Clicks, typing, navigating UI |
| extract(instruction, schema) | Pulls structured data, validated by Zod | Scraping prices, lists, tables |
| observe(instruction) | Returns the candidate actions without executing them | Planning, dry runs, agent loops |

Everything else you can still do with raw page.goto, page.waitForSelector, etc. The hybrid approach keeps the deterministic parts cheap and the fuzzy parts robust.


Step 1: Project Setup

Create a new project and install Stagehand, Zod for schema validation, and dotenv for loading the .env file.

mkdir stagehand-agent && cd stagehand-agent
pnpm init
pnpm add @browserbasehq/stagehand zod dotenv
pnpm add -D typescript tsx @types/node
npx tsc --init

Open tsconfig.json and make sure these are set so top-level await works:

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ESNext",
    "moduleResolution": "Bundler",
    "strict": true,
    "esModuleInterop": true
  }
}

Add "type": "module" to your package.json so Node treats files as ESM.


Step 2: Configure Browserbase and the LLM

Create a .env file. Never commit it.

BROWSERBASE_API_KEY=bb_xxxxxxxxxxxxxxxxxxxxxxx
BROWSERBASE_PROJECT_ID=prj_xxxxxxxxxxxxxxxxxxxx
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxx

Then create src/client.ts to centralize the Stagehand instance:

import { Stagehand } from "@browserbasehq/stagehand";
 
export function createStagehand() {
  return new Stagehand({
    env: "BROWSERBASE",
    apiKey: process.env.BROWSERBASE_API_KEY,
    projectId: process.env.BROWSERBASE_PROJECT_ID,
    modelName: "gpt-4o-mini",
    modelClientOptions: {
      apiKey: process.env.OPENAI_API_KEY,
    },
    verbose: 1,
  });
}

A few notes on the config:

  • env: "BROWSERBASE" runs in the cloud. Switch to "LOCAL" for local Chromium during development.
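  • modelName accepts any model your provider supports. For most flows, gpt-4o-mini is the sweet spot between cost and accuracy. Use gpt-4o or claude-sonnet-4 when extraction quality matters; if you switch to an Anthropic model, pass your Anthropic API key in modelClientOptions instead of the OpenAI one.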
  • verbose: 1 prints reasoning steps. Set it to 2 while debugging.
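
A missing key tends to surface only once the session fails to start, so it can help to validate the environment before constructing the client. The sketch below is one way to do that. requireEnv is a hypothetical helper, not part of Stagehand; the variable names it would check are the ones from the .env file above.

```typescript
// Hypothetical helper (not part of Stagehand): fail fast when a required
// environment variable is missing or blank.
type Env = Record<string, string | undefined>;

function requireEnv<K extends string>(
  names: readonly K[],
  env: Env,
): Record<K, string> {
  // Collect every missing name so the error reports them all at once.
  const missing = names.filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const out = {} as Record<K, string>;
  for (const name of names) {
    out[name] = env[name] as string;
  }
  return out;
}
```

In src/client.ts you could call requireEnv(["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID", "OPENAI_API_KEY"] as const, process.env) at the top of createStagehand and pass the returned values to the constructor.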

Step 3: Your First Natural-Language Action

Let us start with the canonical hello-world: search Google for "Stagehand Browserbase" and read the first result.

Create src/01-act.ts:

import "dotenv/config";
import { createStagehand } from "./client.js";
 
async function main() {
  const stagehand = createStagehand();
  await stagehand.init();
 
  const page = stagehand.page;
 
  await page.goto("https://www.google.com");
  await page.act("accept cookies if a banner is showing");
  await page.act("type 'Stagehand Browserbase' into the search box and press Enter");
  await page.act("click the first organic result");
 
  console.log("Final URL:", page.url());
 
  await stagehand.close();
}
 
main();

Run it:

npx tsx src/01-act.ts

Two things will happen. First, a Browserbase session opens (you can watch it live from the dashboard). Second, Stagehand turns each English instruction into a small Playwright plan and executes it.

Notice we never wrote a single selector.


Step 4: Structured Extraction with Zod

Extraction is where Stagehand shines for scraping. Define what you want with Zod and the agent figures out the rest.

Create src/02-extract.ts:

import "dotenv/config";
import { z } from "zod";
import { createStagehand } from "./client.js";
 
const ProductSchema = z.object({
  title: z.string(),
  priceUsd: z.number().describe("Price in US dollars, numeric only"),
  rating: z.number().min(0).max(5).optional(),
  inStock: z.boolean(),
  bulletPoints: z.array(z.string()).max(8),
});
 
async function main() {
  const stagehand = createStagehand();
  await stagehand.init();
 
  const page = stagehand.page;
  await page.goto("https://www.example-shop.com/products/widget-pro");
 
  const product = await page.extract({
    instruction: "Extract the product title, price, rating, stock status, and the bullet points under 'About this item'",
    schema: ProductSchema,
  });
 
  console.log(product);
 
  await stagehand.close();
}
 
main();

The .describe() call on priceUsd is not decoration. The LLM reads field descriptions and uses them as instructions, so treat them as part of your prompt.

If extraction returns garbage, the cause is almost always one of three things:

  • The schema is too loose. Add .describe() hints.
  • The instruction is too vague. Anchor it to a visible region of the page.
  • The model is too small. Promote gpt-4o-mini to gpt-4o for that specific call.
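
The third fix, promoting a single call to a larger model, can be automated so you only pay for the bigger model when the small one fails. The following is a minimal sketch under stated assumptions: extractWithEscalation, the attempt callback, and the isUsable check are all hypothetical names, and the model calls are injected so the logic stays independent of any provider.

```typescript
// Hypothetical helper: try a cheap model first and escalate only when the
// result fails a caller-supplied sanity check or the call throws.
async function extractWithEscalation<T>(
  models: readonly string[],
  attempt: (modelName: string) => Promise<T>,
  isUsable: (value: T) => boolean,
): Promise<{ value: T; modelName: string }> {
  let lastError: unknown = new Error("no models provided");
  for (const modelName of models) {
    try {
      const value = await attempt(modelName);
      if (isUsable(value)) return { value, modelName };
      lastError = new Error(`unusable result from ${modelName}`);
    } catch (err) {
      lastError = err; // remember the failure and try the next model
    }
  }
  // Every model was tried without a usable result.
  throw lastError;
}
```

In an agent, attempt would typically wrap the extract call for one model, and isUsable would encode a domain check such as "the price is greater than zero".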

Step 5: Observe Before You Act

observe returns the actions the model thinks are available, without performing them. This is how you build agent loops without runaway costs or destructive mistakes.

Create src/03-observe.ts:

import "dotenv/config";
import { createStagehand } from "./client.js";
 
async function main() {
  const stagehand = createStagehand();
  await stagehand.init();
  const page = stagehand.page;
 
  await page.goto("https://news.ycombinator.com");
 
  const candidates = await page.observe({
    instruction: "Find all clickable links to story comment threads on this page",
  });
 
  console.log(`Found ${candidates.length} candidate actions`);
  for (const c of candidates.slice(0, 5)) {
    console.log("-", c.description);
  }
 
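  // observe can return an empty list, so guard before acting on the first hit.
  const [first] = candidates;
  if (first) {
    await page.act(first);
    console.log("Navigated to:", page.url());
  } else {
    console.warn("No candidate actions found");
  }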
 
  await stagehand.close();
}
 
main();

Two reasons this matters in production:

  1. Idempotency. Pass a resolved ObserveResult directly to act and Stagehand runs the already-resolved locator instead of asking the LLM again, so repeated steps behave the same and skip a redundant call.
  2. Safety. You can inspect the proposed action, log it, even require human confirmation, before anything mutates the page.
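
The safety point can be made concrete with a small gate that sorts candidates before any of them is executed. This is a sketch, not Stagehand's API: CandidateAction is a local stand-in that assumes only the description and selector fields, and the deny-list of risky words is purely illustrative.

```typescript
// Minimal stand-in for the objects observe returns; only the fields this
// sketch reads. The shape is an assumption, not Stagehand's exported type.
interface CandidateAction {
  description: string;
  selector: string;
}

// Hypothetical deny-list of words that suggest a destructive action.
const RISKY = /\b(delete|remove|purchase|pay|unsubscribe)\b/i;

// Split candidates into ones that may run unattended and ones that should
// be logged or confirmed by a human first.
function partitionCandidates<T extends CandidateAction>(candidates: readonly T[]) {
  const safe: T[] = [];
  const needsReview: T[] = [];
  for (const candidate of candidates) {
    (RISKY.test(candidate.description) ? needsReview : safe).push(candidate);
  }
  return { safe, needsReview };
}
```

Only entries in safe would be handed to act automatically; anything in needsReview can be written to the run log or routed to a confirmation step.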

Step 6: Build a Multi-Step Agent

Now let us combine everything into a small agent that searches a job board, filters by remote, and extracts the first ten roles.

Create src/04-agent.ts:

import "dotenv/config";
import { z } from "zod";
import { createStagehand } from "./client.js";
 
const JobsSchema = z.object({
  jobs: z.array(
    z.object({
      title: z.string(),
      company: z.string(),
      location: z.string(),
      url: z.string().url(),
    })
  ).max(10),
});
 
async function main() {
  const stagehand = createStagehand();
  await stagehand.init();
  const page = stagehand.page;
 
  await page.goto("https://example-jobs.dev");
  await page.act("search for 'typescript' in the main search field and submit");
  await page.act("apply the 'Remote' filter from the location facet");
  await page.act("sort results by most recent");
 
  await page.waitForLoadState("networkidle");
 
  const { jobs } = await page.extract({
    instruction: "Extract the first 10 visible job listings",
    schema: JobsSchema,
  });
 
  console.table(jobs);
 
  await stagehand.close();
}
 
main();

A few production patterns worth highlighting:

  • Mix raw Playwright with AI calls. waitForLoadState("networkidle") is deterministic and costs no LLM tokens. Use it, with a timeout, since pages that poll in the background may never go idle.
  • Limit array size. The .max(10) on the schema prevents the model from returning hundreds of rows and blowing your context budget.
  • Keep instructions atomic. One act per logical action. Chained instructions are harder for the model to plan reliably.
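
The agent above reads a single results page. Pagination follows the same atomic pattern: extract, check the limits, then take one "next page" action. The loop below is a sketch with hypothetical names (collectPages, extractPage, goToNextPage); both page operations are injected, so the control flow stays deterministic and testable without a browser.

```typescript
// Hypothetical pagination loop. extractPage returns the rows on the current
// page; goToNextPage resolves to false when there is no further page.
async function collectPages<T>(
  extractPage: () => Promise<T[]>,
  goToNextPage: () => Promise<boolean>,
  limits: { maxPages: number; maxItems: number },
): Promise<T[]> {
  const items: T[] = [];
  for (let pageNo = 1; pageNo <= limits.maxPages; pageNo++) {
    items.push(...(await extractPage()));
    if (items.length >= limits.maxItems) break; // enough rows collected
    if (pageNo === limits.maxPages) break; // do not navigate past the last page we read
    if (!(await goToNextPage())) break; // no more pages
  }
  return items.slice(0, limits.maxItems);
}
```

In the job-board agent, extractPage would wrap the extract call with JobsSchema and goToNextPage would wrap a single act instruction for the pagination control. The two limits bound both token spend and run time.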

Step 7: Persisting Sessions and Auth

Real workflows need authentication. Browserbase supports persistent contexts so you log in once and reuse the session across runs.

import { Stagehand } from "@browserbasehq/stagehand";
 
const stagehand = new Stagehand({
  env: "BROWSERBASE",
  apiKey: process.env.BROWSERBASE_API_KEY,
  projectId: process.env.BROWSERBASE_PROJECT_ID,
  browserbaseSessionCreateParams: {
    projectId: process.env.BROWSERBASE_PROJECT_ID!,
    browserSettings: {
      context: {
        id: "ctx_persistent_login",
        persist: true,
      },
    },
  },
});

On the first run, drive the login flow with act calls. On subsequent runs, cookies and local storage are restored automatically. The id shown above is a placeholder; use the id of a context that exists in your Browserbase project. Treat the context id like a secret because it grants resumed access to whatever sites you authenticated.


Step 8: Observability and Session Replay

Every Browserbase session records video, network logs, and a DOM snapshot timeline. When something fails, the URL in the dashboard tells you exactly where the agent got confused.

Add a tiny helper that prints the replay URL on every run:

// Stagehand exposes the Browserbase session id once init() has resolved.
const sessionId = stagehand.browserbaseSessionID;
console.log(`Replay: https://www.browserbase.com/sessions/${sessionId}`);

For multi-step agents, also log the observe and extract outputs to a file. The combination of LLM reasoning plus video replay collapses debugging time from hours to minutes.
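
One lightweight way to capture those outputs is a JSON Lines log with one record per step. The sketch below is an assumption about how you might structure it, not a Stagehand feature: createStepLogger and StepRecord are hypothetical names, and the sink is injected so the same logger can write to a file, stdout, or an in-memory array.

```typescript
// Hypothetical step logger: serializes each step as one JSON line and hands
// it to a sink supplied by the caller.
interface StepRecord {
  step: number;
  kind: "act" | "extract" | "observe";
  instruction: string;
  output: unknown;
  at: string; // ISO 8601 timestamp
}

function createStepLogger(
  sink: (line: string) => void,
  now: () => Date = () => new Date(),
) {
  let step = 0;
  return function log(
    kind: StepRecord["kind"],
    instruction: string,
    output: unknown,
  ): void {
    step += 1;
    const record: StepRecord = {
      step,
      kind,
      instruction,
      output,
      at: now().toISOString(),
    };
    sink(JSON.stringify(record));
  };
}
```

In Node, a file sink can be as small as a function that appends the line plus a newline to run.jsonl. Each record can then be matched against the session replay by step order and timestamp.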


Step 9: Cost and Reliability Tips

A few rules we learned the hard way running agents at scale:

  • Cache observe results. Re-using a resolved locator costs zero LLM tokens.
  • Use a small model for actions, a larger model for extraction. You can override modelName per call.
  • Set timeouts everywhere. Every act, extract, and Playwright call should have an upper bound.
  • Prefer extraction over screenshots. Image inputs are expensive and slow. Text extraction with a good schema is faster and more reliable.
  • Run jobs in parallel via Browserbase sessions. They are isolated by default, so you can fan out to dozens of concurrent agents without one run's cookies or page state leaking into another.
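
Two of these tips, bounded calls and bounded fan-out, can be expressed as small generic helpers. The code below is a sketch; withTimeout and mapWithConcurrency are hypothetical names, and neither is provided by Stagehand or Browserbase. Note that withTimeout only stops waiting: it does not cancel the underlying work.

```typescript
// Reject if the wrapped promise does not settle within ms milliseconds.
// The underlying work keeps running; this only bounds how long we wait.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Run one task per input with at most `limit` tasks in flight at a time.
// Results keep the order of the inputs.
async function mapWithConcurrency<I, O>(
  inputs: readonly I[],
  limit: number,
  task: (input: I, index: number) => Promise<O>,
): Promise<O[]> {
  const results = new Array<O>(inputs.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < inputs.length) {
      const index = next++; // claim the next input before awaiting
      results[index] = await task(inputs[index] as I, index);
    }
  }
  const workerCount = Math.max(1, Math.min(limit, inputs.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

A typical use is mapWithConcurrency(urls, 5, (url) => withTimeout(runAgent(url), 120_000, url)), where runAgent is your own function that opens one session per URL. Pick the limit to match the concurrency your Browserbase plan and LLM rate limits allow.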

Testing Your Agent

Stagehand pairs well with Vitest. The recipe we use:

import { describe, it, expect, beforeAll, afterAll } from "vitest";
import { z } from "zod";
import { createStagehand } from "../src/client.js";
 
let stagehand: ReturnType<typeof createStagehand>;
 
beforeAll(async () => {
  stagehand = createStagehand();
  await stagehand.init();
});
 
afterAll(async () => {
  await stagehand.close();
});
 
describe("product extraction", () => {
  it("extracts a price as a number", async () => {
    await stagehand.page.goto("https://example-shop.com/products/widget");
    const { priceUsd } = await stagehand.page.extract({
      instruction: "Extract the visible product price in USD",
      schema: z.object({ priceUsd: z.number() }),
    });
    expect(priceUsd).toBeGreaterThan(0);
  });
});

Run with vitest --no-file-parallelism so test files run one at a time instead of opening several Browserbase sessions at once.


Troubleshooting

The agent clicks the wrong element. Tighten the instruction. "Click the primary CTA in the hero section" beats "click the button".

Extraction returns empty fields. Check that the data is in the DOM, not lazy-loaded behind an intersection observer. Add await page.waitForLoadState("networkidle") or scroll the section into view first.

Hitting rate limits. Use a smaller model for act, batch extract calls into a single instruction with a richer schema, and cache observe results.

Captchas. Browserbase has built-in solving. Enable it via browserSettings.solveCaptchas: true when creating the session.


Conclusion

You now have a TypeScript agent that browses the web like a person, extracts data like an API, and is debuggable like a unit test. The mental model is simple: deterministic Playwright for the parts you control, act and extract for the parts that change.

The browser is no longer a hostile interface for automation. With Stagehand and Browserbase, it is just another tool your code can reach for, with the resilience of an LLM and the speed of headless Chromium.