The browser is the new API. Stagehand turns Playwright into an AI-native framework: instead of brittle CSS selectors, you write natural-language instructions like act("click the sign-in button") and extract("the price as a number"). In this tutorial, we will build a production agent that scrapes, fills forms, and runs reliably in the cloud on Browserbase.
What You Will Build
A TypeScript agent that:
- Launches a real cloud browser via Browserbase.
- Navigates to a product page and extracts structured data with Zod.
- Performs a multi-step task (search, filter, paginate) using natural-language actions.
- Observes possible actions before deciding what to do.
- Logs every step with full session replay for debugging.
By the end, you will know when to use act, extract, observe, and raw Playwright, and how to keep an agent reliable across thousands of runs.
Prerequisites
Before starting, ensure you have:
- Node.js 20+ and pnpm or npm
- A Browserbase account (free tier works) with API key and project ID
- An OpenAI or Anthropic API key (Stagehand supports both)
- Basic familiarity with TypeScript and async/await
- A code editor (VS Code recommended)
You do not need to know Playwright in depth. Stagehand wraps it, but the AI primitives carry most of the work.
Why Stagehand and Not Plain Playwright?
Plain Playwright works great until the DOM moves. A class renames itself, an A/B test reorders buttons, and your script breaks at 3 a.m.
Stagehand replaces the fragile parts with three AI primitives:
| Primitive | What it does | When to use it |
|---|---|---|
| act(instruction) | Performs an action described in plain English | Clicks, typing, navigating UI |
| extract(instruction, schema) | Pulls structured data, validated by Zod | Scraping prices, lists, tables |
| observe(instruction) | Returns the candidate actions without executing them | Planning, dry runs, agent loops |
For everything else, raw Playwright calls such as page.goto and page.waitForSelector still work. The hybrid approach keeps the deterministic parts cheap and the fuzzy parts robust.
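One way to picture the hybrid approach is a click helper that tries a known selector first and only spends LLM tokens when that fails. This is a minimal sketch, not part of Stagehand or Playwright: ClickStrategies and clickWithFallback are hypothetical names, and the two callbacks stand in for whatever raw Playwright and Stagehand calls you wire up.

```typescript
// Hypothetical interface: two callbacks standing in for a raw Playwright
// click and a Stagehand act call. Neither name comes from either library.
interface ClickStrategies {
  bySelector: (selector: string) => Promise<void>;
  byInstruction: (instruction: string) => Promise<void>;
}

// Try the cheap deterministic path first; fall back to the LLM only on failure.
// Returns which path actually performed the click, which is useful for logging.
async function clickWithFallback(
  strategies: ClickStrategies,
  selector: string,
  instruction: string,
): Promise<"selector" | "instruction"> {
  try {
    await strategies.bySelector(selector);
    return "selector";
  } catch {
    await strategies.byInstruction(instruction);
    return "instruction";
  }
}
```

In a real script, bySelector could wrap page.click with a short timeout and byInstruction could wrap page.act. Counting how often the "instruction" path is taken shows which selectors have gone stale.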
Step 1: Project Setup
Create a new project and install Stagehand, Zod for schema validation, and dotenv for loading the .env file that the later scripts import.
mkdir stagehand-agent && cd stagehand-agent
pnpm init
pnpm add @browserbasehq/stagehand zod dotenv
pnpm add -D typescript tsx @types/node
npx tsc --init
Open tsconfig.json and make sure these are set so top-level await works:
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ESNext",
    "moduleResolution": "Bundler",
    "strict": true,
    "esModuleInterop": true
  }
}
Add "type": "module" to your package.json so Node treats files as ESM.
Step 2: Configure Browserbase and the LLM
Create a .env file. Never commit it.
BROWSERBASE_API_KEY=bb_xxxxxxxxxxxxxxxxxxxxxxx
BROWSERBASE_PROJECT_ID=prj_xxxxxxxxxxxxxxxxxxxx
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxx
Then create src/client.ts to centralize the Stagehand instance:
import { Stagehand } from "@browserbasehq/stagehand";

export function createStagehand() {
  return new Stagehand({
    env: "BROWSERBASE",
    apiKey: process.env.BROWSERBASE_API_KEY,
    projectId: process.env.BROWSERBASE_PROJECT_ID,
    modelName: "gpt-4o-mini",
    modelClientOptions: {
      apiKey: process.env.OPENAI_API_KEY,
    },
    verbose: 1,
  });
}
A few notes on the config:
- env: "BROWSERBASE" runs in the cloud. Switch to "LOCAL" for local Chromium during development.
- modelName accepts any model your provider supports. For most flows, gpt-4o-mini is the sweet spot between cost and accuracy. Use gpt-4o or claude-sonnet-4 when extraction quality matters.
- verbose: 1 prints reasoning steps. Set it to 2 while debugging.
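Because every credential in this config comes from the environment, a missing variable tends to surface late, as a failed session start. A minimal sketch of an up-front check follows, assuming the three variable names from the .env file above. readConfig and AgentConfig are hypothetical names and not part of Stagehand.

```typescript
// Hypothetical helper: validates the three variables from Step 2 and fails
// fast with one readable error instead of a late authentication failure.
interface AgentConfig {
  browserbaseApiKey: string;
  browserbaseProjectId: string;
  openaiApiKey: string;
}

type Env = Record<string, string | undefined>;

function readConfig(env: Env): AgentConfig {
  const required = [
    "BROWSERBASE_API_KEY",
    "BROWSERBASE_PROJECT_ID",
    "OPENAI_API_KEY",
  ] as const;

  // Treat unset and whitespace-only values the same way.
  const missing = required.filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  return {
    browserbaseApiKey: env.BROWSERBASE_API_KEY!.trim(),
    browserbaseProjectId: env.BROWSERBASE_PROJECT_ID!.trim(),
    openaiApiKey: env.OPENAI_API_KEY!.trim(),
  };
}
```

Calling readConfig(process.env) at the top of createStagehand would turn a confusing remote error into a local one that names the missing variable.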
Step 3: Your First Natural-Language Action
Let us start with the canonical hello-world: search Google for "Stagehand Browserbase" and read the first result.
Create src/01-act.ts:
import "dotenv/config";
import { createStagehand } from "./client.js";

async function main() {
  const stagehand = createStagehand();
  await stagehand.init();
  const page = stagehand.page;

  await page.goto("https://www.google.com");
  await page.act("accept cookies if a banner is showing");
  await page.act("type 'Stagehand Browserbase' into the search box and press Enter");
  await page.act("click the first organic result");

  console.log("Final URL:", page.url());
  await stagehand.close();
}

main();
Run it:
npx tsx src/01-act.ts
Two things will happen. First, a Browserbase session opens (you can watch it live from the dashboard). Second, Stagehand turns each English instruction into a small Playwright plan and executes it.
Notice we never wrote a single selector.
Step 4: Structured Extraction with Zod
Extraction is where Stagehand shines for scraping. Define what you want with Zod and the agent figures out the rest.
Create src/02-extract.ts:
import "dotenv/config";
import { z } from "zod";
import { createStagehand } from "./client.js";

const ProductSchema = z.object({
  title: z.string(),
  priceUsd: z.number().describe("Price in US dollars, numeric only"),
  rating: z.number().min(0).max(5).optional(),
  inStock: z.boolean(),
  bulletPoints: z.array(z.string()).max(8),
});

async function main() {
  const stagehand = createStagehand();
  await stagehand.init();
  const page = stagehand.page;

  await page.goto("https://www.example-shop.com/products/widget-pro");
  const product = await page.extract({
    instruction: "Extract the product title, price, rating, stock status, and the bullet points under 'About this item'",
    schema: ProductSchema,
  });

  console.log(product);
  await stagehand.close();
}

main();
The .describe() call on priceUsd is not decoration. The LLM reads field descriptions and uses them as instructions. Treat them as part of your prompt.
If extraction returns garbage, the cause is almost always one of three things:
- The schema is too loose. Add .describe() hints.
- The instruction is too vague. Anchor it to a visible region of the page.
- The model is too small. Promote gpt-4o-mini to gpt-4o for that specific call.
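The third fix, promoting the model for one call, can be automated: try the cheap model first and escalate only when the result fails a sanity check. The sketch below is an illustration under assumptions. extractWithFallback is a hypothetical name, and the attempt callback is where you would place your own extract call for the given model.

```typescript
// Hypothetical wrapper: try each model in order and return the first result
// that passes a caller-supplied plausibility check.
type Attempt<T> = (modelName: string) => Promise<T>;

async function extractWithFallback<T>(
  models: string[],
  attempt: Attempt<T>,
  isPlausible: (value: T) => boolean,
): Promise<{ value: T; modelName: string }> {
  let lastError: unknown = new Error("no models supplied");

  for (const modelName of models) {
    try {
      const value = await attempt(modelName);
      if (isPlausible(value)) {
        return { value, modelName };
      }
      lastError = new Error(`${modelName} returned an implausible result`);
    } catch (error) {
      // Remember the failure and move on to the next, larger model.
      lastError = error;
    }
  }

  // Every model failed or produced an implausible value.
  throw lastError;
}
```

A plausibility check for the product schema might be as simple as requiring priceUsd to be greater than zero and title to be non-empty. The larger model then only runs on the pages where the small one struggles.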
Step 5: Observe Before You Act
observe returns the actions the model thinks are available, without performing them. This is how you build agent loops without runaway costs or destructive mistakes.
Create src/03-observe.ts:
import "dotenv/config";
import { createStagehand } from "./client.js";

async function main() {
  const stagehand = createStagehand();
  await stagehand.init();
  const page = stagehand.page;

  await page.goto("https://news.ycombinator.com");
  const candidates = await page.observe({
    instruction: "Find all clickable links to story comment threads on this page",
  });

  console.log(`Found ${candidates.length} candidate actions`);
  for (const c of candidates.slice(0, 5)) {
    console.log("-", c.description);
  }

  // observe can return an empty list, so guard before acting on the first hit.
  const [first] = candidates;
  if (first) {
    await page.act(first);
    console.log("Navigated to:", page.url());
  } else {
    console.warn("No candidate actions found; nothing to click.");
  }

  await stagehand.close();
}

main();
Two reasons this matters in production:
- Idempotency. Pass the resolved ObserveResult directly to act and the second run uses the cached locator, skipping a redundant LLM call.
- Safety. You can inspect the proposed action, log it, even require human confirmation, before anything mutates the page.
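The safety point can be made concrete with a small gate between observe and act. The sketch below splits candidates into ones that may run automatically and ones that need review. The Candidate shape, the DESTRUCTIVE pattern, and splitBySafety are all assumptions for illustration; only the description field is taken from the script above.

```typescript
// Assumed shape: only the fields this sketch reads. The real observe result
// may carry more; the script above only relies on `description`.
interface Candidate {
  description: string;
  selector: string;
}

// Hypothetical policy: flag anything whose description mentions a
// destructive or irreversible verb. Tune this list for your own sites.
const DESTRUCTIVE = /\b(delete|remove|pay|purchase|unsubscribe)\b/i;

function splitBySafety(candidates: Candidate[]): {
  safe: Candidate[];
  needsReview: Candidate[];
} {
  const safe: Candidate[] = [];
  const needsReview: Candidate[] = [];
  for (const candidate of candidates) {
    if (DESTRUCTIVE.test(candidate.description)) {
      needsReview.push(candidate);
    } else {
      safe.push(candidate);
    }
  }
  return { safe, needsReview };
}
```

Only entries in safe would be passed on to act. Entries in needsReview can be logged or routed to a human before anything mutates the page. A keyword filter is a coarse heuristic, so treat it as a first line of defense rather than a guarantee.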
Step 6: Build a Multi-Step Agent
Now let us combine everything into a small agent that searches a job board, filters by remote, and extracts the first ten roles.
Create src/04-agent.ts:
import "dotenv/config";
import { z } from "zod";
import { createStagehand } from "./client.js";

const JobsSchema = z.object({
  jobs: z.array(
    z.object({
      title: z.string(),
      company: z.string(),
      location: z.string(),
      url: z.string().url(),
    })
  ).max(10),
});

async function main() {
  const stagehand = createStagehand();
  await stagehand.init();
  const page = stagehand.page;

  await page.goto("https://example-jobs.dev");
  await page.act("search for 'typescript' in the main search field and submit");
  await page.act("apply the 'Remote' filter from the location facet");
  await page.act("sort results by most recent");
  await page.waitForLoadState("networkidle");

  const { jobs } = await page.extract({
    instruction: "Extract the first 10 visible job listings",
    schema: JobsSchema,
  });

  console.table(jobs);
  await stagehand.close();
}

main();
A few production patterns worth highlighting:
- Mix raw Playwright with AI calls. waitForLoadState("networkidle") is deterministic and free. Use it.
- Limit array size. The .max(10) on the schema prevents the model from returning hundreds of rows and blowing your context budget.
- Keep instructions atomic. One act per logical action. Chained instructions are harder for the model to plan reliably.
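Atomic instructions also make failures easier to locate, because each step either completes or is the one that broke. A minimal sketch of a step runner follows. runSteps, Act, and StepReport are hypothetical names, and the act callback is whatever function performs one instruction in your agent.

```typescript
// Hypothetical callback type: performs one natural-language instruction.
type Act = (instruction: string) => Promise<unknown>;

interface StepReport {
  completed: string[];
  failedAt?: string;
  error?: string;
}

// Runs instructions strictly in order and stops at the first failure,
// reporting which steps finished and which one broke.
async function runSteps(act: Act, steps: string[]): Promise<StepReport> {
  const completed: string[] = [];
  for (const step of steps) {
    try {
      await act(step);
      completed.push(step);
    } catch (error) {
      return {
        completed,
        failedAt: step,
        error: error instanceof Error ? error.message : String(error),
      };
    }
  }
  return { completed };
}
```

With the job-board agent, the three act calls could be passed as one array, and a report with failedAt set to the sort instruction would point at the sort control without any need to read a stack trace.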
Step 7: Persisting Sessions and Auth
Real workflows need authentication. Browserbase supports persistent contexts so you log in once and reuse the session across runs.
import { Stagehand } from "@browserbasehq/stagehand";

const stagehand = new Stagehand({
  env: "BROWSERBASE",
  apiKey: process.env.BROWSERBASE_API_KEY,
  projectId: process.env.BROWSERBASE_PROJECT_ID,
  browserbaseSessionCreateParams: {
    projectId: process.env.BROWSERBASE_PROJECT_ID!,
    browserSettings: {
      context: {
        id: "ctx_persistent_login",
        persist: true,
      },
    },
  },
});
On the first run, perform the login flow with act calls. On subsequent runs, cookies and local storage are restored automatically. Treat the context id like a secret because it grants resumed access to whatever sites you authenticated.
Step 8: Observability and Session Replay
Every Browserbase session records video, network logs, and a DOM snapshot timeline. When something fails, the session replay in the dashboard shows exactly where the agent got confused.
Add a tiny helper that prints the replay URL on every run:
// browserbaseSessionID is only set when running with env: "BROWSERBASE".
const sessionId = stagehand.browserbaseSessionID;
console.log(`Replay: https://www.browserbase.com/sessions/${sessionId}`);
For multi-step agents, also log the observe and extract outputs to a file. The combination of LLM reasoning plus video replay collapses debugging time from hours to minutes.
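For the file log, one JSON object per line (JSONL) is easy to append to and to search later. The sketch below only builds the line; the StepLog shape and the toLogLine name are assumptions, not part of Stagehand or Browserbase.

```typescript
// Hypothetical log entry: which session, which primitive, and what it returned.
interface StepLog {
  sessionId: string;
  step: string; // for example "observe" or "extract"
  output: unknown;
}

// Serializes one entry as a single JSON line. JSON.stringify escapes any
// newlines inside strings, so each entry stays on one line.
// The timestamp is injectable so the function stays deterministic in tests.
function toLogLine(entry: StepLog, now: Date = new Date()): string {
  return JSON.stringify({ ts: now.toISOString(), ...entry });
}
```

Each line can then be appended to a file such as one named after the session, followed by a newline, using Node's fs.appendFile. Keying the file by session ID puts the log next to the matching replay.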
Step 9: Cost and Reliability Tips
A few rules we learned the hard way running agents at scale:
- Cache observe results. Re-using a resolved locator costs zero LLM tokens.
- Use a small model for actions, a larger model for extraction. You can override modelName per call.
- Set timeouts everywhere. Every act, extract, and Playwright call should have an upper bound.
- Prefer extraction over screenshots. Image inputs are expensive and slow. Text extraction with a good schema is faster and more reliable.
- Run jobs in parallel via Browserbase sessions. They are isolated by default, so you can fan out to dozens of concurrent agents without one agent's cookies or page state leaking into another.
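The "set timeouts everywhere" rule can be enforced with one generic wrapper around any promise-returning call. This is a minimal sketch using Promise.race. withTimeout and TimeoutError are hypothetical names, and the wrapper only stops waiting; it does not cancel the underlying browser work.

```typescript
class TimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "TimeoutError";
  }
}

// Rejects with TimeoutError if `task` has not settled within `ms`.
// Note: the task itself keeps running; this only bounds how long we wait.
async function withTimeout<T>(
  label: string,
  ms: number,
  task: () => Promise<T>,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new TimeoutError(`${label} exceeded ${ms} ms`)),
      ms,
    );
  });
  try {
    return await Promise.race([task(), timeout]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Wrapping a call then looks like withTimeout("extract jobs", 30000, () => page.extract(...)). Because the wrapped call is not cancelled, it is worth closing the session after a timeout so a stuck page is not billed in the background.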
Testing Your Agent
Stagehand pairs well with Vitest. The recipe we use:
import { describe, it, expect, beforeAll, afterAll } from "vitest";
import { z } from "zod";
import { createStagehand } from "../src/client.js";

let stagehand: ReturnType<typeof createStagehand>;

beforeAll(async () => {
  stagehand = createStagehand();
  await stagehand.init();
});

afterAll(async () => {
  await stagehand.close();
});

describe("product extraction", () => {
  it("extracts a price as a number", async () => {
    await stagehand.page.goto("https://example-shop.com/products/widget");
    const { priceUsd } = await stagehand.page.extract({
      instruction: "Extract the visible product price in USD",
      schema: z.object({ priceUsd: z.number() }),
    });
    expect(priceUsd).toBeGreaterThan(0);
  });
});
Run with vitest --no-file-parallelism so multiple tests do not stomp on the same Browserbase session.
Troubleshooting
The agent clicks the wrong element. Tighten the instruction. "Click the primary CTA in the hero section" beats "click the button".
Extraction returns empty fields. Check that the data is in the DOM, not lazy-loaded behind an intersection observer. Add await page.waitForLoadState("networkidle") or scroll the section into view first.
Hitting rate limits. Use a smaller model for act, batch extract calls into a single instruction with a richer schema, and cache observe results.
Captchas. Browserbase has built-in solving. Enable it via browserSettings.solveCaptchas: true when creating the session.
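The advice to cache observe results, from the rate-limit entry above, can be sketched as a small in-memory cache keyed by URL and instruction. ObserveCache and ObserveFn are hypothetical names, and the observe callback is an assumption standing in for your own observe call.

```typescript
// Hypothetical callback type: runs one observe instruction on the current page.
type ObserveFn<R> = (instruction: string) => Promise<R[]>;

class ObserveCache<R> {
  private readonly entries = new Map<string, R[]>();
  private readonly observe: ObserveFn<R>;
  // Number of times the underlying (LLM-backed) observe call was made.
  llmCalls = 0;

  constructor(observe: ObserveFn<R>) {
    this.observe = observe;
  }

  private key(url: string, instruction: string): string {
    return `${url}\n${instruction}`;
  }

  async get(url: string, instruction: string): Promise<R[]> {
    const cached = this.entries.get(this.key(url, instruction));
    if (cached !== undefined) return cached;

    this.llmCalls += 1;
    const fresh = await this.observe(instruction);
    // Empty results are not cached, so a page that was still loading is retried.
    if (fresh.length > 0) this.entries.set(this.key(url, instruction), fresh);
    return fresh;
  }

  // Call this when acting on a cached result fails: the DOM has likely changed.
  invalidate(url: string, instruction: string): void {
    this.entries.delete(this.key(url, instruction));
  }
}
```

The cache lives only for the lifetime of the process. Persisting it across runs is possible, but cached locators go stale whenever the page layout changes, so pair any longer-lived cache with the invalidate path.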
Next Steps
- Explore the Stagehand documentation for the full API surface
- Read our AI Web Scraper with Playwright tutorial to compare approaches
- Combine Stagehand with Trigger.dev to run agents on a schedule
- Wire it into Mastra for a full multi-agent system
Conclusion
You now have a TypeScript agent that browses the web like a person, extracts data like an API, and is debuggable like a unit test. The mental model is simple: deterministic Playwright for the parts you control, act and extract for the parts that change.
The browser is no longer a hostile interface for automation. With Stagehand and Browserbase, it is just another tool your code can reach for, with the resilience of an LLM and the speed of headless Chromium.