LLMs are good at writing code. They are terrible at running it safely. If you let a model eval() arbitrary Python on your server, you have effectively handed it a shell. The fix the industry converged on in 2025 was the secure cloud sandbox, and one of the most widely used is E2B Code Interpreter: a Firecracker microVM that E2B advertises as booting in under 200 ms, ships with Python, Node.js, Jupyter, pandas, matplotlib, and many other common packages preinstalled, and is torn down when it times out or you kill it.

In this tutorial we will wire E2B into a Next.js 15 application using Vercel AI SDK 4 and Claude (or any other tool-calling model) and ship a working "code interpreter" experience: a chat where the model can plot data from CSVs, run regressions, and return results and charts in real time.

What You'll Build

A Next.js 15 app where users can:

Upload a CSV file
Ask natural-language questions like "What is the correlation between revenue and headcount?"
Watch the agent write Python, run it inside an isolated sandbox, and stream back text and charts

By the end you'll understand the full loop: prompt → tool call → sandbox execution → result rendering → UI update.

Prerequisites

Node.js 18.18 or higher (the minimum for Next.js 15; a current LTS release is the safer choice)
A package manager (pnpm recommended)
An E2B API key from e2b.dev (free tier covers this tutorial)
An Anthropic API key (or OpenAI, Google, Groq — any tool-calling provider works)
Basic familiarity with React, route handlers, and the App Router

Step 1: Project Setup

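Bootstrap a fresh Next.js 15 project with TypeScript and Tailwind. Pin the major version, because @latest may scaffold a newer Next.js release. The last two flags keep lib/ at the project root and set up the @/* import alias that later imports rely on.

pnpm create next-app@15 e2b-agent --typescript --tailwind --app --eslint --no-src-dir --import-alias "@/*"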
cd e2b-agent

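Install the AI SDK, the Anthropic provider, and the E2B Code Interpreter SDK. The code in this tutorial uses AI SDK 4 APIs (maxSteps, toDataStreamResponse, and tool-invocation message parts), which AI SDK 5 replaced, so pin the AI SDK packages and Zod to compatible major versions.

pnpm add ai@^4 @ai-sdk/anthropic@^1 @ai-sdk/react@^1 @e2b/code-interpreter zod@^3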

Three packages matter here:

@e2b/code-interpreter — the official SDK that spins up sandboxes and runs code
ai and @ai-sdk/anthropic — Vercel AI SDK plus the Claude provider
@ai-sdk/react — the useChat hook that handles streaming on the client

Step 2: Configure Environment Variables

Create .env.local in the project root.

E2B_API_KEY=e2b_xxxxxxxxxxxxxxxx
ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxx

Never commit this file. Add it to .gitignore if your scaffold did not already.
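To fail fast when a key is missing, rather than deep inside the first tool call, you can check both variables once at startup. This is a minimal sketch: `readConfig` is a hypothetical helper that only verifies each value is present and non-empty; E2B and Anthropic still validate the keys themselves.

```typescript
// Hypothetical startup guard: checks presence only, not whether the keys are valid.
type Env = Record<string, string | undefined>;

export interface AppConfig {
  e2bApiKey: string;
  anthropicApiKey: string;
}

export function readConfig(env: Env): AppConfig {
  const required = ["E2B_API_KEY", "ANTHROPIC_API_KEY"] as const;
  // Treat unset and whitespace-only values the same way.
  const missing = required.filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    e2bApiKey: env.E2B_API_KEY!.trim(),
    anthropicApiKey: env.ANTHROPIC_API_KEY!.trim(),
  };
}
```

In the app you would call `readConfig(process.env)` from a server-only module. Both SDKs read these variables from the environment on their own, so the helper is purely a guard.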

Step 3: Build the Sandbox Helper

Create lib/sandbox.ts. This module owns sandbox lifecycle so the rest of the app stays clean.

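import { Sandbox } from "@e2b/code-interpreter";
 
// E2B's stock code-interpreter template.
const TEMPLATE = "code-interpreter-v1";
// The sandbox shuts itself down after this long unless it is killed first.
const TIMEOUT_MS = 5 * 60 * 1000;
 
export async function createSandbox() {
  const sandbox = await Sandbox.create(TEMPLATE, {
    timeoutMs: TIMEOUT_MS,
  });
  return sandbox;
}
 
// Reattach to an existing sandbox by id and run Python in it. Every call
// runs in the same kernel, so variables from earlier calls are still defined.
export async function runPython(sandboxId: string, code: string) {
  const sandbox = await Sandbox.connect(sandboxId);
  const execution = await sandbox.runCode(code, { language: "python" });
 
  return {
    stdout: execution.logs.stdout.join(""),
    stderr: execution.logs.stderr.join(""),
    // Rich outputs: plots arrive as base64 PNG, DataFrames as HTML, and so on.
    results: execution.results.map((r) => ({
      text: r.text,
      png: r.png,
      html: r.html,
      json: r.json,
    })),
    // Only the exception message is forwarded here.
    error: execution.error?.value,
  };
}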

A few things to notice. E2B advertises sandbox start-up in roughly 150 ms, so creating one on demand is cheap. runCode resolves once execution finishes and returns structured output: stdout, stderr, rich results (matplotlib PNGs, pandas HTML tables), and a structured error. runPython reattaches by sandboxId instead of holding on to a Sandbox object, so any request that knows the id can reuse the same VM, and because every call runs in the same Python kernel, variables defined in one call are still there in the next.
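runPython forwards only `execution.error?.value`, which drops the exception name and traceback a model needs to fix its own code, and it forwards stdout uncapped, so printing a large DataFrame can flood the model's context. Below is a sketch of a stricter mapping. `ExecutionLike` is a local stand-in for the fields runPython reads, plus `name` and `traceback` on the error, which we assume match E2B's execution error; `summariseExecution` and the 4,000-character cap are hypothetical choices.

```typescript
// Local stand-in for an E2B execution. The error's `name` and `traceback`
// fields are assumed to match the SDK's execution error; check your version.
interface ExecutionLike {
  logs: { stdout: string[]; stderr: string[] };
  results: { text?: string; png?: string; html?: string }[];
  error?: { name: string; value: string; traceback: string };
}

const MAX_CHARS = 4000; // arbitrary cap to keep tool results small

// Keep the tail of long output: the last lines usually hold the answer or the failure.
function clip(s: string, max = MAX_CHARS): string {
  return s.length <= max ? s : `...[truncated ${s.length - max} chars]\n${s.slice(-max)}`;
}

export function summariseExecution(execution: ExecutionLike) {
  return {
    stdout: clip(execution.logs.stdout.join("")),
    stderr: clip(execution.logs.stderr.join("")),
    results: execution.results.map((r) => ({ text: r.text, png: r.png, html: r.html })),
    // Name, message, and traceback give the model enough to self-correct.
    error: execution.error
      ? clip(`${execution.error.name}: ${execution.error.value}\n${execution.error.traceback}`)
      : undefined,
  };
}
```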

Step 4: Define the Tool Schema

Vercel AI SDK uses Zod to describe tools. Create lib/tools.ts.

import { tool } from "ai";
import { z } from "zod";
import { runPython, createSandbox } from "./sandbox";
 
export function buildTools(sandboxIdRef: { current: string | null }) {
  return {
    execute_python: tool({
      description:
        "Execute Python code in a secure sandbox. Use this for data analysis, calculations, plotting charts, and file manipulation. Variables persist across calls within the same conversation.",
      parameters: z.object({
        code: z
          .string()
          .describe("Valid Python code. Use matplotlib for plots."),
      }),
      execute: async ({ code }) => {
        if (!sandboxIdRef.current) {
          const sandbox = await createSandbox();
          sandboxIdRef.current = sandbox.sandboxId;
        }
        return runPython(sandboxIdRef.current, code);
      },
    }),
  };
}

sandboxIdRef is a tiny shared box passed by reference so the first tool call creates the sandbox and every subsequent call reuses it. This is the trick that makes multi-step reasoning work — the agent can define a variable in step one and read it in step three.
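One caveat: if the model emits two execute_python calls in the same step, the AI SDK can run both `execute` functions concurrently, and both may see `sandboxIdRef.current === null` and create two sandboxes. One fix is to memoise the in-flight creation promise. A sketch, where `SandboxRef` and `createSandboxOnce` are hypothetical names and `create` stands in for `createSandbox` from lib/sandbox.ts so the example runs without E2B:

```typescript
// Hypothetical shared ref: the resolved id plus any creation still in flight.
interface SandboxRef {
  current: string | null;
  pending?: Promise<string>;
}

// Returns the conversation's sandbox id, creating the sandbox at most once
// even when several tool calls ask for it at the same time.
export async function createSandboxOnce(
  ref: SandboxRef,
  create: () => Promise<{ sandboxId: string }>,
): Promise<string> {
  if (ref.current) return ref.current;
  if (!ref.pending) {
    ref.pending = create()
      .then((sandbox) => {
        ref.current = sandbox.sandboxId;
        return sandbox.sandboxId;
      })
      .finally(() => {
        ref.pending = undefined; // allow a retry if creation failed
      });
  }
  return ref.pending;
}
```

Inside the tool, `execute` would then start with `const id = await createSandboxOnce(sandboxIdRef, createSandbox);` and pass `id` to runPython.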

Step 5: Create the Chat Route

In the Next.js 15 App Router, a streaming endpoint is a route handler. Create app/api/chat/route.ts.

import { anthropic } from "@ai-sdk/anthropic";
import { streamText, convertToCoreMessages } from "ai";
import { createSandbox } from "@/lib/sandbox";
import { buildTools } from "@/lib/tools";
 
// Allow up to 60 seconds per request (model calls plus sandbox execution).
export const maxDuration = 60;
 
export async function POST(req: Request) {
  const { messages, sandboxId } = await req.json();
 
  // Response headers are sent before any tool runs, so the sandbox must exist
  // before streaming starts. Creating it lazily inside the tool would leave
  // x-sandbox-id empty on the first turn. Trade-off: every conversation gets
  // a sandbox, even if no code ever runs.
  const id: string = sandboxId || (await createSandbox()).sandboxId;
  const sandboxIdRef: { current: string | null } = { current: id };
 
  const result = streamText({
    model: anthropic("claude-sonnet-4-6"),
    system:
      "You are a senior data analyst. When the user asks anything that requires computation, plotting, or file inspection, write and run Python in the sandbox rather than guessing. Always show your work briefly before calling the tool.",
    messages: convertToCoreMessages(messages),
    tools: buildTools(sandboxIdRef),
    // Up to 5 model calls: write code, read the result, and run more code if needed.
    maxSteps: 5,
  });
 
  // Hand the sandbox id back so the client can reattach on the next turn.
  return result.toDataStreamResponse({
    headers: {
      "x-sandbox-id": id,
    },
  });
}

Three pieces matter. maxSteps: 5 lets the model think, run code, read the result, and run more code — without that, the agent would call the tool once and stop. The system prompt explicitly tells the model to prefer execution over guessing, which is the whole point of code interpreters. The x-sandbox-id response header is how we ship the sandbox id back to the client so it can be reattached on the next turn.
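To see why the step budget matters, here is a deliberately simplified, self-contained model of a multi-step tool loop. It is not the AI SDK's implementation; `runAgent`, `scriptedModel`, `fakePython`, and the message shapes are hypothetical. Each step is one model call, and a requested tool runs immediately, but its output only reaches the model on the following step.

```typescript
// Simplified model of a multi-step tool loop (not the AI SDK's real code).
type ModelTurn =
  | { kind: "tool-call"; code: string }
  | { kind: "text"; text: string };

type LoopMessage =
  | { role: "assistant"; turn: ModelTurn }
  | { role: "tool"; output: string };

type Model = (history: LoopMessage[]) => ModelTurn;
type Tool = (code: string) => string;

export function runAgent(model: Model, tool: Tool, maxSteps: number) {
  const history: LoopMessage[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const turn = model(history); // one model call per step
    history.push({ role: "assistant", turn });
    if (turn.kind === "text") return { answer: turn.text, steps: step };
    // The tool always runs, but the model sees its output only on the next step.
    history.push({ role: "tool", output: tool(turn.code) });
  }
  return { answer: undefined, steps: maxSteps }; // budget used up before an answer
}

// Hypothetical model: run code first, then answer from the tool output.
const scriptedModel: Model = (history) => {
  const last = history[history.length - 1];
  if (last && last.role === "tool") {
    return { kind: "text", text: `The result is ${last.output}` };
  }
  return { kind: "tool-call", code: "print(6 * 7)" };
};

// Stands in for the sandbox: ignores the code and returns fixed output.
const fakePython: Tool = () => "42";
```

With `runAgent(scriptedModel, fakePython, 1)` the code runs but the result is `{ answer: undefined, steps: 1 }`; with a budget of 5 the model reads the output on its second step and returns `{ answer: "The result is 42", steps: 2 }`.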

Step 6: Build the Chat UI

Create app/page.tsx.

"use client";
 
import { useChat } from "@ai-sdk/react";
import { useState } from "react";
 
export default function Home() {
  const [sandboxId, setSandboxId] = useState<string | null>(null);
 
  const { messages, input, handleInputChange, handleSubmit, isLoading } =
    useChat({
      api: "/api/chat",
      body: { sandboxId },
      onResponse: (response) => {
        const id = response.headers.get("x-sandbox-id");
        if (id) setSandboxId(id);
      },
    });
 
  return (
    <div className="mx-auto flex h-screen max-w-3xl flex-col p-6">
      <h1 className="mb-4 text-2xl font-semibold">E2B Code Interpreter</h1>
 
      <div className="flex-1 space-y-4 overflow-y-auto">
        {messages.map((m) => (
          <Message key={m.id} message={m} />
        ))}
      </div>
 
      <form onSubmit={handleSubmit} className="mt-4 flex gap-2">
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask the agent to analyse data..."
          className="flex-1 rounded border border-zinc-300 px-3 py-2"
          disabled={isLoading}
        />
        <button
          type="submit"
          className="rounded bg-black px-4 py-2 text-white disabled:opacity-50"
          disabled={isLoading}
        >
          Send
        </button>
      </form>
    </div>
  );
}

The useChat hook does the heavy lifting — token streaming, message state, and the body merge that ships our sandboxId with every request. The onResponse callback pulls the new sandbox id out of the response headers.

Step 7: Render Tool Calls and Outputs

Add a Message component beneath the Home export. This is where you bring tool results to life.

function Message({ message }: { message: any }) {
  return (
    <div className="rounded-lg border border-zinc-200 p-4">
      <div className="mb-2 text-xs font-medium uppercase text-zinc-500">
        {message.role}
      </div>
 
      {message.parts?.map((part: any, i: number) => {
        if (part.type === "text") {
          return (
            <p key={i} className="whitespace-pre-wrap text-sm">
              {part.text}
            </p>
          );
        }
 
        if (part.type === "tool-invocation") {
          const { toolName, state, args, result } = part.toolInvocation;
          return (
            <div key={i} className="my-2 rounded bg-zinc-50 p-3 text-xs">
              <div className="mb-1 font-mono text-zinc-600">
                tool: {toolName} ({state})
              </div>
              <pre className="overflow-x-auto whitespace-pre-wrap text-zinc-800">
                {args?.code}
              </pre>
              {result?.stdout && (
                <pre className="mt-2 border-t border-zinc-200 pt-2 text-zinc-700">
                  {result.stdout}
                </pre>
              )}
              {result?.results?.map((r: any, j: number) =>
                r.png ? (
                  <img
                    key={j}
                    src={`data:image/png;base64,${r.png}`}
                    alt="plot"
                    className="mt-2 rounded border border-zinc-200"
                  />
                ) : null,
              )}
            </div>
          );
        }
        return null;
      })}
    </div>
  );
}

The parts array is the AI SDK's structured representation of the assistant turn. Text parts render as paragraphs; tool-invocation parts render the code that was executed, the stdout, and inline PNGs returned by matplotlib. No download step, no separate file server — the bytes ship in the response.
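The Message component renders stdout and PNGs but not stderr or the error field, so a Python exception shows up as an empty panel. One option is to normalise each tool result into an ordered list of display items and keep the JSX trivial. A sketch: `ToolResult` mirrors the object runPython returns, and `toDisplayItems` is a hypothetical helper.

```typescript
// Mirrors the object returned by runPython in lib/sandbox.ts.
interface ToolResult {
  stdout: string;
  stderr: string;
  results: { text?: string; png?: string; html?: string; json?: unknown }[];
  error?: string;
}

type DisplayItem =
  | { kind: "stdout" | "stderr" | "error"; text: string }
  | { kind: "image"; src: string };

// Hypothetical helper: flatten a tool result into the order the UI shows it.
export function toDisplayItems(result: ToolResult | undefined): DisplayItem[] {
  if (!result) return []; // tool still running: nothing to show yet
  const items: DisplayItem[] = [];
  if (result.stdout) items.push({ kind: "stdout", text: result.stdout });
  if (result.stderr) items.push({ kind: "stderr", text: result.stderr });
  for (const r of result.results) {
    if (r.png) items.push({ kind: "image", src: `data:image/png;base64,${r.png}` });
  }
  if (result.error) items.push({ kind: "error", text: result.error });
  return items;
}
```

The component then maps each item to a `<pre>` or `<img>`, styling `error` items differently.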

Step 8: Add File Upload

For real analysis the user needs to upload data. Create app/api/upload/route.ts.

import { Sandbox } from "@e2b/code-interpreter";
import { createSandbox } from "@/lib/sandbox";
 
export async function POST(req: Request) {
  const formData = await req.formData();
  const file = formData.get("file");
  if (!file || typeof file === "string") {
    return Response.json({ error: "Expected a file field" }, { status: 400 });
  }
 
  // Reuse the conversation's sandbox when the client sends its id;
  // otherwise start a fresh one.
  const existingId = formData.get("sandboxId");
  const sandbox =
    typeof existingId === "string" && existingId
      ? await Sandbox.connect(existingId)
      : await createSandbox();
 
  // Keep only the base name so a crafted filename cannot escape /home/user.
  const name = file.name.split(/[\\/]/).pop() || "upload";
  const path = `/home/user/${name}`;
  await sandbox.files.write(path, await file.arrayBuffer());
 
  return Response.json({ sandboxId: sandbox.sandboxId, path });
}

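Then wire an upload button in page.tsx that posts the file, plus the current sandboxId if there is one, to this route, and store the sandboxId it returns with setSandboxId so the chat keeps using the sandbox that holds the file. Send the resulting path as a hint in the user's next message, for example "I uploaded sales.csv at /home/user/sales.csv, can you load it with pandas?", and the model will pick it up.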
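A sketch of the client half of that upload flow, assuming the /api/upload route above. `uploadFile` and `uploadHint` are hypothetical helpers, and `fetchImpl` defaults to the global fetch; it is a parameter only so the sketch can be exercised without a server. The caller stores the returned id, for example with `setSandboxId(result.sandboxId)`.

```typescript
export interface UploadResult {
  sandboxId: string;
  path: string;
}

// Hypothetical client helper: send the file plus the current sandbox id, if any.
export async function uploadFile(
  file: Blob & { name: string },
  sandboxId: string | null,
  fetchImpl: typeof fetch = fetch,
): Promise<UploadResult> {
  const form = new FormData();
  form.append("file", file, file.name);
  if (sandboxId) form.append("sandboxId", sandboxId);

  const res = await fetchImpl("/api/upload", { method: "POST", body: form });
  if (!res.ok) throw new Error(`Upload failed with status ${res.status}`);
  return (await res.json()) as UploadResult;
}

// Builds the hint for the user's next message, so the model knows where the file is.
export function uploadHint(fileName: string, path: string): string {
  return `I uploaded ${fileName} at ${path}, can you load it with pandas?`;
}
```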

Step 9: Test It

Run the dev server.

pnpm dev

Open http://localhost:3000 and try these prompts:

"Generate 1000 random points from a normal distribution and plot a histogram."
"Calculate the first 20 Fibonacci numbers and show them as a line chart."
"Define a function that returns the nth prime. Use it to find the 100th prime."

You should see the assistant write Python, the tool panel show that code in a grey box, the stdout appear beneath it once execution finishes, and any matplotlib plot show up as an inline image. Variables persist across turns, so a follow-up such as "Now use that function to find the 1000th prime" can call the function defined earlier.

Step 10: Production Hardening

A few things to do before shipping this beyond a demo.

Per-user sandbox isolation. Today the sandbox id lives in the client. A motivated user could send someone else's id. Store it server-side keyed by session, and look it up from the route handler.
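A sketch of the server-side lookup, using an in-memory `Map` for illustration only: it is lost on restart and not shared across serverless instances, so production would need a shared store such as Redis or your database. `SandboxRegistry` is hypothetical, and how you derive the session id depends on your auth setup.

```typescript
// Hypothetical per-session registry. In-memory for illustration only.
export class SandboxRegistry {
  private bySession = new Map<string, string>();

  get(sessionId: string): string | undefined {
    return this.bySession.get(sessionId);
  }

  set(sessionId: string, sandboxId: string): void {
    this.bySession.set(sessionId, sandboxId);
  }

  // Ownership check for any sandbox id that still arrives from the client.
  owns(sessionId: string, sandboxId: string): boolean {
    return this.bySession.get(sessionId) === sandboxId;
  }

  // Forget the session's sandbox and return its id so the caller can kill it.
  delete(sessionId: string): string | undefined {
    const id = this.bySession.get(sessionId);
    this.bySession.delete(sessionId);
    return id;
  }
}
```

The chat route would read the session from a cookie, call `registry.get(sessionId)`, and ignore any sandboxId in the request body.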

Timeouts and cleanup. E2B sandboxes auto-terminate after the timeoutMs you set on creation, but you should also call sandbox.kill() from a cleanup endpoint when a user closes their tab, since paid plans bill per second of sandbox runtime.
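One way to do the tab-close cleanup is a small route that the page calls with `navigator.sendBeacon` on `pagehide`. A sketch: the /api/cleanup route and `makeCleanupHandler` are hypothetical, and `kill` is injected so the handler can be tested. In the app it would wrap the SDK, assumed here to be `(id) => Sandbox.kill(id)`, which is worth confirming against your SDK version. Pair it with the server-side lookup described above so a client can only kill its own sandbox.

```typescript
// Hypothetical cleanup route handler. `kill` is injected; in the app it would
// call the E2B SDK (assumed: `(id) => Sandbox.kill(id)`).
type KillFn = (sandboxId: string) => Promise<unknown>;

export function makeCleanupHandler(kill: KillFn) {
  return async function POST(req: Request): Promise<Response> {
    let sandboxId: unknown;
    try {
      // sendBeacon posts a text/plain body, so parse the raw text as JSON.
      sandboxId = JSON.parse(await req.text()).sandboxId;
    } catch {
      return new Response("Invalid body", { status: 400 });
    }
    if (typeof sandboxId !== "string" || sandboxId === "") {
      return new Response("Missing sandboxId", { status: 400 });
    }
    try {
      await kill(sandboxId);
    } catch {
      // Best effort: the sandbox still expires when its timeoutMs runs out.
    }
    return new Response(null, { status: 204 });
  };
}

// Client side (inside a useEffect), assuming the hypothetical /api/cleanup route:
// window.addEventListener("pagehide", () =>
//   navigator.sendBeacon("/api/cleanup", JSON.stringify({ sandboxId })));
```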

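Resource caps. In E2B, CPU and memory are set per sandbox template (for example with the CPU-count and memory options when building a custom template with the E2B CLI; check the CLI docs for the exact flags). Pick the smallest size that fits your workloads, and rely on timeoutMs so a runaway model cannot keep a sandbox alive and drain your credits.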

Streaming token usage. The AI SDK exposes usage on onFinish. Log it to your analytics so you can attribute cost per user.
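A sketch of per-user attribution, assuming AI SDK 4's streamText `onFinish` event, which includes a `steps` array whose entries carry `usage` with `promptTokens` and `completionTokens`. Summing the steps yourself makes multi-step runs explicit. `summariseUsage` is hypothetical, and the prices you pass in are placeholders for your model's real rates.

```typescript
// Per-step usage as reported by AI SDK 4 (promptTokens / completionTokens).
interface StepUsage {
  usage: { promptTokens: number; completionTokens: number };
}

// Placeholder prices in USD per million tokens; substitute your model's rates.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

export function summariseUsage(userId: string, steps: StepUsage[], pricing: Pricing) {
  const promptTokens = steps.reduce((sum, s) => sum + s.usage.promptTokens, 0);
  const completionTokens = steps.reduce((sum, s) => sum + s.usage.completionTokens, 0);
  const costUsd =
    (promptTokens * pricing.inputPerMillion + completionTokens * pricing.outputPerMillion) / 1_000_000;
  return { userId, steps: steps.length, promptTokens, completionTokens, costUsd };
}

// Wiring (sketch): streamText({ ..., onFinish: ({ steps }) =>
//   analytics.log(summariseUsage(userId, steps, PRICES)) })
```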

Multi-language. The same sandbox supports JavaScript, TypeScript, R, and bash. Add more tools (execute_javascript, execute_bash) with matching Zod schemas to let the agent pick the right runtime.
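A sketch of generating one executor per runtime from a single table. `RUNTIMES`, `makeExecutors`, and the injected `run` function are hypothetical. In the app each entry would be wrapped in `tool({ description, parameters: z.object({ code: z.string() }), execute })`, and `run` would call `sandbox.runCode(code, { language })`. The language identifiers ("js", "bash") are assumptions to check against the E2B docs for your SDK version.

```typescript
// Hypothetical mapping from tool name to the sandbox language id (assumed values).
const RUNTIMES = {
  execute_python: { language: "python", label: "Python" },
  execute_javascript: { language: "js", label: "JavaScript" },
  execute_bash: { language: "bash", label: "bash" },
} as const;

type ToolName = keyof typeof RUNTIMES;
type Run = (code: string, language: string) => Promise<string>;

interface Executor {
  description: string;
  execute: (args: { code: string }) => Promise<string>;
}

// Builds one executor per runtime; each closes over its own language id.
export function makeExecutors(run: Run): Record<ToolName, Executor> {
  const executors = {} as Record<ToolName, Executor>;
  for (const name of Object.keys(RUNTIMES) as ToolName[]) {
    const { language, label } = RUNTIMES[name];
    executors[name] = {
      description: `Execute ${label} code in the shared sandbox.`,
      execute: ({ code }) => run(code, language),
    };
  }
  return executors;
}
```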

Troubleshooting

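The agent runs one tool call and then stops without answering. Check that maxSteps is at least 2. With the default of 1, the tool still executes, but the model never gets another step to read its output and respond.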

Sandbox cold-start feels slow. Start-up latency varies with region and load, and a cold serverless function adds its own initialisation time on top. For latency-sensitive apps, pre-warm one sandbox per active session when the session starts instead of waiting for the first tool call.

Matplotlib plots never show up. The sandbox runs a non-interactive Jupyter kernel, so figures are only captured when they are displayed. End your plotting code with plt.show(); the kernel renders the figure and the SDK returns it as a base64 PNG in results.

Variables disappear between turns. You forgot to thread sandboxId through. Confirm the response header is being read on the client and pushed back in the next request body.

Next Steps

Combine this with Mastra agents for multi-agent workflows where one agent plans and another executes
Pair it with Langfuse to trace every code execution end-to-end
Build an agentic RAG pipeline where retrieved documents are analysed by the code interpreter instead of just stuffed into the prompt
Replace Claude with DeepSeek, GPT-4.1, or Gemini: install the matching @ai-sdk provider package, then change the provider import and the model line

Conclusion

E2B Code Interpreter turns the most dangerous capability of a language model, arbitrary code execution, into a contained one. With roughly 200 lines of glue code you have an agent that can analyse spreadsheets, run statistical tests, plot charts, and hand back results, all inside a disposable Firecracker microVM that is destroyed when it times out or you kill it. The same pattern extends to data engineering, scientific computing, financial modelling, and any domain where "the model writes code, the sandbox runs it" beats "the model guesses the answer."

Ship this, instrument it, and your agents will compute numbers instead of hallucinating them.