AI Agent Sandboxes: Secure Code Execution Guide 2026

Your AI agent just wrote a Python script, and now it wants to run it. The script could install dependencies, hit an API, write files, and exit cleanly. Or it could rm -rf a mounted volume, exfiltrate your environment variables, or open a reverse shell — either because the model hallucinated something dangerous, or because a prompt injection told it to.

This is the uncomfortable reality of agentic systems in 2026: the same capability that makes AI agents useful — executing code they generate on the fly — is also the single largest attack surface in your stack. If you are building anything where an LLM produces and runs code, the question is no longer whether to sandbox, but how deeply.

This guide breaks down the isolation technologies, the platforms built on top of them, and the patterns that separate a toy demo from a production-safe deployment.

Why containers alone are not enough

The instinct for most engineers is to reach for Docker. It is familiar, it is everywhere, and it feels isolated. The problem is that standard containers share the host kernel. Every syscall your agent's code makes goes straight to the same kernel that runs your other workloads.

That shared kernel is the escape vector. A single kernel CVE — and there are new ones every year — can let untrusted code break out of the container and reach the host. For trusted internal code, that risk is acceptable. For code an LLM wrote based on text that may have been poisoned by an attacker, it is not.

The zero-trust principle for agent sandboxes is blunt: treat all LLM-generated code as potentially malicious. Not "probably fine." Malicious. Once you adopt that stance, the architecture choices follow naturally.

The three isolation tiers

There are three practical levels of isolation, each trading performance for a stronger security boundary.

Tier 1 — Standard containers (minimum viable)

Docker, containerd, and friends. Namespaces and cgroups give you process and resource isolation, but the kernel is shared. This is the floor, not the ceiling. Use it only when the code is semi-trusted or the blast radius is genuinely contained — for example, an ephemeral container with no network, no secrets, and a read-only filesystem.
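As a rough sketch of that "contained blast radius" setup, the helper below assembles a `docker run` command with networking disabled, a read-only root filesystem, and resource caps. The function name, the image name, and the limit values are illustrative assumptions; the flags themselves are standard Docker CLI options.

```python
import shlex


def hardened_docker_argv(image: str, command: list[str]) -> list[str]:
    """Build a `docker run` argv for an ephemeral, locked-down container.

    Hypothetical helper: the limits are placeholder values, not tuned
    recommendations. No -e/--env flags are passed, so no secrets are injected.
    """
    return [
        "docker", "run",
        "--rm",                           # ephemeral: remove the container on exit
        "--network", "none",              # no network access at all
        "--read-only",                    # read-only root filesystem
        "--tmpfs", "/tmp:rw,size=64m",    # one small writable scratch area
        "--cap-drop", "ALL",              # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "--pids-limit", "128",            # cap the number of processes
        "--memory", "256m",               # hard memory cap
        "--cpus", "1",                    # CPU cap
        image,
        *command,
    ]


if __name__ == "__main__":
    argv = hardened_docker_argv("python:3.12-slim", ["python", "-c", "print('hi')"])
    print(shlex.join(argv))
```

Even with every one of these flags set, the container still shares the host kernel, which is why this remains the floor.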

Tier 2 — User-space kernels (gVisor)

gVisor sits between the workload and the host. It intercepts syscalls in user space and reimplements a large part of the kernel API itself, so untrusted code rarely talks to the real host kernel directly. You get a meaningfully stronger boundary than containers with only a modest performance cost. Modal and several other platforms use gVisor as their default isolation layer, hitting sub-second cold starts.
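For illustration, if gVisor's `runsc` runtime is already installed and registered with Docker (an assumption about the host setup), moving a container onto it is a one-flag change. The helper name below is hypothetical.

```python
def gvisor_docker_argv(image: str, command: list[str], runtime: str = "runsc") -> list[str]:
    """Build a `docker run` argv that selects a gVisor runtime.

    Assumes the host's Docker daemon has a runtime registered under `runtime`.
    """
    return [
        "docker", "run",
        "--rm",                    # ephemeral container
        f"--runtime={runtime}",    # route syscalls through gVisor instead of the host kernel
        "--network", "none",       # keep networking off unless the task needs it
        image,
        *command,
    ]
```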

Tier 3 — microVMs (Firecracker, Kata Containers)

This is the gold standard. Each workload gets its own dedicated guest kernel running inside a lightweight virtual machine. A vulnerability in the guest kernel does not by itself expose the host kernel, because they are separate kernels with a hardware virtualization boundary between them; an attacker would also have to escape the hypervisor.

Firecracker — the same technology that powers AWS Lambda — is the de facto choice. The numbers are why: cold boot in under 125 ms, memory overhead under 5 MiB per VM, and compute overhead less than 5 percent compared to bare metal. With snapshot and restore, you can provision a sandbox in roughly 28 ms by restoring a pre-warmed memory image through a copy-on-write overlay. You get VM-grade isolation at nearly container speed.
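To put those figures in perspective, here is a back-of-the-envelope calculation using only the numbers quoted above (5 MiB overhead, 125 ms cold boot, 28 ms snapshot restore) as upper bounds. The fleet size of 1,000 and the function names are arbitrary assumptions, and the overhead figure excludes the memory each guest itself is allocated.

```python
MIB_PER_GIB = 1024


def fleet_overhead_gib(vm_count: int, per_vm_overhead_mib: float = 5.0) -> float:
    """Upper-bound VMM memory overhead for a fleet, in GiB."""
    return vm_count * per_vm_overhead_mib / MIB_PER_GIB


def sequential_provision_seconds(vm_count: int, per_vm_ms: float) -> float:
    """Time to provision vm_count sandboxes one after another, in seconds."""
    return vm_count * per_vm_ms / 1000


if __name__ == "__main__":
    n = 1000
    print(f"overhead: {fleet_overhead_gib(n):.2f} GiB")                      # overhead: 4.88 GiB
    print(f"cold boot: {sequential_provision_seconds(n, 125):.0f} s")        # cold boot: 125 s
    print(f"snapshot restore: {sequential_provision_seconds(n, 28):.0f} s")  # snapshot restore: 28 s
```

Under these assumptions, a thousand sandboxes cost under 5 GiB of isolation overhead, and snapshot restore cuts sequential provisioning time by a factor of roughly 4.5.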

Platform comparison: who runs what

You rarely build on raw Firecracker yourself. A wave of platforms now wrap these isolation technologies in agent-friendly SDKs. Here is how the major options compare on isolation and cold-start latency, the two properties that matter most for interactive agents.

Platform	Isolation	Cold start	Best for
E2B	Firecracker microVM	~150 ms	Agent-native code execution
Daytona	microVM	~90 ms	Fast spin-up dev environments
Modal	gVisor	< 1 s	Serverless compute + GPU
Fly.io Sprites	microVM	2-3 s	Persistent agent workspaces
Northflank	Firecracker / Kata / gVisor	varies	Pick isolation per workload

E2B stands out because it was built specifically for AI agent developers rather than for general compute or CI/CD. Each sandbox gets its own kernel and network namespace, so a guest kernel vulnerability alone does not expose the host, and the SDK is shaped around agent workflows rather than retrofitted from a deployment tool.

Northflank takes the opposite approach: a full developer platform that supports Firecracker, Kata Containers, Cloud Hypervisor, and gVisor, letting you choose the security-versus-performance tradeoff per workload. That flexibility matters when some of your agents run trusted internal tooling and others run arbitrary user-submitted code.

What a secure sandbox actually enforces

Isolation depth is necessary but not sufficient. A microVM with a wide-open network and your AWS keys mounted inside it is still a disaster waiting to happen. True isolation means the sandbox cannot reach host processes, host filesystems, or host network interfaces — and it means you control exactly what the sandbox can reach.

Here is a minimal E2B example that runs untrusted agent code with a tight boundary:

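from e2b_code_interpreter import Sandbox

# Placeholder for code produced by the agent; always treated as untrusted
agent_generated_code = "print(sum(range(10)))"


def handle_error(error):
    # Hypothetical handler: record a short summary rather than the full traceback
    print(f"sandboxed code failed: {error.name}")


# Each call spins up an isolated Firecracker microVM
# (the SDK expects an E2B_API_KEY environment variable)
with Sandbox.create(timeout=30) as sandbox:
    execution = sandbox.run_code(agent_generated_code)

    if execution.error:
        # Handle the failure; avoid echoing raw tracebacks back to the model unfiltered
        handle_error(execution.error)
    else:
        result = execution.results

# Sandbox is destroyed on exit: no persistent state, no escape window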

The defensive primitives you should layer on top of any sandbox:

No ambient credentials. Never mount cloud keys, database passwords, or long-lived tokens into the sandbox. If the code needs an external resource, broker it through a proxy that you control and audit.
Egress allowlists. Default-deny outbound network. Open only the specific hosts the task requires. This single control neutralizes most exfiltration attempts.
Short timeouts and hard resource caps. Bound CPU, memory, and wall-clock time. A runaway loop or a crypto-miner should die in seconds.
Ephemeral by default. Destroy the sandbox after each task. Persistent state is convenient and dangerous; reach for it only when you have a specific reason and a cleanup story.
Read-only where possible. Mount inputs read-only. Give the agent a single scratch directory it can write to.
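The egress allowlist in particular is easy to get subtly wrong, so here is a minimal sketch of the matching logic. The function name and the example hosts are assumptions for illustration, and in practice the check has to be enforced outside the sandbox (in a proxy or firewall you control), not by code running inside it.

```python
from urllib.parse import urlsplit


def is_egress_allowed(url: str, allowed_hosts: frozenset[str]) -> bool:
    """Default-deny: permit only HTTPS requests to explicitly listed hosts."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL: deny
    if parts.scheme != "https":
        return False
    # Compare the parsed hostname exactly; substring checks are bypassable.
    host = (parts.hostname or "").lower().rstrip(".")
    return host in allowed_hosts


# Hypothetical allowlist for a task that only needs to install packages
ALLOWED = frozenset({"pypi.org", "files.pythonhosted.org"})
```

Matching on the parsed hostname rather than on the raw string matters: a URL such as `https://pypi.org.evil.com/` or `https://evil.com/pypi.org` contains the allowed name but resolves to a different host, and both are rejected here.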

The leak that is not about code at all

Here is the finding that reframes the whole problem. Research from Washington University found that 63.4 percent of LLM agents without proper isolation leaked sensitive data through conversation — not through code execution.

Read that again. The danger is not only that the agent runs curl evil.com. It is that a helpful agent, when an injected instruction asks it to, will happily read a file and summarize its contents back into the chat. No malicious code is executed. The model simply does what it was told, and your secret leaves through the response channel.

This means sandboxing the code runtime is only half the job. You also need to sandbox the data the agent can see and the channels it can speak through. Treat the agent's context window as part of the trust boundary:

Keep secrets out of the context entirely; reference them by handle, not by value.
Filter and classify what tools return to the model before it lands in the context.
Apply output guardrails on the agent's final response, not just on its tool calls.
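A minimal sketch of the first two ideas follows, assuming a regex-based filter and an in-memory handle store. `SecretVault`, `redact`, and the two patterns are hypothetical names and examples; a real deployment would need a far broader classifier than two regular expressions.

```python
import re

# Illustrative patterns only: the shape of an AWS access key ID and a PEM private key
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----"),
]


def redact(text: str) -> str:
    """Mask anything that looks like a secret before it enters the context."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


class SecretVault:
    """Maps opaque handles to secret values; only handles are shown to the model."""

    def __init__(self) -> None:
        self._values: dict[str, str] = {}

    def register(self, handle: str, value: str) -> None:
        self._values[handle] = value

    def resolve(self, handle: str) -> str:
        # Called by trusted tool code, outside the model's context window
        return self._values[handle]


vault = SecretVault()
vault.register("db-primary", "s3cr3t-value")
tool_output = redact("found key AKIAABCDEFGHIJKLMNOP in config")
```

The same `redact` pass can be applied to the agent's final response as an output guardrail, so that a secret which slipped into the context still has one more filter to get through.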

A perfectly isolated microVM does nothing for you if the agent reads your customer database and types it into a reply.

Matching isolation depth to risk

Not every workload needs a microVM. The right model is to match isolation depth to your actual threat model:

Trusted, internal code with no secrets nearby → a hardened container is fine.
Semi-trusted code, limited blast radius → gVisor gives you a strong middle ground at low latency.
Arbitrary, user-influenced, or LLM-generated code with anything valuable in reach → microVMs, full stop.
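The mapping above can be written down as a small decision function. This is one interpretation rather than a fixed rule: the `Trust` and `Tier` names are hypothetical, and the combinations the list does not spell out (for example, trusted code with secrets nearby) are resolved here by going one tier deeper.

```python
from enum import Enum


class Trust(Enum):
    TRUSTED = "trusted internal code"
    SEMI_TRUSTED = "semi-trusted code"
    UNTRUSTED = "arbitrary, user-influenced, or LLM-generated code"


class Tier(Enum):
    CONTAINER = "hardened container"
    GVISOR = "gVisor"
    MICROVM = "microVM"


def pick_tier(trust: Trust, valuable_in_reach: bool) -> Tier:
    """Choose an isolation tier; ambiguous cases err toward deeper isolation."""
    if trust is Trust.UNTRUSTED:
        return Tier.MICROVM
    if trust is Trust.SEMI_TRUSTED:
        return Tier.MICROVM if valuable_in_reach else Tier.GVISOR
    # Trusted code: a container suffices only when nothing valuable is nearby
    return Tier.GVISOR if valuable_in_reach else Tier.CONTAINER
```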

When in doubt, go deeper. The cost difference between a container and a Firecracker microVM is now measured in tens of milliseconds and single-digit percentages of overhead. The cost of an escape is measured in incident reports and lost trust.

Bringing it to production

If you are deploying agentic code execution today, a sane starting architecture looks like this:

Pick a microVM-based sandbox provider (E2B for agent-native, Northflank if you want to control the isolation tier per workload).
Run every code execution ephemerally — one sandbox per task, destroyed on completion.
Default-deny network, with an explicit egress allowlist per task type.
Broker all credentials through an audited proxy; nothing sensitive lives inside the sandbox.
Guard the conversation channel as carefully as the code channel, because an agent can leak data through its replies without executing any code at all.
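One way to keep the first four items from drifting is to audit each sandbox configuration before launch. The sketch below assumes a hypothetical `SandboxConfig` shape and a crude name-based credential check; it does not correspond to any provider's actual configuration format, and it cannot cover the conversation channel, which needs its own guardrails.

```python
from dataclasses import dataclass, field

# Substrings that suggest an environment variable holds a credential (heuristic)
SENSITIVE_MARKERS = ("KEY", "SECRET", "TOKEN", "PASSWORD")


@dataclass(frozen=True)
class SandboxConfig:
    isolation: str                                   # e.g. "microvm", "gvisor", "container"
    ephemeral: bool                                  # destroyed after each task?
    default_deny_network: bool
    egress_allowlist: tuple[str, ...] = ()
    env: dict[str, str] = field(default_factory=dict)


def audit(config: SandboxConfig) -> list[str]:
    """Return a list of checklist violations; an empty list means none found."""
    problems = []
    if config.isolation != "microvm":
        problems.append("isolation is not microVM-based")
    if not config.ephemeral:
        problems.append("sandbox is not destroyed after each task")
    if not config.default_deny_network:
        problems.append("network is not default-deny")
    if "*" in config.egress_allowlist:
        problems.append("egress allowlist contains a wildcard")
    for name in config.env:
        if any(marker in name.upper() for marker in SENSITIVE_MARKERS):
            problems.append(f"environment variable {name} looks like a credential")
    return problems
```

Running `audit` in CI or at sandbox creation time turns the checklist into something that fails loudly instead of relying on memory.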

The agentic era runs on code that no human reviewed before it executed. That is the entire value proposition — and the entire risk. Sandboxing is how you keep the value without inheriting the catastrophe.

At Noqta, we help teams across the MENA region design and deploy AI agent systems with security built in from day one — from isolation architecture to zero-trust data boundaries. If you are putting autonomous agents into production, the time to get the sandbox right is before the first incident, not after.

Sources: Firecrawl — AI Agent Sandbox, Northflank — Best Code Execution Sandbox, Spheron — E2B, Daytona, Firecracker Setup, ARMO — AI Agent Sandboxing Guide.