Vercel AI SDK 7: Durable Agents & Voice Guide

The TypeScript AI stack just moved a step closer to production. Vercel AI SDK 7 is now generally available. Unlike the incremental releases of the 6.x line, it is built as a foundation for agents and AI platforms that run in production, adding durability, human-in-the-loop approvals, normalized telemetry, and a now-stable voice layer.

If you built on SDK 6, you do not have to throw anything away. But the new primitives are worth understanding, because they replace patterns that most teams previously hand-rolled: resuming a long-running agent after a deploy, pausing for a human to approve a risky tool call, and getting consistent traces across every model provider.

This guide walks through what is actually new, with code, and how to migrate.

What changed since SDK 6

SDK 6 made generateText, streamText, and tool-calling loops ergonomic. SDK 7 keeps those APIs and layers production concerns on top:

Durable, resumable agents that survive restarts and redeploys
Tool approvals with HMAC-signed, replay-protected human-in-the-loop gates
Unified telemetry registered once, covering every SDK function
Stable speech and transcription plus experimental realtime voice
Granular timeouts, tool context scoping, and a sandbox abstraction

Install or upgrade with your package manager of choice:

pnpm add ai@latest
# or
npm install ai@latest

Most existing code keeps working, and an official codemod handles the mechanical parts of the migration (covered at the end).

Durable agents that survive a deploy

The headline feature is durable execution. In SDK 6, a multi-step agent lived entirely in memory: if the process restarted mid-run — a deploy, a crash, an autoscale event — the run was lost. SDK 7 introduces a WorkflowAgent whose state is checkpointed per step and can resume from exactly where it stopped.

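import { WorkflowAgent } from 'ai';
import { openai } from '@ai-sdk/openai';

// `search` and `fetchPage` are tool() definitions assumed to be declared elsewhere.
const agent = new WorkflowAgent({
  model: openai('gpt-5.6'),
  instructions: 'Research the topic and produce a cited summary.',
  tools: { search, fetchPage },
  // Independent time bounds, all in milliseconds.
  timeout: {
    totalMs: 60000, // the whole run
    stepMs: 10000, // each agent step
    chunkMs: 2000, // each streaming chunk
    toolMs: 5000, // each individual tool call
  },
});

// Start a run; state is checkpointed after each step.
const result = await agent.run({ prompt: 'Summarize AI SDK 7 changes' });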

Each step records its number, duration, and success or failure, so a resumed run does not repeat completed work. The timeout object is itself new: you can bound the total run, each step, each streaming chunk, and each individual tool call independently — which matters when one slow tool should not be allowed to hang a whole agent.
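To make the resume behavior concrete, here is a minimal sketch of per-step checkpointing in plain TypeScript. It does not use WorkflowAgent and makes no claim about its storage format: StepRecord, Step, and runDurable are hypothetical names, and the in-memory array stands in for whatever persistent store a real deployment would use.

```typescript
// Hypothetical checkpoint shape: one record per attempted step.
interface StepRecord {
  step: number;
  durationMs: number;
  ok: boolean;
  output: string;
}

// A step receives the outputs of all earlier steps.
type Step = (previous: string[]) => Promise<string>;

// Runs steps in order, reusing the saved output of any step that
// already succeeded instead of executing it again.
async function runDurable(steps: Step[], store: StepRecord[]): Promise<string[]> {
  const outputs: string[] = [];
  for (let i = 0; i < steps.length; i++) {
    const saved = store.find((r) => r.step === i && r.ok);
    if (saved) {
      outputs.push(saved.output); // completed in an earlier run
      continue;
    }
    const started = Date.now();
    try {
      const output = await steps[i](outputs.slice());
      store.push({ step: i, durationMs: Date.now() - started, ok: true, output });
      outputs.push(output);
    } catch (err) {
      // Record the failure, then stop; a later call resumes at this step.
      store.push({ step: i, durationMs: Date.now() - started, ok: false, output: '' });
      throw err;
    }
  }
  return outputs;
}
```

Calling runDurable again with the same store after a failure re-runs only the failed step and the ones after it, which is the property that makes a restart cheap.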

Human-in-the-loop tool approvals

Autonomous agents that can spend money, send emails, or delete records need a brake. SDK 7 builds approvals directly into the tool layer. A tool can require user approval on every call, or you can supply a custom approval function that decides per call.

import { tool } from 'ai';
import { z } from 'zod';
 
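// `issueRefund` is assumed to be your own payment helper, defined elsewhere.
const refundTool = tool({
  description: 'Issue a customer refund',
  inputSchema: z.object({ orderId: z.string(), amount: z.number() }),
  // Pause for a human decision before execute() runs.
  needsApproval: 'user-approval',
  execute: async ({ orderId, amount }) => issueRefund(orderId, amount),
});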

When the model calls a gated tool, the agent pauses and emits an approval request instead of executing. Approvals are HMAC-signed, and inputs are revalidated on resume, which prevents a forged or stale approval from being replayed against a different payload. That replay protection is the difference between a demo and something you can put in front of customers.
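The signing and replay checks described above can be sketched with Node's built-in crypto module. This is an illustration of the general technique, not the SDK's implementation or wire format: ApprovalRequest, signApproval, verifyApproval, and the in-memory usedApprovals set are all hypothetical.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical shape of a pending approval.
interface ApprovalRequest {
  approvalId: string; // unique per request, used for single-use enforcement
  toolName: string;
  input: unknown;
  expiresAt: number; // epoch milliseconds
}

// Serialize the fields that the signature must cover. The input is
// included, so an approval cannot be moved onto a different payload.
function payloadOf(req: ApprovalRequest): string {
  return JSON.stringify([req.approvalId, req.toolName, req.input, req.expiresAt]);
}

function signApproval(req: ApprovalRequest, secret: string): string {
  return createHmac('sha256', secret).update(payloadOf(req)).digest('hex');
}

// In-memory for the sketch; a real system would persist used ids.
const usedApprovals = new Set<string>();

function verifyApproval(
  req: ApprovalRequest,
  signature: string,
  secret: string,
  now: number = Date.now(),
): boolean {
  const expected = Buffer.from(signApproval(req, secret), 'hex');
  const given = Buffer.from(signature, 'hex');
  // Forged signature or tampered input: the HMAC no longer matches.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return false;
  // Stale approval.
  if (now > req.expiresAt) return false;
  // Replay: each approval id is accepted once.
  if (usedApprovals.has(req.approvalId)) return false;
  usedApprovals.add(req.approvalId);
  return true;
}
```

The comparison uses timingSafeEqual rather than string equality so that verification time does not leak how many leading bytes of a guessed signature were correct.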

Tool context: stop leaking your API keys

A subtle but important addition is tool context scoping. Previously, tools often closed over secrets from the surrounding module scope. SDK 7 lets you declare a contextSchema so each tool receives typed, isolated configuration — keys and per-request settings stay out of the model-visible surface and out of other tools.

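import { ToolLoopAgent } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// `crmLookup` is a tool() definition assumed to be declared elsewhere.
const agent = new ToolLoopAgent({
  model: openai('gpt-5.6'),
  contextSchema: z.object({ apiKey: z.string(), userId: z.string() }),
  tools: { crmLookup },
});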

Inside prepareStep(), that runtime context is available as typed variables, so you can vary behavior per request without threading globals through your code. For multi-tenant apps, this is the cleanest way to keep one user's credentials from ever reaching another user's tool execution.
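The underlying idea, passing validated per-request context into a tool instead of letting it close over module-level secrets, can be shown without the SDK. The sketch below is dependency-free, so a hand-written parseCrmContext stands in for what a contextSchema would validate; ScopedTool, crmLookupSketch, and callTool are hypothetical names.

```typescript
// Hypothetical per-request context for one tool.
interface CrmContext {
  apiKey: string;
  userId: string;
}

// Stand-in for schema validation: reject anything that is not a CrmContext.
function parseCrmContext(value: unknown): CrmContext {
  if (typeof value !== 'object' || value === null) {
    throw new Error('context must be an object');
  }
  const { apiKey, userId } = value as Record<string, unknown>;
  if (typeof apiKey !== 'string' || typeof userId !== 'string') {
    throw new Error('apiKey and userId must be strings');
  }
  return { apiKey, userId };
}

// Model-provided input and runtime context arrive as separate arguments,
// so secrets never need to appear in the tool's input schema.
interface ScopedTool<I, C, O> {
  parseContext: (value: unknown) => C;
  execute: (input: I, context: C) => O;
}

const crmLookupSketch: ScopedTool<{ customerId: string }, CrmContext, string> = {
  parseContext: parseCrmContext,
  // A real tool would use context.apiKey to call the CRM; the sketch only
  // shows that the result is tied to the requesting user.
  execute: ({ customerId }, context) => `${context.userId}:${customerId}`,
};

// Validates the raw context for this request, then runs the tool with it.
function callTool<I, C, O>(tool: ScopedTool<I, C, O>, input: I, rawContext: unknown): O {
  return tool.execute(input, tool.parseContext(rawContext));
}
```

Because the context is supplied per call, two concurrent requests from different tenants never share credentials through module state.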

Voice goes stable: speech and transcription

The 6.x line treated audio as experimental. SDK 7 promotes text-to-speech and transcription to first-class, provider-agnostic functions.

import { generateSpeech, transcribe } from 'ai';
import { openai } from '@ai-sdk/openai';
 
// Text to speech
const audio = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Welcome to the Noqta developer blog.',
  voice: 'alloy',
});
 
// Speech to text (`audioBuffer` is assumed to hold the recorded audio bytes)
const { text } = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: audioBuffer,
});

Because the model is just a parameter, switching providers — OpenAI, LMNT, or another speech vendor — is a one-line change rather than a rewrite. For multilingual products, including Arabic and French interfaces common across the MENA region, that portability means you can route different languages to whichever provider transcribes them best without rebuilding your pipeline.
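One way to express that per-language routing is a small lookup table that returns which model to pass in. The sketch below is illustrative only: the provider names and model ids in the table are placeholders rather than recommendations, and ModelRef and pickTranscriptionModel are hypothetical helpers, not SDK APIs.

```typescript
// Hypothetical description of a transcription model choice.
interface ModelRef {
  provider: string;
  modelId: string;
}

// Fallback when no language-specific choice is configured.
const defaultModel: ModelRef = { provider: 'openai', modelId: 'whisper-1' };

// Placeholder entries; fill these in from your own evaluation results.
const byLanguage = new Map<string, ModelRef>([
  ['ar', { provider: 'provider-a', modelId: 'model-for-arabic' }],
  ['fr', { provider: 'provider-b', modelId: 'model-for-french' }],
]);

// Accepts tags such as 'ar', 'ar-TN', or 'FR-fr' and routes on the
// primary language subtag.
function pickTranscriptionModel(languageTag: string): ModelRef {
  const primary = languageTag.toLowerCase().split('-')[0];
  return byLanguage.get(primary) ?? defaultModel;
}
```

The returned reference would then be mapped to the matching provider's transcription model before the call, which keeps the routing decision in one place.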

Realtime voice and client-driven tools

Beyond batch speech, SDK 7 adds experimental_useRealtime() — provider-agnostic realtime support over direct browser WebSocket sessions, with audio transcription and client-driven tool calling. This is the primitive behind full-duplex voice agents: the user can speak and be spoken to at the same time, and the model can invoke tools mid-conversation without binding your UI to one provider's event format. The AI Gateway exposes the same capability server-side through a normalized realtime session, so you keep one interface across vendors.

Telemetry you register once

Observability in SDK 6 meant wiring experimental_telemetry into each call. SDK 7 flips this: register telemetry once and it covers every AI SDK function globally.

import { registerTelemetry } from 'ai';
 
registerTelemetry({
  // OpenTelemetry with GenAI semantic conventions
  serviceName: 'noqta-agents',
});

It speaks OpenTelemetry with the GenAI semantic conventions, so traces flow into Datadog, Langfuse, Braintrust, Sentry, LangSmith, and others without per-call boilerplate. Every function now exposes a performance object with responseTimeMs, timeToFirstOutputMs, and outputTokensPerSecond, plus consistent onStart and onEnd lifecycle callbacks that fire identically across the SDK. For anyone who has tried to compare latency across providers, having one consistent metric shape is a quiet but real upgrade.
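To show what those three numbers mean, the sketch below derives them from raw timestamps. The field names match the ones listed above, but the formulas are assumptions, not the SDK's documented definitions: in particular, this version divides output tokens by the full response time, whereas an implementation could equally measure from the first output. CallTimings, CallPerformance, and computePerformance are hypothetical.

```typescript
// Raw measurements for one model call, in epoch milliseconds.
interface CallTimings {
  startedAt: number;
  firstOutputAt: number;
  finishedAt: number;
  outputTokens: number;
}

// Same field names as the performance object described in the text.
interface CallPerformance {
  responseTimeMs: number;
  timeToFirstOutputMs: number;
  outputTokensPerSecond: number;
}

function computePerformance(t: CallTimings): CallPerformance {
  const responseTimeMs = t.finishedAt - t.startedAt;
  const timeToFirstOutputMs = t.firstOutputAt - t.startedAt;
  // Assumption: throughput over the whole call; guard against zero duration.
  const outputTokensPerSecond =
    responseTimeMs > 0 ? t.outputTokens / (responseTimeMs / 1000) : 0;
  return { responseTimeMs, timeToFirstOutputMs, outputTokensPerSecond };
}
```

Whatever the exact definitions, computing them the same way for every provider is what makes cross-provider latency comparisons meaningful.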

Migrating from SDK 6

The migration is intentionally low-effort. Run the official codemod and it rewrites the mechanical changes for you:

npx @ai-sdk/codemod v7

There is also a skill-based path if you use an agentic editor:

npx skills add vercel/ai --skill migrate-ai-sdk-v6-to-v7

The architectural shifts to keep in mind: APIs are more provider-agnostic, tool context scoping changes how you pass secrets, and the durability layer is opt-in through WorkflowAgent. Your existing generateText and streamText calls largely stay as-is. Read Vercel's AI SDK 7 announcement and the speech docs for the full reference before adopting durability in production.

Should you upgrade?

If you are prototyping, SDK 7's ergonomics alone justify the move. If you are shipping agents to real users, the durability, approvals, and unified telemetry close exactly the gaps that previously forced teams to build bespoke infrastructure around the SDK. The voice layer is a bonus that puts conversational agents within reach of a few lines of code.

For deeper dives on building production agents in TypeScript, see our guides on Vercel AI SDK 6, durable execution for AI agents, and LLM observability in production.
