Every serious AI app eventually faces the same multi-provider headache. You start with OpenAI, add Anthropic for Claude's longer context, route some traffic to Google for Gemini Flash on the cheap, and maybe sprinkle in Mistral for open-weights fallback. Suddenly your .env has six API keys, your provider clients are scattered across files, and a single OpenAI outage takes the whole product down.

Vercel AI Gateway collapses that mess into one HTTP endpoint, one API key, and one billing relationship. You keep writing code against the Vercel AI SDK, but every model call is now load-balanced, observable, cacheable, and failover-ready by default.

In this tutorial you will build a production-grade chat application that routes intelligently across providers, falls back automatically when one is down, caches expensive responses, and tracks token spend per user — all without changing your application code beyond a single configuration switch.

Prerequisites

Before starting, make sure you have:

Node.js 20 or later installed
A Next.js 15 project (or follow the setup step below)
A Vercel account with AI Gateway enabled — sign up at vercel.com
Basic familiarity with React, TypeScript, and Next.js App Router
A code editor (VS Code recommended)

You do not need accounts with OpenAI, Anthropic, or Google to follow along. Vercel AI Gateway provides credits to test all supported providers from a single dashboard.

What You Will Build

A multi-model AI chat application that:

Routes through one endpoint — every request goes to the Gateway, never directly to a provider
Auto-fails over when a provider returns errors or hits rate limits
Caches identical prompts to cut token spend on repeated questions
Tracks usage per user with custom metadata headers
Compares models live with a side-by-side response panel for benchmarking

Why a Gateway Beats Direct Provider Calls

Calling provider SDKs directly works fine in a prototype. It breaks down the moment any of these matter:

Reliability. Provider outages happen monthly. Without failover, your users see errors instead of answers.
Cost visibility. Each provider has its own dashboard, billing schedule, and token accounting. Reconciling spend across five providers is a finance nightmare.
Rate limits. Every provider caps requests and tokens per minute, and accounts on OpenAI's lower usage tiers hit those caps quickly. Spreading load across providers raises your effective ceiling.
Model swapping. Trying a new model means hunting down every openai.chat() call. With a Gateway, you change one string.
Observability. You need request traces, error rates, and latency percentiles per model. Building this yourself is weeks of work.

A Gateway gives you all of this as a managed service. Vercel bills model usage at the providers' list prices, with no markup on tokens.

Step 1: Project Setup

Create a fresh Next.js 15 project with TypeScript and Tailwind:

npx create-next-app@latest ai-gateway-demo \
  --typescript --tailwind --app --use-pnpm
 
cd ai-gateway-demo

Install the Vercel AI SDK v5, its React hooks package, and the Gateway provider package, pinned to the v5-compatible major versions:

pnpm add ai@^5 @ai-sdk/react@^2 @ai-sdk/gateway@^1 zod

The @ai-sdk/gateway package exposes a single provider interface that proxies to any model Vercel AI Gateway supports. You do not install @ai-sdk/openai, @ai-sdk/anthropic, or any other provider-specific package — that is the whole point.

Step 2: Configure the Gateway

Inside your Vercel project, open the AI tab and click Enable AI Gateway. Vercel automatically provisions a Gateway endpoint scoped to your project and exposes the API key as the environment variable AI_GATEWAY_API_KEY.

If you are developing locally, link the project and pull env vars:

pnpm dlx vercel link
pnpm dlx vercel env pull .env.local

Your .env.local should now contain:

AI_GATEWAY_API_KEY=vck_xxxxxxxxxxxxxxxxxxxx

That single key replaces every provider key you would normally need.
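The Step 3 client reads this key with a non-null assertion (process.env.AI_GATEWAY_API_KEY!), which silences TypeScript but means a missing key only shows up later as an authentication error on the first request. A minimal fail-fast sketch, assuming you call it once when building the client; requireGatewayKey is a hypothetical helper name, not part of the SDK.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: fail fast when the Gateway key is missing or blank,
// instead of discovering it as an authentication error at request time.
function requireGatewayKey(env: Env): string {
  const key = env["AI_GATEWAY_API_KEY"]?.trim();
  if (!key) {
    throw new Error(
      "AI_GATEWAY_API_KEY is not set. Pull it with `vercel env pull .env.local` or add it to your environment.",
    );
  }
  return key;
}
```

In lib/ai.ts this would look like createGateway({ apiKey: requireGatewayKey(process.env) }).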

Step 3: Create the Gateway Client

Create lib/ai.ts to centralize Gateway access:

import { createGateway } from "@ai-sdk/gateway";
 
export const gateway = createGateway({
  apiKey: process.env.AI_GATEWAY_API_KEY!,
});
 
export const MODELS = {
  fast: "google/gemini-2.5-flash",
  smart: "anthropic/claude-sonnet-4.6",
  cheap: "openai/gpt-4o-mini",
  reasoning: "openai/o3-mini",
  openWeights: "mistral/mistral-large-latest",
} as const;
 
export type ModelKey = keyof typeof MODELS;

Each value in MODELS is a fully-qualified model identifier the Gateway recognizes. Swapping providers is now a one-line change anywhere in the codebase.

Step 4: Build a Basic Chat Route

Create app/api/chat/route.ts:

import { streamText, convertToModelMessages, type UIMessage } from "ai";
import { gateway, MODELS, type ModelKey } from "@/lib/ai";

export const runtime = "edge";
export const maxDuration = 30;

export async function POST(req: Request) {
  // AI SDK v5 clients (useChat) send UIMessage objects, which carry typed
  // `parts` rather than a flat `content` string.
  const { messages, model = "smart" } = (await req.json()) as {
    messages: UIMessage[];
    model?: ModelKey;
  };

  // streamText returns immediately; tokens stream as the model produces them.
  const result = streamText({
    model: gateway(MODELS[model]),
    messages: convertToModelMessages(messages),
    system: "You are a concise, helpful assistant. Keep answers under 200 words unless asked for detail.",
  });

  // v5 replaces toDataStreamResponse() with the UI message stream response.
  return result.toUIMessageStreamResponse();
}

Notice there is zero provider-specific code. The Gateway resolves anthropic/claude-sonnet-4.6 to the right API behind the scenes.
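This route trusts the model field from the request body, so a client that sends an unknown key such as "gpt-5-ultra" makes MODELS[model] evaluate to undefined. A minimal sketch of narrowing that untrusted value before it reaches gateway(); resolveModelKey is a hypothetical helper, and the MODELS map is copied from Step 3 so the block runs on its own.

```typescript
// Local copy of the Step 3 map so this sketch is self-contained.
const MODELS = {
  fast: "google/gemini-2.5-flash",
  smart: "anthropic/claude-sonnet-4.6",
  cheap: "openai/gpt-4o-mini",
  reasoning: "openai/o3-mini",
  openWeights: "mistral/mistral-large-latest",
} as const;

type ModelKey = keyof typeof MODELS;

// Hypothetical helper: accept only keys that MODELS itself defines.
// hasOwnProperty rejects inherited names such as "toString".
function resolveModelKey(value: unknown, fallback: ModelKey = "smart"): ModelKey {
  return typeof value === "string" &&
    Object.prototype.hasOwnProperty.call(MODELS, value)
    ? (value as ModelKey)
    : fallback;
}
```

In the route, pass resolveModelKey(model) to MODELS[...] instead of indexing with the raw body value.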

Step 5: Build the Chat UI

Create app/page.tsx with a minimal chat interface:

"use client";
 
import { useChat } from "@ai-sdk/react";
import { useState, type FormEvent } from "react";

const MODEL_OPTIONS = [
  { value: "fast", label: "Gemini 2.5 Flash (fast, cheap)" },
  { value: "smart", label: "Claude Sonnet 4.6 (best quality)" },
  { value: "cheap", label: "GPT-4o Mini (balanced)" },
  { value: "reasoning", label: "OpenAI o3-mini (reasoning)" },
  { value: "openWeights", label: "Mistral Large (open weights)" },
];

export default function ChatPage() {
  const [model, setModel] = useState("smart");
  // AI SDK v5's useChat no longer manages the input field, so keep it in local state.
  const [input, setInput] = useState("");
  // With no options, useChat POSTs to /api/chat.
  const { messages, sendMessage, status } = useChat();
  const isLoading = status === "submitted" || status === "streaming";

  function handleSubmit(e: FormEvent<HTMLFormElement>) {
    e.preventDefault();
    if (!input.trim()) return;
    // Send the selected model key in the request body alongside the messages.
    sendMessage({ text: input }, { body: { model } });
    setInput("");
  }

  return (
    <main className="mx-auto max-w-2xl p-6">
      <h1 className="mb-4 text-2xl font-bold">AI Gateway Chat</h1>

      <select
        value={model}
        onChange={(e) => setModel(e.target.value)}
        className="mb-4 w-full rounded border p-2"
      >
        {MODEL_OPTIONS.map((opt) => (
          <option key={opt.value} value={opt.value}>{opt.label}</option>
        ))}
      </select>

      <div className="mb-4 space-y-3">
        {messages.map((m) => (
          <div key={m.id} className="rounded border p-3">
            <div className="text-xs text-gray-500">{m.role}</div>
            {/* v5 messages carry typed parts instead of a single content string */}
            <div className="whitespace-pre-wrap">
              {m.parts.map((part, i) =>
                part.type === "text" ? <span key={i}>{part.text}</span> : null,
              )}
            </div>
          </div>
        ))}
      </div>

      <form onSubmit={handleSubmit} className="flex gap-2">
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Ask anything..."
          className="flex-1 rounded border p-2"
          disabled={isLoading}
        />
        <button
          type="submit"
          disabled={isLoading || !input.trim()}
          className="rounded bg-black px-4 py-2 text-white disabled:opacity-50"
        >
          Send
        </button>
      </form>
    </main>
  );
}

Run pnpm dev and you should be able to chat with five different models from the same UI, all routed through the Gateway.

Step 6: Add Automatic Failover

Real production traffic should never see provider errors. Configure a fallback chain so that if Claude errors out or is rate-limited, the request is retried against GPT-4o Mini, then Gemini 2.5 Flash.

Update app/api/chat/route.ts:

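import {
  streamText,
  convertToModelMessages,
  type ModelMessage,
  type UIMessage,
} from "ai";
import { gateway, MODELS, type ModelKey } from "@/lib/ai";

export const runtime = "edge";
export const maxDuration = 30;

const FALLBACK_CHAIN: ModelKey[] = ["smart", "cheap", "fast"];

async function streamWithFailover(messages: ModelMessage[], preferred: ModelKey) {
  const chain = [preferred, ...FALLBACK_CHAIN.filter((m) => m !== preferred)];
  let lastError: unknown = new Error("All providers failed");

  for (const key of chain) {
    // streamText returns immediately and does not throw on provider errors;
    // rate limits and outages arrive later as "error" parts inside the stream.
    const result = streamText({
      model: gateway(MODELS[key]),
      messages,
      system: "You are a concise, helpful assistant.",
      experimental_telemetry: { isEnabled: true, functionId: `chat-${key}` },
    });

    // Peek at a tee'd copy of the stream until real output or an error appears.
    // The response created later reads from its own copy, so nothing is lost.
    const reader = result.fullStream.getReader();
    let failed = false;
    try {
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        if (value.type === "error") {
          failed = true;
          lastError = value.error;
          break;
        }
        // Skip lifecycle markers; any other part means the model has started answering.
        if (value.type !== "start" && value.type !== "start-step") break;
      }
    } catch (err) {
      failed = true;
      lastError = err;
    } finally {
      reader.releaseLock();
    }

    if (!failed) return result;
    console.warn(`[gateway] ${key} failed, trying next provider`, lastError);
  }

  throw lastError;
}

export async function POST(req: Request) {
  const { messages, model = "smart" }: { messages: UIMessage[]; model?: ModelKey } =
    await req.json();
  const result = await streamWithFailover(convertToModelMessages(messages), model);
  return result.toUIMessageStreamResponse();
}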

The Gateway also supports server-side failover policies you can configure in the dashboard, which is preferable for production because the routing decision happens before tokens are billed.
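Client-side failover boils down to two provider-agnostic pieces: building the ordered chain and trying each candidate until one succeeds. A minimal sketch of both, with hypothetical names (buildChain, firstSuccessful); it is not the Gateway's server-side failover. With streaming, the attempt callback must reject when the provider fails, which means waiting for the first output or error rather than only creating the stream.

```typescript
type Failure<K> = { candidate: K; error: unknown };

// Preferred candidate first, then the fallbacks without duplicates.
function buildChain<K>(preferred: K, fallbacks: readonly K[]): K[] {
  return [preferred, ...fallbacks.filter((k) => k !== preferred)];
}

// Try each candidate in order; resolve with the first success plus the
// failures that preceded it, or throw once every candidate has failed.
async function firstSuccessful<K, T>(
  candidates: readonly K[],
  attempt: (candidate: K) => Promise<T>,
): Promise<{ candidate: K; value: T; failures: Failure<K>[] }> {
  const failures: Failure<K>[] = [];
  for (const candidate of candidates) {
    try {
      return { candidate, value: await attempt(candidate), failures };
    } catch (error) {
      failures.push({ candidate, error });
    }
  }
  throw new Error(`All ${candidates.length} candidates failed`);
}
```

Returning the failures alongside the result lets the caller log which providers were skipped without interrupting the response.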

Step 7: Enable Response Caching

Identical prompts arriving within a short window should not cost you tokens twice. Vercel AI Gateway has built-in semantic caching — enable it by adding cache headers:

const result = streamText({
  model: gateway(MODELS[model]),
  messages,
  // Per-request HTTP headers are passed with streamText's `headers` option.
  headers: {
    "x-gateway-cache-ttl": "3600", // seconds
    "x-gateway-cache-mode": "semantic",
  },
});

Three modes are available:

exact — only matches byte-identical prompts
semantic — matches prompts that are similar in meaning (uses embeddings)
disabled — bypass cache entirely

For an FAQ-style assistant, semantic caching can cut token spend by 40 to 70 percent because users tend to ask the same question many different ways.
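To make the difference between the modes concrete, here is a minimal, app-side sketch of what exact matching means: a hit requires the same model and byte-identical messages within the TTL. It illustrates the semantics under those assumptions and is not the Gateway's implementation; ExactPromptCache is a hypothetical name, and the clock is injectable so expiry can be tested.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical in-memory cache with "exact" semantics: same model, identical
// messages (including property order, like a byte comparison), within the TTL.
class ExactPromptCache {
  private readonly entries = new Map<string, { value: string; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  private key(model: string, messages: ChatMessage[]): string {
    return JSON.stringify([model, messages]);
  }

  get(model: string, messages: ChatMessage[]): string | undefined {
    const k = this.key(model, messages);
    const entry = this.entries.get(k);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(k); // expired: evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(model: string, messages: ChatMessage[], value: string): void {
    this.entries.set(this.key(model, messages), {
      value,
      expiresAt: this.now() + this.ttlMs,
    });
  }
}
```

A rephrased question misses this cache even when the meaning is identical; semantic mode would replace the key lookup with an embedding similarity search, which this sketch does not attempt.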

Step 8: Track Usage Per User

For multi-tenant apps, you want token usage attributed to individual users. Pass a metadata header:

// userId comes from your own auth or session layer.
const result = streamText({
  model: gateway(MODELS[model]),
  messages,
  headers: {
    "x-gateway-user-id": userId,
    "x-gateway-tags": "tier:pro,feature:chat",
  },
});

The Gateway dashboard then breaks down spend by user-id and any tags you supply. You can also export this data via the analytics API to bill customers based on actual model usage.
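If you export per-request usage to bill customers, the aggregation step looks roughly like this sketch. The UsageRecord row shape and the per-million-token prices are hypothetical placeholders, not the analytics API's schema or any provider's real rates; map your exported data into this shape first.

```typescript
// Hypothetical row shape for exported usage data.
type UsageRecord = {
  userId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
};

// Placeholder prices in USD per million tokens; replace with real rates.
type Price = { inputPerMillion: number; outputPerMillion: number };

// Sum the cost of every request per user.
function costByUser(
  records: UsageRecord[],
  prices: Record<string, Price>,
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    const price = prices[r.model];
    if (!price) throw new Error(`No price configured for ${r.model}`);
    const cost =
      (r.inputTokens / 1_000_000) * price.inputPerMillion +
      (r.outputTokens / 1_000_000) * price.outputPerMillion;
    totals.set(r.userId, (totals.get(r.userId) ?? 0) + cost);
  }
  return totals;
}
```

Throwing on an unpriced model is deliberate: silently billing zero for a new model is harder to notice than a failed export.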

Step 9: Add a Live Model Comparison Panel

A killer feature of multi-provider routing is letting users see the same prompt answered by different models side by side. Create app/compare/page.tsx:

"use client";
 
import { useState } from "react";
 
const COMPARE_MODELS = ["smart", "fast", "reasoning"] as const;
 
export default function ComparePage() {
  const [prompt, setPrompt] = useState("");
  const [results, setResults] = useState<Record<string, string>>({});
  const [loading, setLoading] = useState(false);
 
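  async function runComparison() {
    setLoading(true);
    setResults({});

    try {
      await Promise.all(
        COMPARE_MODELS.map(async (model) => {
          const res = await fetch("/api/chat", {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({
              // The chat route expects AI SDK v5 UIMessage objects (with parts).
              messages: [
                { id: `${model}-prompt`, role: "user", parts: [{ type: "text", text: prompt }] },
              ],
              model,
            }),
          });
          // The route streams the UI message protocol as server-sent events
          // ("data: {...}" lines), so keep only the text-delta chunks.
          const raw = await res.text();
          const text = raw
            .split("\n")
            .filter((line) => line.startsWith("data: ") && line !== "data: [DONE]")
            .map((line) => JSON.parse(line.slice("data: ".length)))
            .filter((chunk) => chunk.type === "text-delta")
            .map((chunk) => chunk.delta)
            .join("");
          setResults((prev) => ({ ...prev, [model]: text }));
        }),
      );
    } finally {
      // Re-enable the button even if one of the requests fails.
      setLoading(false);
    }
  }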
 
  return (
    <main className="mx-auto max-w-5xl p-6">
      <h1 className="mb-4 text-2xl font-bold">Side-by-Side Model Comparison</h1>
 
      <textarea
        value={prompt}
        onChange={(e) => setPrompt(e.target.value)}
        placeholder="Enter a prompt to compare across models..."
        className="mb-4 h-32 w-full rounded border p-3"
      />
 
      <button
        onClick={runComparison}
        disabled={loading || !prompt}
        className="mb-6 rounded bg-black px-6 py-2 text-white disabled:opacity-50"
      >
        {loading ? "Running..." : "Compare Models"}
      </button>
 
      <div className="grid gap-4 md:grid-cols-3">
        {COMPARE_MODELS.map((model) => (
          <div key={model} className="rounded border p-4">
            <h3 className="mb-2 font-semibold">{model}</h3>
            <div className="whitespace-pre-wrap text-sm">
              {results[model] ?? (loading ? "Loading..." : "")}
            </div>
          </div>
        ))}
      </div>
    </main>
  );
}

This pattern is invaluable for prompt engineering — you can instantly see how Claude handles a nuanced request versus how Gemini Flash or o3-mini handle it.
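Since the panel doubles as a benchmark, recording latency next to each answer makes the comparison more useful. A minimal sketch with hypothetical helper names; the ask callback stands in for the per-model fetch-and-read logic in runComparison, and one failing model does not hide the others. It uses Date.now(), whose millisecond resolution is enough for model calls.

```typescript
type Timed<T> = { value: T; elapsedMs: number };
type Result = Timed<string> | { error: string };

// Measure the wall-clock time of one async call.
async function timed<T>(run: () => Promise<T>): Promise<Timed<T>> {
  const start = Date.now();
  const value = await run();
  return { value, elapsedMs: Date.now() - start };
}

// Query every model concurrently; record either a timed answer or the error.
async function compareWithTimings(
  models: readonly string[],
  ask: (model: string) => Promise<string>,
): Promise<Record<string, Result>> {
  const out: Record<string, Result> = {};
  await Promise.all(
    models.map(async (model) => {
      try {
        out[model] = await timed(() => ask(model));
      } catch (err) {
        out[model] = { error: String(err) };
      }
    }),
  );
  return out;
}
```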

Step 10: Configure Spend Limits and Alerts

Open the AI Gateway dashboard in Vercel and set:

Monthly budget — Gateway stops routing requests once you hit the cap
Per-model rate limits — protect against runaway loops or abuse
Alerts — email or Slack when you cross 50, 75, or 90 percent of budget

These are critical for any public-facing AI feature. Without them, a single malicious user with a script can drain thousands of dollars in tokens overnight.
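The budget alerts above are configured in the dashboard. If you also want the same check inside your app, for example to show a warning banner before the cap is reached, a minimal sketch follows; budgetStatus is a hypothetical helper, and the 50/75/90 defaults simply mirror the thresholds listed above.

```typescript
// Hypothetical helper: which alert thresholds (in percent) have been crossed,
// and whether spend has reached the monthly cap.
function budgetStatus(
  spentUsd: number,
  budgetUsd: number,
  thresholds: readonly number[] = [50, 75, 90],
): { percentUsed: number; crossed: number[]; blocked: boolean } {
  if (budgetUsd <= 0) throw new Error("budgetUsd must be positive");
  const percentUsed = (spentUsd * 100) / budgetUsd;
  return {
    percentUsed,
    crossed: thresholds.filter((t) => percentUsed >= t),
    blocked: spentUsd >= budgetUsd,
  };
}
```

For example, $80 spent against a $100 budget has crossed the 50 and 75 percent alerts but is not yet blocked.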

Testing Your Implementation

Verify each piece works:

Basic routing — chat with each model and confirm you get a response
Failover — temporarily set a wrong API key for one provider in the dashboard, send a request, and confirm the next provider in the chain serves it
Caching — send the same prompt twice and check the Gateway dashboard to see one billed call and one cache hit
User tracking — fire requests with different x-gateway-user-id headers and confirm the per-user breakdown appears
Comparison — submit a prompt on the compare page and verify all three models respond

Troubleshooting

Authentication errors Confirm AI_GATEWAY_API_KEY is set in .env.local and that the project is linked to the Vercel project where Gateway is enabled.

Model not found Check the exact model identifier in the Gateway dashboard. Provider prefixes (anthropic/, openai/, google/) are case-sensitive.
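A quick pre-flight check catches malformed identifiers before they reach the Gateway. This sketch assumes, as every example in this tutorial shows, that identifiers take the form provider/model with a lowercase provider prefix; checkModelId is a hypothetical helper and does not confirm that the model exists in the Gateway catalog.

```typescript
type Check = { ok: true } | { ok: false; reason: string };

// Hypothetical format check for "provider/model" identifiers.
function checkModelId(id: string): Check {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    return { ok: false, reason: `"${id}" is not in provider/model form` };
  }
  const provider = id.slice(0, slash);
  if (provider !== provider.toLowerCase()) {
    return {
      ok: false,
      reason: `provider prefix "${provider}" should be "${provider.toLowerCase()}"`,
    };
  }
  return { ok: true };
}
```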

Streaming hangs in production Make sure your route exports runtime = "edge" or sets maxDuration to at least 30 seconds. Default serverless timeouts can cut off long generations.

Cache never hits Semantic cache needs a few requests to warm up the embedding index. Run the same prompt three or four times before judging hit rate.

Next Steps

Wire the Gateway into your existing Vercel AI SDK projects — it is a drop-in replacement for direct provider imports
Read the Vercel AI Gateway docs for advanced features like custom routing rules
Combine with Mem0 for stateful, multi-provider assistants
Add Langfuse observability on top of the Gateway for prompt-level tracing
Pair with Arcjet rate limiting to protect your Gateway endpoint from abuse

Conclusion

Vercel AI Gateway turns multi-provider AI from a maintenance burden into a one-line config change. You get reliability, observability, cost control, and the freedom to swap models without rewriting code. For any Next.js app that takes AI seriously in 2026, it is the default starting point — there is almost no reason to call provider SDKs directly anymore.

The pattern shown here scales from a side project to a multi-million-request production system. Start with the basic routing setup, add failover and caching as traffic grows, and use per-user tagging to keep your finance team happy.