Structured Output from LLMs: JSON Mode, Function Calling & More

By AI Bot ·


LLMs generate text. Your application needs data structures. The gap between these two realities is where production bugs live — malformed JSON, missing fields, wrong types, and inconsistent formatting that breaks your UI at 2 AM.

In 2026, every major AI provider offers native structured output capabilities that guarantee schema compliance at the token level. If you are still parsing LLM responses with regex or hoping your prompt engineering holds up, this guide will show you the right way.

The Three Levels of Output Control

Not all approaches to getting structured data from LLMs are equal. Understanding the tradeoffs helps you pick the right tool for your use case.

Level 1: Prompt Engineering

The simplest approach — you ask the model to return JSON in your prompt:

Return a JSON object with fields: name (string), age (number), skills (array of strings).

This works 80–95% of the time, but offers no type guarantees and fails silently. The model might return valid JSON with an unexpected field name, or wrap the JSON in markdown code fences. Fine for prototypes, dangerous for production.
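
If you do ship the prompt-only approach, at minimum strip code fences and validate before trusting the output. Here is a minimal defensive parser — a sketch of damage control, not a recommendation (the backtick characters are escaped as \u0060 to keep the example readable):

```typescript
// Defensively parse an LLM reply that may wrap JSON in markdown code fences.
// Returns null instead of throwing so callers must handle the failure path.
const TICK = "\u0060"; // a single backtick character
const FENCE_RE = new RegExp(
  "^\\s*" + TICK.repeat(3) + "(?:json)?\\s*|\\s*" + TICK.repeat(3) + "\\s*$",
  "g"
);

function parseLlmJson(raw: string): Record<string, unknown> | null {
  const unfenced = raw.replace(FENCE_RE, "");
  try {
    const value = JSON.parse(unfenced);
    // JSON.parse also accepts numbers, strings, and arrays — require an object
    return typeof value === "object" && value !== null && !Array.isArray(value)
      ? (value as Record<string, unknown>)
      : null;
  } catch {
    return null;
  }
}

// A fence-wrapped reply still parses; a prose apology yields null, not a crash.
const reply = TICK.repeat(3) + 'json\n{"name": "Ada", "age": 36}\n' + TICK.repeat(3);
parseLlmJson(reply);               // returns { name: "Ada", age: 36 }
parseLlmJson("Sorry, I cannot.");  // returns null
```

Even with this guard you only know the reply is *some* JSON object — you still have no guarantee about field names or types, which is exactly the gap the next two levels close.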

Level 2: Function Calling (Tool Use)

Function calling lets you define typed schemas that the model uses to structure its output. The model "calls" a function you define, returning parameters that match your schema:

const tools = [{
  type: "function",
  function: {
    name: "extract_contact",
    description: "Extract contact information from text",
    parameters: {
      type: "object",
      properties: {
        name: { type: "string" },
        email: { type: "string", format: "email" },
        company: { type: "string" }
      },
      required: ["name", "email"]
    }
  }
}];

Reliability jumps to 95–99%. The schema acts as a strong hint, but the model can still occasionally produce values that do not match your expected format within a field.
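
Because of that residual risk, it is worth validating the returned arguments before acting on them. A dependency-free sketch (the argument shape mirrors the extract_contact schema above; the email regex is deliberately simple and purely illustrative):

```typescript
// Sketch: validate tool-call arguments before acting on them.
// In practice `raw` would be toolCall.function.arguments from the API response.
interface ContactArgs {
  name: string;
  email: string;
  company?: string;
}

function validateContactArgs(raw: string): ContactArgs {
  const args = JSON.parse(raw) as Partial<ContactArgs>;
  if (typeof args.name !== "string" || args.name.length === 0) {
    throw new Error("missing name");
  }
  // "format: email" in JSON Schema is only a hint — enforce it yourself
  if (typeof args.email !== "string" || !/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(args.email)) {
    throw new Error("invalid email: " + args.email);
  }
  return args as ContactArgs;
}

const contact = validateContactArgs('{"name":"Jane Doe","email":"jane@example.com"}');
```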

Level 3: Native Structured Output

The gold standard. Constrained decoding uses a finite state machine to mask invalid tokens at generation time. Rather than trusting post-hoc validation, the model can only predict schema-valid tokens at each step.
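
To build intuition for what that masking step does, here is a toy sketch — not any provider's actual implementation — where a hand-written state table stands in for the automaton a real engine compiles from your schema. It accepts only objects of the form {"name": "<word>"}:

```typescript
// Toy constrained decoder: at each state, only schema-valid tokens survive.
type State = "start" | "key" | "colon" | "value" | "done";

const allowed: Record<State, (tok: string) => State | null> = {
  start: (t) => (t === "{" ? "key" : null),
  key:   (t) => (t === '"name"' ? "colon" : null),
  colon: (t) => (t === ":" ? "value" : null),
  value: (t) => (/^"\w+"$/.test(t) ? "done" : null),
  done:  () => null, // a fuller table would accept "}" here
};

// The mask step: filter the model's candidate tokens to schema-valid ones.
function maskTokens(state: State, candidates: string[]): string[] {
  return candidates.filter((t) => allowed[state](t) !== null);
}

// At the value position, anything that is not a quoted word is masked out.
maskTokens("value", ['"Ada"', "42", "true", '"Bob"']); // → ['"Ada"', '"Bob"']
```

Real implementations operate on subword tokens and compile the full JSON Schema into the automaton, but the principle is the same: invalid continuations never get a chance to be sampled.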

import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";
 
const ContactSchema = z.object({
  name: z.string(),
  email: z.string().email(),
  company: z.string().optional(),
  role: z.enum(["developer", "designer", "manager", "other"]),
});
 
const client = new OpenAI();
 
const response = await client.chat.completions.parse({
  model: "gpt-4o",
  messages: [
    { role: "user", content: "Extract contact: John Smith, john@acme.co, CTO at Acme Inc" }
  ],
  response_format: zodResponseFormat(ContactSchema, "contact"),
});
 
const contact = response.choices[0].message.parsed;
// TypeScript knows: contact.name is string, contact.role is "developer" | "designer" | ...

Schema compliance is 100%. This is where production applications should be in 2026.

Provider Comparison: Who Supports What

Each major provider handles structured output differently:

OpenAI offers the most mature native support. The response_format parameter with json_schema type enforces strict schema compliance. Their Python and TypeScript SDKs integrate directly with Pydantic and Zod.

Anthropic (Claude) achieves structured output through tool use. You define a tool with an input_schema, and Claude returns structured parameters. The TypeScript SDK provides a zodTool helper for seamless Zod integration:

import Anthropic from "@anthropic-ai/sdk";
import { zodTool } from "@anthropic-ai/sdk/helpers/zod";
import { z } from "zod";
 
const client = new Anthropic();
 
const ProductSchema = z.object({
  name: z.string(),
  price: z.number(),
  currency: z.enum(["USD", "EUR", "TND"]),
  inStock: z.boolean(),
});
 
const response = await client.messages.create({
  model: "claude-sonnet-4-6-20260320",
  max_tokens: 1024,
  tools: [
    zodTool({
      name: "extract_product",
      description: "Extract product details from description",
      schema: ProductSchema,
    }),
  ],
  tool_choice: { type: "tool", name: "extract_product" },
  messages: [
    { role: "user", content: "The new headphones cost 89.99 EUR and are currently available" }
  ],
});

Google Gemini supports native structured output through response_schema in the API configuration. It accepts Pydantic models directly in Python and JSON Schema in REST calls.
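
For the REST/JSON-Schema path, the schema is an ordinary JSON Schema object. A sketch of what gets passed (the config key spellings below are illustrative — check your SDK version for the exact names, e.g. responseSchema alongside a responseMimeType of "application/json"):

```typescript
// Plain JSON Schema of the kind Gemini's response_schema config accepts.
const contactResponseSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    email: { type: "string" },
    role: { type: "string", enum: ["developer", "designer", "manager", "other"] },
  },
  required: ["name", "email"],
};

// Passed in the generation config, roughly:
// { responseMimeType: "application/json", responseSchema: contactResponseSchema }
```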

The Validation Sandwich Pattern

Even with native structured output guaranteeing valid JSON and correct types, you still need business logic validation. The "validation sandwich" pattern layers three checks:

import { z } from "zod";
 
// Layer 1: Schema defines structure
const OrderSchema = z.object({
  productId: z.string().uuid(),
  quantity: z.number().int().positive(),
  unitPrice: z.number().positive(),
  total: z.number().positive(),
});
 
// Layer 2: Business logic refinement
const ValidatedOrder = OrderSchema.refine(
  (order) => Math.abs(order.total - order.quantity * order.unitPrice) < 0.01,
  { message: "Total must equal quantity × unitPrice" }
);
 
// Layer 3: Application-level check
async function processOrder(raw: unknown) {
  const order = ValidatedOrder.parse(raw);
  const product = await db.products.find(order.productId);
  if (!product) throw new Error("Product not found");
  if (order.quantity > product.stockCount) throw new Error("Insufficient stock");
  return order;
}

The LLM guarantees the shape. Zod validates the constraints. Your application logic verifies against real-world state. Trust no single layer alone.
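
One detail worth calling out: the 0.01 tolerance in the layer-2 refinement is deliberate. Floating-point arithmetic makes exact equality unreliable for money, so the check boils down to this (a dependency-free sketch of the same comparison):

```typescript
// Why the refinement compares with a tolerance instead of ===:
// floating-point products rarely equal the decimal you expect exactly.
function totalIsConsistent(quantity: number, unitPrice: number, total: number): boolean {
  return Math.abs(total - quantity * unitPrice) < 0.01;
}

// 3 × 1.1 is 3.3000000000000003 in IEEE 754, so === would reject a correct total.
console.log(3 * 1.1 === 3.3);                // false
console.log(totalIsConsistent(3, 1.1, 3.3)); // true
console.log(totalIsConsistent(3, 1.1, 3.5)); // false
```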

When to Use What

Choosing between structured output, function calling, and JSON mode depends on your use case:

Use native structured output when you need to extract data from text, generate structured responses for your UI, or build data pipelines. The model returns exactly the shape you defined — no parsing, no surprises.

Use function calling when your model needs to interact with external systems, choose between multiple tools, or operate within an agentic loop. Function calling lets the model decide which action to take and provides structured parameters for that action.

Avoid JSON mode alone. It guarantees valid JSON syntax but not schema adherence. You can get back any valid JSON object, which means field names, types, and structure are all unpredictable.
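
The failure mode is easy to demonstrate: every payload below is syntactically valid JSON, so JSON mode could return any of them, yet only one has the shape an application expects (the payloads are invented for illustration):

```typescript
// All valid JSON — JSON.parse accepts each — but only one matches
// the { name: string, age: number } shape the application expects.
const payloads = [
  '{"name": "Ada", "age": 36}',       // what you wanted
  '{"fullName": "Ada", "age": "36"}', // wrong field name, wrong type
  '[{"name": "Ada"}]',                // an array, not an object
];

const parsed = payloads.map((p) => JSON.parse(p)); // none of these throw

const matchesShape = (v: unknown): boolean =>
  typeof v === "object" && v !== null && !Array.isArray(v) &&
  typeof (v as any).name === "string" && typeof (v as any).age === "number";

console.log(parsed.map(matchesShape)); // [true, false, false]
```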

Schema Design Best Practices

How you design your schemas directly impacts output quality:

Keep schemas focused. Include only fields you actually use. Larger schemas increase token consumption and can confuse the model. If you need different structures for different contexts, define separate schemas.

Use descriptive field names and descriptions. The model reads property names and descriptions to understand what content belongs in each field. customerFullName extracts better than name. Adding a description like "The customer's full legal name as it appears on their ID" improves accuracy further.

Prefer enums over open strings. When a field has a known set of values, use an enum. This constrains the output space and eliminates typos or variations:

// Weak: model might return "High", "HIGH", "high priority", etc.
priority: z.string()
 
// Strong: guaranteed to be one of these values
priority: z.enum(["low", "medium", "high", "critical"])

Break complex extractions into smaller calls. A single call with 20 fields is less reliable and more expensive than two calls with 10 fields each. Parallel execution makes this practically free in terms of latency.
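
The split-and-parallelize pattern looks like this. The extract* functions here are hypothetical stand-ins for two smaller structured-output calls (stubbed with fixed values so the sketch is self-contained):

```typescript
// Two focused extraction calls run in parallel instead of one 20-field call.
interface Basics { vendor: string; date: string; }
interface Financials { subtotal: number; total: number; }

async function extractBasics(_text: string): Promise<Basics> {
  // In production: a structured-output call with a small, focused schema
  return { vendor: "Acme Inc", date: "2026-01-15" };
}

async function extractFinancials(_text: string): Promise<Financials> {
  // In production: a second call with only the financial fields
  return { subtotal: 100, total: 119 };
}

async function extractInvoiceSplit(text: string) {
  // Latency ≈ max(call A, call B), not their sum
  const [basics, financials] = await Promise.all([
    extractBasics(text),
    extractFinancials(text),
  ]);
  return { ...basics, ...financials };
}
```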

Real-World Example: Invoice Data Extraction

Here is a complete example that extracts invoice data from unstructured text — a common need for e-invoicing systems and business automation:

import Anthropic from "@anthropic-ai/sdk";
import { zodTool } from "@anthropic-ai/sdk/helpers/zod";
import { z } from "zod";
 
const InvoiceSchema = z.object({
  invoiceNumber: z.string().describe("Invoice or receipt number"),
  vendor: z.string().describe("Company or person who issued the invoice"),
  date: z.string().describe("Invoice date in YYYY-MM-DD format"),
  lineItems: z.array(z.object({
    description: z.string(),
    quantity: z.number().positive(),
    unitPrice: z.number().positive(),
  })),
  subtotal: z.number(),
  taxRate: z.number().describe("Tax rate as a percentage, e.g. 19 for 19%"),
  taxAmount: z.number(),
  total: z.number(),
  currency: z.enum(["USD", "EUR", "TND", "SAR", "AED"]),
});
 
const client = new Anthropic();
 
async function extractInvoice(text: string) {
  const response = await client.messages.create({
    model: "claude-sonnet-4-6-20260320",
    max_tokens: 2048,
    tools: [
      zodTool({
        name: "extract_invoice",
        description: "Extract structured invoice data from text or OCR output",
        schema: InvoiceSchema,
      }),
    ],
    tool_choice: { type: "tool", name: "extract_invoice" },
    messages: [{ role: "user", content: text }],
  });
 
  const toolBlock = response.content.find(
    (b): b is Anthropic.ToolUseBlock => b.type === "tool_use"
  );
  if (!toolBlock) throw new Error("No tool_use block in response");
  return InvoiceSchema.parse(toolBlock.input);
}

This pattern powers document processing pipelines, accounting integrations, and compliance workflows where data must be extracted reliably from unstructured sources.

Cost Considerations

Structured output is not free. Complex schemas add tokens to every request — both in the system prompt (where the schema is injected) and in the response. Keep these tips in mind:

  • Minimize optional fields. Every optional field the model considers costs tokens, even when it outputs nothing.
  • Cache your schemas. If using OpenAI or Anthropic with prompt caching enabled, schema definitions in the system prompt are cached across requests, reducing costs by up to 90% for repeated calls.
  • Use the cheapest model that works. For simple extraction tasks, smaller models like Claude Haiku or GPT-4o-mini handle structured output just as reliably as frontier models — at a fraction of the cost. Check our guide on AI API cost optimization for more strategies.

What Comes Next

The structured output ecosystem is evolving fast. Streaming structured output — where partial JSON objects are delivered as they are generated — is now supported by OpenAI and is coming to other providers. This enables real-time UIs that progressively render extracted data.

Multi-modal structured output, where models extract structured data directly from images, PDFs, and audio, is also maturing. Combined with AI-powered business automation, these capabilities are making it possible to build end-to-end document processing pipelines that require zero human intervention.

The bottom line: if you are building AI-powered features in 2026, structured output is not optional. It is the foundation that makes everything else — agents, automation, data pipelines — actually work in production.


Want to read more blog posts? Check out our latest blog post on From Chatbots to Coworkers: How AI Agents Are Redefining Business Automation in 2026.
