Pydantic AI Tutorial 2026: Build Type-Safe LLM Agents in Python

If you have ever shipped an LLM feature in Python, you know the failure mode: the model returns a JSON-shaped string, you parse it, something is missing, and you write yet another defensive try/except that papers over the issue. Pydantic AI flips that around. It treats your Pydantic schemas as the contract the model has to honor and gives you a FastAPI-style, dependency-injected agent runtime on top — with streaming, tools, retries, and observability already wired in.
In this tutorial you will install Pydantic AI from scratch, build a customer-support agent with structured outputs and database tools, swap models between OpenAI and Anthropic, stream tokens to a FastAPI endpoint, and instrument the whole thing with Logfire so you can see exactly what your agent did and why.
Prerequisites
Before starting, make sure you have:
- Python 3.10 or newer installed
- An API key for at least one provider (OpenAI, Anthropic, Google, Mistral, or a local Ollama instance)
- Basic familiarity with Pydantic models and async/await
- A terminal and a code editor (VS Code with the Python extension recommended)
- Optional: a Logfire account for observability (free tier is plenty)
What You Will Build
By the end of this tutorial, you will have:
- A Pydantic AI project with environment-isolated dependencies
- A typed support agent that returns Pydantic-validated structured responses
- Tools backed by a SQLite database for ticket lookups and order status
- A streaming chat endpoint exposed through FastAPI
- Provider-agnostic configuration that swaps OpenAI, Anthropic, and Gemini at runtime
- End-to-end observability with Logfire, including token usage and tool spans
- A Pytest harness that runs the agent against a fake LLM for fast deterministic tests
Step 1: Install Pydantic AI
Pydantic AI ships as a single package with optional extras for each provider. Create an isolated environment first — uv is the fastest option in 2026, but venv works equally well.
mkdir support-agent && cd support-agent
uv venv
source .venv/bin/activate
uv pip install "pydantic-ai[openai,anthropic,logfire]" fastapi uvicorn aiosqlite

If you prefer plain pip:
python -m venv .venv
source .venv/bin/activate
pip install "pydantic-ai[openai,anthropic,logfire]" fastapi uvicorn aiosqlite

Verify the install:
python -c "import pydantic_ai; print(pydantic_ai.__version__)"

You should see the installed version printed. The public API has been stable for several releases and the team follows semver, so code written against a current release should keep working across minor versions.
Step 2: Configure Your Providers
Export API keys for the providers you want to use. Pydantic AI picks them up at runtime through the underlying provider SDKs. Never commit keys to version control.
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="..."

For real projects, put these in a .env file and load them with python-dotenv or direnv. Add .env to .gitignore immediately.
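A minimal sketch of the .env approach, assuming you add python-dotenv to the project (it is not part of the install above):

# settings.py — load keys from .env before anything imports the provider SDKs
from dotenv import load_dotenv

load_dotenv()  # copies KEY=value pairs from .env into os.environ

Import settings at the top of your entrypoint so the keys are present before the first model call.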
Step 3: Your First Typed Agent
Create agent.py with a minimal agent that returns a structured SupportResponse:
# agent.py
from pydantic import BaseModel, Field
from pydantic_ai import Agent


class SupportResponse(BaseModel):
    """Structured response from the support agent."""

    answer: str = Field(description="Plain-language answer for the user")
    needs_human: bool = Field(
        description="True when the issue cannot be resolved by the bot"
    )
    confidence: float = Field(ge=0, le=1, description="Self-rated confidence")


support_agent = Agent(
    "openai:gpt-4o-mini",
    output_type=SupportResponse,
    system_prompt=(
        "You are a calm, concise customer-support agent for an e-commerce store. "
        "Always return a SupportResponse. Set needs_human=true if the user is angry, "
        "asks for a refund over 200 USD, or mentions legal action."
    ),
)

if __name__ == "__main__":
    result = support_agent.run_sync(
        "My order #4421 has been stuck on 'shipped' for 14 days. What now?"
    )
    print(result.output)
    print("Tokens used:", result.usage())

Run it:
python agent.py

Pydantic AI sends your system prompt and the user message to GPT-4o-mini, asks it to return JSON matching SupportResponse, validates the response with Pydantic, and gives you a fully typed object. If the model returns invalid JSON or a missing field, the agent automatically retries with a corrective message — no manual parsing required.
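You can hook your own checks into that same retry loop with an output validator. A minimal sketch, assuming the @support_agent.output_validator decorator available in recent Pydantic AI releases; the confidence threshold here is an illustrative choice, not part of the tutorial's requirements:

from pydantic_ai import ModelRetry, RunContext


@support_agent.output_validator
async def check_confidence(ctx: RunContext[None], output: SupportResponse) -> SupportResponse:
    # Raising ModelRetry sends this message back to the model and retries,
    # exactly like a schema validation failure would
    if output.confidence < 0.2 and not output.needs_human:
        raise ModelRetry(
            "Confidence is very low; improve the answer or set needs_human=true."
        )
    return output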
Step 4: Add Dependency Injection
Real agents need access to databases, HTTP clients, and configuration. Pydantic AI uses a FastAPI-style dependency system: you declare a deps_type and access it from inside tools and prompts via the RunContext.
Create deps.py:
# deps.py
from dataclasses import dataclass

import aiosqlite


@dataclass
class SupportDeps:
    """Resources the agent needs at runtime."""

    db: aiosqlite.Connection
    customer_id: int

Update agent.py to use it:
from pydantic_ai import Agent, RunContext

from deps import SupportDeps

support_agent = Agent(
    "openai:gpt-4o-mini",
    deps_type=SupportDeps,
    output_type=SupportResponse,
    system_prompt=(
        "You are a customer-support agent. Use the provided tools to look up orders "
        "before answering. Never invent order details."
    ),
)


@support_agent.system_prompt
async def add_customer_context(ctx: RunContext[SupportDeps]) -> str:
    """Inject customer-specific context into every system prompt."""
    async with ctx.deps.db.execute(
        "SELECT name, tier FROM customers WHERE id = ?", (ctx.deps.customer_id,)
    ) as cursor:
        row = await cursor.fetchone()
    if not row:
        return "Customer record not found."
    name, tier = row
    return f"You are speaking with {name} (tier: {tier}). Address them by first name."

Notice how the system prompt is a plain async function with full access to deps. You get the same ergonomics you would get from a FastAPI dependency, only the consumer is the LLM.
Step 5: Define Tools the Model Can Call
Tools are async functions decorated with @support_agent.tool. Pydantic AI inspects their type hints and generates a JSON schema for each tool so the model knows how to call it. The return value is sent back into the conversation as a tool message.
from datetime import datetime, timedelta
from typing import Literal


@support_agent.tool
async def get_order_status(
    ctx: RunContext[SupportDeps],
    order_id: int,
) -> dict:
    """Look up the latest status for a customer order.

    Args:
        order_id: The numeric order identifier shown on receipts.
    """
    async with ctx.deps.db.execute(
        "SELECT status, last_update FROM orders WHERE id = ? AND customer_id = ?",
        (order_id, ctx.deps.customer_id),
    ) as cursor:
        row = await cursor.fetchone()
    if not row:
        return {"error": "order_not_found", "order_id": order_id}
    status, last_update = row
    return {
        "order_id": order_id,
        "status": status,
        "last_update": last_update,
        # Flag orders that have not been updated in over a week
        "stale": datetime.utcnow() - datetime.fromisoformat(last_update) > timedelta(days=7),
    }


@support_agent.tool
async def issue_refund(
    ctx: RunContext[SupportDeps],
    order_id: int,
    amount_cents: int,
    reason: Literal["damaged", "late", "wrong_item", "other"],
) -> dict:
    """Issue a refund for a specific order. Amounts over 20000 cents require human approval."""
    if amount_cents > 20_000:
        return {"approved": False, "reason": "amount_exceeds_bot_limit"}
    await ctx.deps.db.execute(
        "INSERT INTO refunds (order_id, amount_cents, reason) VALUES (?, ?, ?)",
        (order_id, amount_cents, reason),
    )
    await ctx.deps.db.commit()
    return {"approved": True, "order_id": order_id, "amount_cents": amount_cents}

Two things to notice:
- Type hints become the schema. The Literal for reason becomes an enum the model must pick from, and Pydantic validates each tool call before your function runs.
- Tools can guard themselves. The refund tool refuses anything over 200 USD and lets the agent escalate. You do not have to teach the model the limit in prose — the tool enforces it.
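A third pattern worth knowing: a tool can raise ModelRetry to send an error message back to the model so it corrects its arguments and calls the tool again. A sketch, using a hypothetical shipments table that is not part of this tutorial's schema:

from pydantic_ai import ModelRetry


@support_agent.tool
async def get_tracking_number(ctx: RunContext[SupportDeps], order_id: int) -> str:
    """Return the carrier tracking number for an order."""
    async with ctx.deps.db.execute(
        "SELECT tracking FROM shipments WHERE order_id = ?", (order_id,)
    ) as cursor:
        row = await cursor.fetchone()
    if not row:
        # The message goes back to the model, which can fix a mistyped id and
        # try again; each attempt counts against the agent's retries limit
        raise ModelRetry(f"No shipment found for order {order_id}; check the order id.")
    return row[0]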
Step 6: Run the Agent Against a Real Database
Wire it together in main.py:
# main.py
import asyncio

import aiosqlite

from agent import support_agent
from deps import SupportDeps


async def setup_db() -> aiosqlite.Connection:
    db = await aiosqlite.connect(":memory:")
    await db.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, tier TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            status TEXT,
            last_update TEXT
        );
        CREATE TABLE refunds (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            order_id INTEGER,
            amount_cents INTEGER,
            reason TEXT
        );
        INSERT INTO customers VALUES (1, 'Aya', 'gold');
        INSERT INTO orders VALUES
            (4421, 1, 'shipped', '2026-04-13T08:00:00'),
            (4422, 1, 'delivered', '2026-04-22T14:30:00');
        """
    )
    await db.commit()
    return db


async def main() -> None:
    db = await setup_db()
    deps = SupportDeps(db=db, customer_id=1)
    result = await support_agent.run(
        "Order #4421 still says shipped after two weeks. Can you refund 35 USD as a goodwill credit?",
        deps=deps,
    )
    print("Answer:", result.output.answer)
    print("Needs human:", result.output.needs_human)
    print("Confidence:", result.output.confidence)
    # Tool calls live on message *parts*, not on the messages themselves
    tool_calls = [
        part.tool_name
        for message in result.all_messages()
        for part in message.parts
        if part.part_kind == "tool-call"
    ]
    print("Tool calls:", tool_calls)


if __name__ == "__main__":
    asyncio.run(main())

Run:

python main.py

The agent will call get_order_status, see the order is stale, call issue_refund for 3500 cents, and return a SupportResponse you can pass straight to your UI layer.
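Runs are single-turn by default. To carry a conversation forward, pass the previous run's messages back in via message_history; both new_messages() and the message_history parameter are part of the public Pydantic AI API. A short sketch that would slot into main() after the first run:

    # Follow-up turn: reuse the message history so the model keeps context
    followup = await support_agent.run(
        "Actually, make that refund 40 USD instead.",
        deps=deps,
        message_history=result.new_messages(),
    )
    print("Follow-up:", followup.output.answer)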
Step 7: Stream the Agent Through FastAPI
Pydantic AI exposes agent.run_stream which yields incremental output. Pair it with FastAPI's StreamingResponse for a chat endpoint that streams tokens to the browser.
# api.py
from contextlib import asynccontextmanager

import aiosqlite
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

from agent import support_agent
from deps import SupportDeps


class ChatBody(BaseModel):
    customer_id: int
    message: str


db_holder: dict = {}


@asynccontextmanager
async def lifespan(app: FastAPI):
    db_holder["db"] = await aiosqlite.connect("support.db")
    yield
    await db_holder["db"].close()


app = FastAPI(lifespan=lifespan)


@app.post("/chat")
async def chat(body: ChatBody) -> StreamingResponse:
    deps = SupportDeps(db=db_holder["db"], customer_id=body.customer_id)

    async def token_stream():
        async with support_agent.run_stream(body.message, deps=deps) as result:
            # stream_text yields plain-text deltas; it applies to text output,
            # so with a structured output_type you may need result.stream()
            # instead (see the note below)
            async for chunk in result.stream_text(delta=True):
                yield chunk

    return StreamingResponse(token_stream(), media_type="text/plain")

Start the server:
uvicorn api:app --reload

And from another terminal:
curl -N -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"customer_id": 1, "message": "Where is order 4422?"}'You will see tokens stream in real time. Because output_type is set, the final aggregated value is still a validated SupportResponse — you get streaming UX without giving up structure.
Step 8: Swap Providers Without Rewriting Code
The first argument to Agent is a model identifier. Change a single string and the agent runs against a different provider. For environments where the choice is dynamic, use pydantic_ai.models directly:
import os

from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.models.openai import OpenAIModel


def pick_model():
    name = os.getenv("AGENT_MODEL", "openai:gpt-4o-mini")
    if name.startswith("openai:"):
        return OpenAIModel(name.split(":", 1)[1])
    if name.startswith("anthropic:"):
        return AnthropicModel(name.split(":", 1)[1])
    raise ValueError(f"Unknown model: {name}")


support_agent = Agent(
    pick_model(),
    deps_type=SupportDeps,
    output_type=SupportResponse,
    system_prompt="...",
)

Now AGENT_MODEL=anthropic:claude-sonnet-4-6 python main.py runs the same agent on Claude Sonnet 4.6 with no other code changes. Tools, structured outputs, and dependency injection all work identically because Pydantic AI normalizes the provider differences for you.
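If you want resilience rather than a hard swap, Pydantic AI also ships a FallbackModel that tries models in order until one succeeds. A short sketch, reusing the model names from above:

from pydantic_ai.models.fallback import FallbackModel

# Try OpenAI first; if the call fails (rate limit, outage), retry on Anthropic
fallback = FallbackModel(
    OpenAIModel("gpt-4o-mini"),
    AnthropicModel("claude-sonnet-4-6"),
)

support_agent = Agent(
    fallback,
    deps_type=SupportDeps,
    output_type=SupportResponse,
    system_prompt="...",
)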
Step 9: Add Observability with Logfire
Pydantic AI is built by the Pydantic team, so it integrates natively with Logfire — their OpenTelemetry-based observability product. Every model call, tool invocation, retry, and validation error becomes a span you can search.
Sign up at logfire.pydantic.dev and grab a write token. Then:
# observability.py
import logfire

logfire.configure(token="your-write-token", service_name="support-agent")
logfire.instrument_pydantic_ai()

Import this module once at the top of api.py. Restart the server and send a few chat requests. In the Logfire UI you will see:
- A root span per agent run
- Child spans for each model call, with prompt, response, and token counts
- Tool spans showing arguments, return values, and duration
- Validation spans when Pydantic AI retries on bad output
For self-hosted observability, call logfire.configure(send_to_logfire=False) instead and point a standard OTLP exporter at your own collector. The instrumentation is the same.
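A sketch of that self-hosted setup, assuming a collector listening on the standard OTLP HTTP port:

# observability_selfhost.py — same instrumentation, your own collector
import logfire

logfire.configure(send_to_logfire=False, service_name="support-agent")
logfire.instrument_pydantic_ai()

# Point the exporter at your collector before starting the app:
#   export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318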
Step 10: Test the Agent Without Burning Tokens
The pydantic_ai.models.test.TestModel lets you run end-to-end agent tests with zero network calls. It returns a deterministic structured response that matches your output_type, and you can assert on the tool calls the agent made.
# test_agent.py
import pytest
import pytest_asyncio
from pydantic_ai.models.test import TestModel

from agent import support_agent
from deps import SupportDeps
from main import setup_db


@pytest_asyncio.fixture
async def tmp_db():
    """Fresh in-memory database per test, reusing setup_db from main.py."""
    db = await setup_db()
    yield db
    await db.close()


@pytest.mark.asyncio
async def test_refund_flow(tmp_db):
    deps = SupportDeps(db=tmp_db, customer_id=1)
    with support_agent.override(model=TestModel()):
        result = await support_agent.run(
            "Refund 50 USD for order 4421",
            deps=deps,
        )
    # Tool calls live on message parts, not on the messages themselves
    tool_calls = [
        part
        for message in result.all_messages()
        for part in message.parts
        if part.part_kind == "tool-call"
    ]
    assert any(t.tool_name == "issue_refund" for t in tool_calls)
    assert isinstance(result.output.answer, str)
    assert 0 <= result.output.confidence <= 1

Add pytest and pytest-asyncio to your dev dependencies and run:

pytest -v

The whole suite finishes in milliseconds because no real LLM is involved. Use TestModel for unit tests, then layer in a small set of integration tests that hit a real provider on every release candidate.
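When TestModel's auto-generated behavior is too blunt, FunctionModel (also in pydantic_ai) lets you script exact model turns. A sketch: 'final_result' is the default name Pydantic AI gives the output tool it registers for output_type, so "calling" it produces the structured output directly.

from pydantic_ai.messages import ModelMessage, ModelResponse, ToolCallPart
from pydantic_ai.models.function import AgentInfo, FunctionModel


def scripted(messages: list[ModelMessage], info: AgentInfo) -> ModelResponse:
    # Skip the lookup tools entirely and emit the structured output directly
    return ModelResponse(
        parts=[
            ToolCallPart(
                tool_name="final_result",
                args={"answer": "Scripted reply.", "needs_human": False, "confidence": 0.9},
            )
        ]
    )


# Use it exactly like TestModel:
#     with support_agent.override(model=FunctionModel(scripted)):
#         ...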
Testing Your Implementation
Walk through the full happy path one more time:
- python main.py returns a SupportResponse with needs_human=False and a refund recorded in SQLite
- curl against /chat streams tokens and ends with structured output
- AGENT_MODEL=anthropic:claude-sonnet-4-6 python main.py produces the same shape on Claude
- Logfire shows a span tree with tool calls and token usage
- pytest passes in under a second using TestModel
If any of these fail, the most common culprits are missing API keys, an outdated provider extra (run uv pip install --upgrade "pydantic-ai[openai,anthropic]"), or a tool function that does not declare types Pydantic can introspect.
Troubleshooting
The model keeps returning prose instead of structured output. Make sure output_type is set on the Agent and that you are not also asking for free-form text in the system prompt. Pydantic AI uses tool calling under the hood; some older models need to be pinned to a function-calling capable variant.
Validation errors loop forever. Pydantic AI retries once by default (retries=1). Bump it with Agent(..., retries=3) for flaky models, but if a field is impossible to satisfy, you will burn tokens. Read the validation error carefully — it usually points at a Field constraint that is too strict.
Tools are never called. Check that you decorated them with @agent.tool (not @agent.tool_plain unless you want to skip RunContext) and that their docstrings describe when to call them. Models rely heavily on tool descriptions to decide.
Streaming endpoint returns the whole message at once. That is FastAPI buffering. Make sure you are returning a StreamingResponse and not awaiting the generator before yielding.
Next Steps
- Combine Pydantic AI with our FastAPI Docker production guide to ship the agent behind a reverse proxy
- Pair it with Postgres full-text search on the tool side for richer retrieval
- Compare the developer experience to the Vercel AI SDK agent pattern on the TypeScript side
- Read the official Pydantic AI docs for advanced patterns: graph workflows, multi-agent handoff, and structured streaming with delta validation
Conclusion
Pydantic AI takes the parts of Python web development that already work — typed schemas, dependency injection, and async-first APIs — and applies them to LLM agents. You stop thinking about JSON parsing and prompt-shaped strings and start thinking about contracts: what does my agent return, what tools can it call, and what does it need to do its job. The result is agent code that looks like the rest of your Python codebase, tests like the rest of your Python codebase, and ships with the same confidence as the rest of your Python codebase.
Build the support bot above, instrument it with Logfire, and the next time someone asks how you handle structured LLM output you can point at a passing test suite instead of a hopeful regex.