Pydantic AI Tutorial 2026: Build Type-Safe LLM Agents in Python

If you have ever shipped an LLM feature in Python, you know the failure mode: the model returns a JSON-shaped string, you parse it, something is missing, and you write yet another defensive try/except that papers over the issue. Pydantic AI flips that around. It treats your Pydantic schemas as the contract the model has to honor and gives you a FastAPI-style, dependency-injected agent runtime on top — with streaming, tools, retries, and observability already wired in.
In this tutorial you will install Pydantic AI from scratch, build a customer-support agent with structured outputs and database tools, swap models between OpenAI and Anthropic, stream tokens to a FastAPI endpoint, and instrument the whole thing with Logfire so you can see exactly what your agent did and why.
Prerequisites
Before starting, make sure you have:
- Python 3.10 or newer installed
- An API key for at least one provider (OpenAI, Anthropic, Google, Mistral, or a local Ollama instance)
- Basic familiarity with Pydantic models and async/await
- A terminal and a code editor (VS Code with the Python extension recommended)
- Optional: a Logfire account for observability (free tier is plenty)
What You Will Build
By the end of this tutorial, you will have:
- A Pydantic AI project with environment-isolated dependencies
- A typed support agent that returns Pydantic-validated structured responses
- Tools backed by a SQLite database for ticket lookups and order status
- A streaming chat endpoint exposed through FastAPI
- Provider-agnostic configuration that swaps OpenAI, Anthropic, and Gemini at runtime
- End-to-end observability with Logfire, including token usage and tool spans
- A Pytest harness that runs the agent against a fake LLM for fast deterministic tests
Step 1: Install Pydantic AI
Pydantic AI ships as a single package with optional extras for each provider. Create an isolated environment first — uv is the fastest option in 2026, but venv works equally well.
mkdir support-agent && cd support-agent
uv venv
source .venv/bin/activate
uv pip install "pydantic-ai[openai,anthropic,logfire]" fastapi uvicorn aiosqlite

If you prefer plain pip:
python -m venv .venv
source .venv/bin/activate
pip install "pydantic-ai[openai,anthropic,logfire]" fastapi uvicorn aiosqlite

Verify the install:
python -c "import pydantic_ai; print(pydantic_ai.__version__)"

You should see the installed version printed. The public API has been stable for several releases and the team follows semver, so code written against a current release should keep working across minor versions.
Step 2: Configure Your Providers
Export API keys for the providers you want to use. Pydantic AI picks them up at runtime through the underlying provider SDKs. Never commit keys to version control.
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="..."

For real projects, put these in a .env file and load them with python-dotenv or direnv. Add .env to .gitignore immediately.
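A minimal sketch of the .env approach, assuming you add python-dotenv to the project (it is not part of the install above):

# settings.py — load keys from .env before anything imports the provider SDKs
from dotenv import load_dotenv

load_dotenv()  # copies KEY=value pairs from .env into os.environ

Import settings at the top of your entrypoint so the keys are present before the first model call.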
Step 3: Your First Typed Agent
Create agent.py with a minimal agent that returns a structured SupportResponse:
# agent.py
from pydantic import BaseModel, Field
from pydantic_ai import Agent


class SupportResponse(BaseModel):
    """Structured response from the support agent."""

    answer: str = Field(description="Plain-language answer for the user")
    needs_human: bool = Field(
        description="True when the issue cannot be resolved by the bot"
    )
    confidence: float = Field(ge=0, le=1, description="Self-rated confidence")


support_agent = Agent(
    "openai:gpt-4o-mini",
    output_type=SupportResponse,
    system_prompt=(
        "You are a calm, concise customer-support agent for an e-commerce store. "
        "Always return a SupportResponse. Set needs_human=true if the user is angry, "
        "asks for a refund over 200 USD, or mentions legal action."
    ),
)

if __name__ == "__main__":
    result = support_agent.run_sync(
        "My order #4421 has been stuck on 'shipped' for 14 days. What now?"
    )
    print(result.output)
    print("Tokens used:", result.usage())

Run it:
python agent.py

Pydantic AI sends your system prompt and the user message to GPT-4o-mini, asks it to return JSON matching SupportResponse, validates the response with Pydantic, and gives you a fully typed object. If the model returns invalid JSON or a missing field, the agent automatically retries with a corrective message — no manual parsing required.
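You can hook your own checks into that same retry loop with an output validator. A minimal sketch, assuming the @support_agent.output_validator decorator available in recent Pydantic AI releases; the confidence threshold here is an illustrative choice, not part of the tutorial's requirements:

from pydantic_ai import ModelRetry, RunContext


@support_agent.output_validator
async def check_confidence(ctx: RunContext[None], output: SupportResponse) -> SupportResponse:
    # Raising ModelRetry sends this message back to the model and retries,
    # exactly like a schema validation failure would
    if output.confidence < 0.2 and not output.needs_human:
        raise ModelRetry(
            "Confidence is very low; improve the answer or set needs_human=true."
        )
    return output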
Step 4: Add Dependency Injection
Real agents need access to databases, HTTP clients, and configuration. Pydantic AI uses a FastAPI-style dependency system: you declare a deps_type and access it from inside tools and prompts via the RunContext.
Create deps.py:
# deps.py
from dataclasses import dataclass

import aiosqlite


@dataclass
class SupportDeps:
    """Resources the agent needs at runtime."""

    db: aiosqlite.Connection
    customer_id: int

Update agent.py to use it:
from pydantic_ai import Agent, RunContext

from deps import SupportDeps

support_agent = Agent(
    "openai:gpt-4o-mini",
    deps_type=SupportDeps,
    output_type=SupportResponse,
    system_prompt=(
        "You are a customer-support agent. Use the provided tools to look up orders "
        "before answering. Never invent order details."
    ),
)


@support_agent.system_prompt
async def add_customer_context(ctx: RunContext[SupportDeps]) -> str:
    """Inject customer-specific context into every system prompt."""
    async with ctx.deps.db.execute(
        "SELECT name, tier FROM customers WHERE id = ?", (ctx.deps.customer_id,)
    ) as cursor:
        row = await cursor.fetchone()
    if not row:
        return "Customer record not found."
    name, tier = row
    return f"You are speaking with {name} (tier: {tier}). Address them by first name."

Notice how the system prompt is a plain async function with full access to deps. You get the same ergonomics you would get from a FastAPI dependency, only the consumer is the LLM.
Step 5: Define Tools the Model Can Call
Tools are async functions decorated with @support_agent.tool. Pydantic AI inspects their type hints and generates a JSON schema for each tool so the model knows how to call it. The return value is sent back into the conversation as a tool message.
from datetime import datetime, timedelta
from typing import Literal


@support_agent.tool
async def get_order_status(
    ctx: RunContext[SupportDeps],
    order_id: int,
) -> dict:
    """Look up the latest status for a customer order.

    Args:
        order_id: The numeric order identifier shown on receipts.
    """
    async with ctx.deps.db.execute(
        "SELECT status, last_update FROM orders WHERE id = ? AND customer_id = ?",
        (order_id, ctx.deps.customer_id),
    ) as cursor:
        row = await cursor.fetchone()
    if not row:
        return {"error": "order_not_found", "order_id": order_id}
    status, last_update = row
    return {
        "order_id": order_id,
        "status": status,
        "last_update": last_update,
        # Flag orders that have not been updated in over a week
        "stale": datetime.utcnow() - datetime.fromisoformat(last_update) > timedelta(days=7),
    }


@support_agent.tool
async def issue_refund(
    ctx: RunContext[SupportDeps],
    order_id: int,
    amount_cents: int,
    reason: Literal["damaged", "late", "wrong_item", "other"],
) -> dict:
    """Issue a refund for a specific order. Amounts over 20000 cents require human approval."""
    if amount_cents > 20_000:
        return {"approved": False, "reason": "amount_exceeds_bot_limit"}
    await ctx.deps.db.execute(
        "INSERT INTO refunds (order_id, amount_cents, reason) VALUES (?, ?, ?)",
        (order_id, amount_cents, reason),
    )
    await ctx.deps.db.commit()
    return {"approved": True, "order_id": order_id, "amount_cents": amount_cents}

Two things to notice:
- Type hints become the schema. The Literal for reason becomes an enum the model must pick from, and Pydantic validates each tool call before your function runs.
- Tools can guard themselves. The refund tool refuses anything over 200 USD and lets the agent escalate. You do not have to teach the model the limit in prose — the tool enforces it.
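A third pattern worth knowing: a tool can raise ModelRetry to send an error message back to the model so it corrects its arguments and calls the tool again. A sketch, using a hypothetical shipments table that is not part of this tutorial's schema:

from pydantic_ai import ModelRetry


@support_agent.tool
async def get_tracking_number(ctx: RunContext[SupportDeps], order_id: int) -> str:
    """Return the carrier tracking number for an order."""
    async with ctx.deps.db.execute(
        "SELECT tracking FROM shipments WHERE order_id = ?", (order_id,)
    ) as cursor:
        row = await cursor.fetchone()
    if not row:
        # The message goes back to the model, which can fix a mistyped id and
        # try again; each attempt counts against the agent's retries limit
        raise ModelRetry(f"No shipment found for order {order_id}; check the order id.")
    return row[0]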
Step 6: Run the Agent Against a Real Database
Wire it together in main.py:
# main.py
import asyncio

import aiosqlite

from agent import support_agent
from deps import SupportDeps


async def setup_db() -> aiosqlite.Connection:
    db = await aiosqlite.connect(":memory:")
    await db.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, tier TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            status TEXT,
            last_update TEXT
        );
        CREATE TABLE refunds (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            order_id INTEGER,
            amount_cents INTEGER,
            reason TEXT
        );
        INSERT INTO customers VALUES (1, 'Aya', 'gold');
        INSERT INTO orders VALUES
            (4421, 1, 'shipped', '2026-04-13T08:00:00'),
            (4422, 1, 'delivered', '2026-04-22T14:30:00');
        """
    )
    await db.commit()
    return db


async def main() -> None:
    db = await setup_db()
    deps = SupportDeps(db=db, customer_id=1)
    result = await support_agent.run(
        "Order #4421 still says shipped after two weeks. Can you refund 35 USD as a goodwill credit?",
        deps=deps,
    )
    print("Answer:", result.output.answer)
    print("Needs human:", result.output.needs_human)
    print("Confidence:", result.output.confidence)
    # Tool calls live on message *parts*, not on the messages themselves
    tool_calls = [
        part.tool_name
        for message in result.all_messages()
        for part in message.parts
        if part.part_kind == "tool-call"
    ]
    print("Tool calls:", tool_calls)


if __name__ == "__main__":
    asyncio.run(main())

Run:

python main.py

The agent will call get_order_status, see the order is stale, call issue_refund for 3500 cents, and return a SupportResponse you can pass straight to your UI layer.
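Runs are single-turn by default. To carry a conversation forward, pass the previous run's messages back in via message_history; both new_messages() and the message_history parameter are part of the public Pydantic AI API. A short sketch that would slot into main() after the first run:

    # Follow-up turn: reuse the message history so the model keeps context
    followup = await support_agent.run(
        "Actually, make that refund 40 USD instead.",
        deps=deps,
        message_history=result.new_messages(),
    )
    print("Follow-up:", followup.output.answer)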
Step 7: Stream the Agent Through FastAPI
Pydantic AI exposes agent.run_stream which yields incremental output. Pair it with FastAPI's StreamingResponse for a chat endpoint that streams tokens to the browser.
# api.py
from contextlib import asynccontextmanager

import aiosqlite
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

from agent import support_agent
from deps import SupportDeps


class ChatBody(BaseModel):
    customer_id: int
    message: str


db_holder: dict = {}


@asynccontextmanager
async def lifespan(app: FastAPI):
    db_holder["db"] = await aiosqlite.connect("support.db")
    yield
    await db_holder["db"].close()


app = FastAPI(lifespan=lifespan)


@app.post("/chat")
async def chat(body: ChatBody) -> StreamingResponse:
    deps = SupportDeps(db=db_holder["db"], customer_id=body.customer_id)

    async def token_stream():
        async with support_agent.run_stream(body.message, deps=deps) as result:
            # stream_text yields plain-text deltas; it applies to text output,
            # so with a structured output_type you may need result.stream()
            # instead (see the note below)
            async for chunk in result.stream_text(delta=True):
                yield chunk

    return StreamingResponse(token_stream(), media_type="text/plain")

Start the server:
uvicorn api:app --reload

And from another terminal:
curl -N -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"customer_id": 1, "message": "Where is order 4422?"}'You will see tokens stream in real time. Because output_type is set, the final aggregated value is still a validated SupportResponse — you get streaming UX without giving up structure.
Step 8: Swap Providers Without Rewriting Code
The first argument to Agent is a model identifier. Change a single string and the agent runs against a different provider. For environments where the choice is dynamic, use pydantic_ai.models directly:
import os

from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.models.openai import OpenAIModel


def pick_model():
    name = os.getenv("AGENT_MODEL", "openai:gpt-4o-mini")
    if name.startswith("openai:"):
        return OpenAIModel(name.split(":", 1)[1])
    if name.startswith("anthropic:"):
        return AnthropicModel(name.split(":", 1)[1])
    raise ValueError(f"Unknown model: {name}")


support_agent = Agent(
    pick_model(),
    deps_type=SupportDeps,
    output_type=SupportResponse,
    system_prompt="...",
)

Now AGENT_MODEL=anthropic:claude-sonnet-4-6 python main.py runs the same agent on Claude Sonnet 4.6 with no other code changes. Tools, structured outputs, and dependency injection all work identically because Pydantic AI normalizes the provider differences for you.
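If you want resilience rather than a hard swap, Pydantic AI also ships a FallbackModel that tries models in order until one succeeds. A short sketch, reusing the model names from above:

from pydantic_ai.models.fallback import FallbackModel

# Try OpenAI first; if the call fails (rate limit, outage), retry on Anthropic
fallback = FallbackModel(
    OpenAIModel("gpt-4o-mini"),
    AnthropicModel("claude-sonnet-4-6"),
)

support_agent = Agent(
    fallback,
    deps_type=SupportDeps,
    output_type=SupportResponse,
    system_prompt="...",
)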
Step 9: Add Observability with Logfire
Pydantic AI is built by the Pydantic team, so it integrates natively with Logfire — their OpenTelemetry-based observability product. Every model call, tool invocation, retry, and validation error becomes a span you can search.
Sign up at logfire.pydantic.dev and grab a write token. Then:
# observability.py
import logfire

logfire.configure(token="your-write-token", service_name="support-agent")
logfire.instrument_pydantic_ai()

Import this module once at the top of api.py. Restart the server and send a few chat requests. In the Logfire UI you will see:
- A root span per agent run
- Child spans for each model call, with prompt, response, and token counts
- Tool spans showing arguments, return values, and duration
- Validation spans when Pydantic AI retries on bad output
For self-hosted observability, call logfire.configure(send_to_logfire=False) instead and point a standard OTLP exporter at your own collector. The instrumentation is the same.
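A sketch of that self-hosted setup, assuming a collector listening on the standard OTLP HTTP port:

# observability_selfhost.py — same instrumentation, your own collector
import logfire

logfire.configure(send_to_logfire=False, service_name="support-agent")
logfire.instrument_pydantic_ai()

# Point the exporter at your collector before starting the app:
#   export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318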
Step 10: Test the Agent Without Burning Tokens
The pydantic_ai.models.test.TestModel lets you run end-to-end agent tests with zero network calls. It returns a deterministic structured response that matches your output_type, and you can assert on the tool calls the agent made.
# test_agent.py
import pytest
import pytest_asyncio
from pydantic_ai.models.test import TestModel

from agent import support_agent
from deps import SupportDeps
from main import setup_db


@pytest_asyncio.fixture
async def tmp_db():
    """Fresh in-memory database per test, reusing setup_db from main.py."""
    db = await setup_db()
    yield db
    await db.close()


@pytest.mark.asyncio
async def test_refund_flow(tmp_db):
    deps = SupportDeps(db=tmp_db, customer_id=1)
    with support_agent.override(model=TestModel()):
        result = await support_agent.run(
            "Refund 50 USD for order 4421",
            deps=deps,
        )
    # Tool calls live on message parts, not on the messages themselves
    tool_calls = [
        part
        for message in result.all_messages()
        for part in message.parts
        if part.part_kind == "tool-call"
    ]
    assert any(t.tool_name == "issue_refund" for t in tool_calls)
    assert isinstance(result.output.answer, str)
    assert 0 <= result.output.confidence <= 1

Add pytest and pytest-asyncio to your dev dependencies and run:

pytest -v

The whole suite finishes in milliseconds because no real LLM is involved. Use TestModel for unit tests, then layer in a small set of integration tests that hit a real provider on every release candidate.
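When TestModel's auto-generated behavior is too blunt, FunctionModel (also in pydantic_ai) lets you script exact model turns. A sketch: 'final_result' is the default name Pydantic AI gives the output tool it registers for output_type, so "calling" it produces the structured output directly.

from pydantic_ai.messages import ModelMessage, ModelResponse, ToolCallPart
from pydantic_ai.models.function import AgentInfo, FunctionModel


def scripted(messages: list[ModelMessage], info: AgentInfo) -> ModelResponse:
    # Skip the lookup tools entirely and emit the structured output directly
    return ModelResponse(
        parts=[
            ToolCallPart(
                tool_name="final_result",
                args={"answer": "Scripted reply.", "needs_human": False, "confidence": 0.9},
            )
        ]
    )


# Use it exactly like TestModel:
#     with support_agent.override(model=FunctionModel(scripted)):
#         ...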
Testing Your Implementation
Walk through the full happy path one more time:
- python main.py returns a SupportResponse with needs_human=False and a refund recorded in SQLite
- curl against /chat streams tokens and ends with structured output
- AGENT_MODEL=anthropic:claude-sonnet-4-6 python main.py produces the same shape on Claude
- Logfire shows a span tree with tool calls and token usage
- pytest passes in under a second using TestModel
If any of these fail, the most common culprits are missing API keys, an outdated provider extra (run uv pip install --upgrade "pydantic-ai[openai,anthropic]"), or a tool function that does not declare types Pydantic can introspect.
Troubleshooting
The model keeps returning prose instead of structured output. Make sure output_type is set on the Agent and that you are not also asking for free-form text in the system prompt. Pydantic AI uses tool calling under the hood; some older models need to be pinned to a function-calling capable variant.
Validation errors loop forever. Pydantic AI retries once by default (retries=1). Bump it with Agent(..., retries=3) for flaky models, but if a field is impossible to satisfy, you will burn tokens. Read the validation error carefully — it usually points at a Field constraint that is too strict.
Tools are never called. Check that you decorated them with @agent.tool (not @agent.tool_plain unless you want to skip RunContext) and that their docstrings describe when to call them. Models rely heavily on tool descriptions to decide.
Streaming endpoint returns the whole message at once. That is FastAPI buffering. Make sure you are returning a StreamingResponse and not awaiting the generator before yielding.
Next Steps
- Combine Pydantic AI with our FastAPI Docker production guide to ship the agent behind a reverse proxy
- Pair it with Postgres full-text search on the tool side for richer retrieval
- Compare the developer experience to the Vercel AI SDK agent pattern on the TypeScript side
- Read the official Pydantic AI docs for advanced patterns: graph workflows, multi-agent handoff, and structured streaming with delta validation
Conclusion
Pydantic AI takes the parts of Python web development that already work — typed schemas, dependency injection, and async-first APIs — and applies them to LLM agents. You stop thinking about JSON parsing and prompt-shaped strings and start thinking about contracts: what does my agent return, what tools can it call, and what does it need to do its job. The result is agent code that looks like the rest of your Python codebase, tests like the rest of your Python codebase, and ships with the same confidence as the rest of your Python codebase.
Build the support bot above, instrument it with Logfire, and the next time someone asks how you handle structured LLM output you can point at a passing test suite instead of a hopeful regex.