Pydantic AI brings the same developer experience that made FastAPI the go-to Python web framework to the world of AI agents. Instead of wrestling with raw LLM responses and hoping strings match your expected schema, Pydantic AI gives you full type safety, automatic validation, and structured outputs — backed by the Pydantic library that already validates data in millions of production applications.

In this tutorial, you will build a Content Research Agent that searches the web, extracts key facts, and returns a fully typed research report — all in fewer than 200 lines of Python.

What You'll Build

By the end of this tutorial, you will have:

A working Pydantic AI agent with tool use and structured output
Dependency injection for reusable context (HTTP client, database, config)
Streaming support for real-time responses
Multi-turn conversation handling
A production-ready content research agent

Prerequisites

Before you begin, make sure you have:

Python 3.11 or later installed
Basic familiarity with Python type hints and async/await
An API key from OpenAI, Anthropic, or any supported provider
pip or uv for package management

Why Pydantic AI?

Most LLM frameworks treat structured output as an afterthought — you get raw JSON strings and parse them yourself, hoping the model didn't hallucinate extra fields. Pydantic AI solves this at the framework level:

Type-safe responses: Define a Pydantic model, get a validated instance back. No manual parsing.
Automatic retry: If the LLM returns invalid JSON or wrong field types, Pydantic AI retries with the validation error fed back to the model.
Tool-first design: Register Python functions as tools with a single decorator. Arguments are validated automatically.
Provider-agnostic: Switch between OpenAI, Anthropic, Gemini, Mistral, Ollama, and dozens more by changing a single string.
Logfire native: First-class observability with zero config if you use Pydantic's Logfire.

Step 1: Install Pydantic AI

Install with your preferred provider. The openai extra includes the OpenAI client:

pip install 'pydantic-ai[openai]'
# or for Anthropic
pip install 'pydantic-ai[anthropic]'
# or for everything
pip install 'pydantic-ai[all]'

Using uv (recommended):

uv add 'pydantic-ai[openai]'

Set your API key in the environment:

export OPENAI_API_KEY="sk-..."
# or
export ANTHROPIC_API_KEY="sk-ant-..."

Step 2: Your First Agent

The Agent class is the core primitive. You declare the model, instructions, and optionally an output_type:

from pydantic_ai import Agent
 
agent = Agent(
    'openai:gpt-4o',
    instructions='You are a helpful assistant. Be concise and accurate.',
)
 
result = agent.run_sync('What is the capital of Tunisia?')
print(result.output)  # "The capital of Tunisia is Tunis."

For async code (recommended in production), use await agent.run(...):

import asyncio
from pydantic_ai import Agent
 
agent = Agent('openai:gpt-4o', instructions='Be concise.')
 
async def main():
    result = await agent.run('Explain asyncio in one sentence.')
    print(result.output)
 
asyncio.run(main())

Switching providers is a one-line change:

# OpenAI
agent = Agent('openai:gpt-4o')
 
# Anthropic
agent = Agent('anthropic:claude-sonnet-4-6')
 
# Gemini
agent = Agent('google-gla:gemini-2.5-flash')
 
# Local Ollama
agent = Agent('ollama:llama3.2')

Step 3: Structured Outputs with Pydantic Models

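This is where Pydantic AI shines. Pass any Pydantic model as output_type and a successful run returns a validated instance of that model; if the output still fails validation after the configured retries, the run raises an error instead of handing you malformed data: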

from pydantic import BaseModel
from pydantic_ai import Agent
 
class MovieReview(BaseModel):
    title: str
    year: int
    rating: float  # 0.0 to 10.0
    summary: str
    pros: list[str]
    cons: list[str]
    recommended: bool
 
agent = Agent(
    'openai:gpt-4o',
    output_type=MovieReview,
    instructions='You are a film critic. Return structured movie reviews.',
)
 
result = agent.run_sync('Review the movie Inception.')
review = result.output  # a validated MovieReview instance
 
print(f"{review.title} ({review.year}) — {review.rating}/10")
print(f"Recommended: {review.recommended}")
print("Pros:", review.pros)

If the model returns an invalid response (wrong types, missing fields), Pydantic AI automatically sends the validation error back to the model and requests a corrected response, up to the number of retries configured on the agent (the retries argument, which defaults to 1).
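To get a feel for what is fed back to the model, here is a minimal sketch that uses Pydantic on its own, with no LLM involved. The payload is made up, and the Field bounds on rating are an assumption added for illustration (the model above only documents the range in a comment).

```python
from pydantic import BaseModel, Field, ValidationError


class MovieReview(BaseModel):
    title: str
    year: int
    # Assumption for this sketch: enforce the 0-10 range instead of only documenting it
    rating: float = Field(ge=0.0, le=10.0)
    summary: str
    pros: list[str]
    cons: list[str]
    recommended: bool


# A made-up payload with two problems: a non-numeric year and an out-of-range rating
bad_payload = {
    'title': 'Inception',
    'year': 'twenty-ten',
    'rating': 11.5,
    'summary': 'A heist inside dreams.',
    'pros': ['Ambitious'],
    'cons': [],
    'recommended': True,
}

try:
    MovieReview.model_validate(bad_payload)
    problems = []
except ValidationError as exc:
    # Each error records which field failed and why
    problems = [(err['loc'], err['type']) for err in exc.errors()]

print(problems)
```

Each entry names the failing field and the kind of failure, which is the sort of detail a model needs to correct its next attempt.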

Union outputs for flexible schemas

You can express "either this or that" output with a Union type:

from pydantic import BaseModel
from pydantic_ai import Agent
from typing import Union
 
class SuccessResponse(BaseModel):
    result: str
    confidence: float
 
class ErrorResponse(BaseModel):
    error: str
    reason: str
 
agent = Agent(
    'openai:gpt-4o',
    output_type=Union[SuccessResponse, ErrorResponse],
)

Step 4: Function Tools

Tools let the LLM call Python functions during a conversation. Register them with @agent.tool (with agent context) or @agent.tool_plain (without):

import asyncio
import httpx
from pydantic_ai import Agent
 
agent = Agent(
    'openai:gpt-4o',
    instructions='You are a weather assistant. Use tools to fetch real data.',
)
 
@agent.tool_plain
async def get_weather(city: str) -> str:
    """Fetch the current weather for a given city."""
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f'https://wttr.in/{city}?format=3'
        )
        return response.text
 
async def main():
    result = await agent.run('What is the weather in Tunis right now?')
    print(result.output)
 
asyncio.run(main())

The docstring becomes the tool description sent to the LLM. Function parameters are automatically converted to a JSON schema for the model.
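To make that conversion less magical, here is a deliberately simplified, standard-library sketch of the idea. It is not Pydantic AI's implementation (which builds a much richer schema); sketch_tool_schema and the units parameter are hypothetical names introduced for illustration.

```python
import inspect
from typing import get_type_hints

# Only a handful of primitive types are mapped in this sketch
JSON_TYPES = {str: 'string', int: 'integer', float: 'number', bool: 'boolean'}


def sketch_tool_schema(func):
    """Build a rough tool description from a function's signature and docstring."""
    hints = get_type_hints(func)
    hints.pop('return', None)  # the return type is not part of the arguments
    signature = inspect.signature(func)
    return {
        'name': func.__name__,
        'description': inspect.getdoc(func) or '',
        'parameters': {
            'type': 'object',
            'properties': {
                name: {'type': JSON_TYPES.get(tp, 'object')} for name, tp in hints.items()
            },
            # Parameters without a default value are required
            'required': [
                name
                for name, param in signature.parameters.items()
                if param.default is inspect.Parameter.empty
            ],
        },
    }


async def get_weather(city: str, units: str = 'metric') -> str:
    """Fetch the current weather for a given city."""
    return ''


schema = sketch_tool_schema(get_weather)
print(schema)
```

The point is that a precise docstring and accurate type hints are the only information the model gets about a tool, so they are worth writing carefully.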

Tools with RunContext (dependency injection)

When tools need access to shared resources (database, HTTP client, config), use @agent.tool with RunContext:

from dataclasses import dataclass
import httpx
from pydantic_ai import Agent, RunContext
 
@dataclass
class ResearchDeps:
    http_client: httpx.AsyncClient
    api_key: str
    max_results: int = 5
 
agent = Agent(
    'anthropic:claude-sonnet-4-6',
    deps_type=ResearchDeps,
    instructions='You are a research assistant with web search capabilities.',
)
 
@agent.tool
async def search_web(ctx: RunContext[ResearchDeps], query: str) -> list[str]:
    """Search the web and return a list of relevant result snippets."""
    response = await ctx.deps.http_client.get(
        'https://api.search.example.com/search',
        params={'q': query, 'limit': ctx.deps.max_results},
        headers={'Authorization': f'Bearer {ctx.deps.api_key}'},
    )
    results = response.json()
    return [r['snippet'] for r in results.get('items', [])]
 
@agent.tool
async def fetch_page(ctx: RunContext[ResearchDeps], url: str) -> str:
    """Fetch the text content of a webpage."""
    response = await ctx.deps.http_client.get(url, follow_redirects=True)
    # Strip HTML tags in production — simplified here
    return response.text[:2000]

The RunContext carries your dependencies into every tool call without global state or hidden coupling.

Step 5: Running Agents with Dependencies

Pass dependencies at run time using the deps argument:

async def main():
    async with httpx.AsyncClient() as client:
        deps = ResearchDeps(
            http_client=client,
            api_key='your-search-api-key',
            max_results=3,
        )
        result = await agent.run(
            'Research the latest developments in quantum computing.',
            deps=deps,
        )
        print(result.output)

Step 6: Streaming Responses

For long-running agent tasks, stream the output as it arrives:

import asyncio
from pydantic_ai import Agent
 
agent = Agent('openai:gpt-4o', instructions='Write detailed technical content.')
 
async def stream_demo():
    async with agent.run_stream('Explain how TCP/IP works.') as stream:
        async for text_chunk in stream.stream_text():
            print(text_chunk, end='', flush=True)
        print()  # newline at end
        result = await stream.get_output()
 
asyncio.run(stream_demo())

Streaming structured outputs

You can also stream and validate a structured output incrementally:

from pydantic import BaseModel
 
class ResearchReport(BaseModel):
    topic: str
    summary: str
    key_findings: list[str]
    sources: list[str]
    confidence_score: float
 
agent = Agent(
    'openai:gpt-4o',
    output_type=ResearchReport,
)
 
async def stream_structured():
    async with agent.run_stream('Research serverless computing trends.') as stream:
        # stream_text() is only available for plain-text output; with a
        # structured output_type, iterate stream_output(), which yields the
        # output as it validates against ResearchReport
        async for partial in stream.stream_output():
            print(partial)
        # Get the final validated result at completion
        report: ResearchReport = await stream.get_output()
        print(f"\nTopic: {report.topic}")
        print(f"Confidence: {report.confidence_score}")

Step 7: Multi-Turn Conversations

Every run records the messages it exchanged with the model, but runs do not share state on their own. To continue a conversation, pass the earlier messages to the next run as message_history:

import asyncio
from pydantic_ai import Agent
 
agent = Agent('openai:gpt-4o', instructions='You are a coding tutor.')
 
async def multi_turn():
    # First turn
    result1 = await agent.run('What is a Python decorator?')
    print('Turn 1:', result1.output)
 
    # Second turn — passes previous messages as context
    result2 = await agent.run(
        'Show me a practical example of that.',
        message_history=result1.new_messages(),
    )
    print('Turn 2:', result2.output)
 
    # Third turn
    result3 = await agent.run(
        'How would I use this in a FastAPI application?',
        message_history=result2.all_messages(),
    )
    print('Turn 3:', result3.output)
 
asyncio.run(multi_turn())

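result.new_messages() returns only the messages produced by that run; result.all_messages() also includes whatever message_history was passed in. After the first turn the two are identical, which is why new_messages() is enough there.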

Step 8: Controlling Agent Behaviour

Limit how many model turns and tool calls an agent can make per run:

import asyncio
from pydantic_ai import Agent
from pydantic_ai.usage import UsageLimits

agent = Agent('openai:gpt-4o')

async def main():
    result = await agent.run(
        'Research and summarize 5 recent AI papers.',
        usage_limits=UsageLimits(
            request_limit=10,        # max LLM turns
            tool_calls_limit=20,     # max tool executions
        ),
    )
    print(result.output)

asyncio.run(main())

When the model calls multiple tools in a single response, Pydantic AI executes them concurrently using asyncio.create_task. For tools that must run sequentially, mark them:

@agent.tool(sequential=True)
async def write_to_db(ctx: RunContext[MyDeps], data: dict) -> str:
    """Write data to the database — must not run concurrently."""
    await ctx.deps.db.insert(data)
    return 'written'
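The difference between the two modes is easiest to see with plain asyncio. This is a standard-library sketch of the scheduling idea only, not Pydantic AI's internals; fake_tool and the delays are made up.

```python
import asyncio


async def fake_tool(name: str, delay: float, finished: list[str]) -> str:
    """Hypothetical tool: sleep for a while, then record that it finished."""
    await asyncio.sleep(delay)
    finished.append(name)
    return name


async def run_concurrently(calls):
    finished: list[str] = []
    # Start every call right away, then wait for all of them
    tasks = [asyncio.create_task(fake_tool(name, delay, finished)) for name, delay in calls]
    results = await asyncio.gather(*tasks)
    return list(results), finished


async def run_sequentially(calls):
    finished: list[str] = []
    # Each call only starts after the previous one has completed
    results = [await fake_tool(name, delay, finished) for name, delay in calls]
    return results, finished


calls = [('slow', 0.05), ('fast', 0.01)]
concurrent_results, concurrent_order = asyncio.run(run_concurrently(calls))
sequential_results, sequential_order = asyncio.run(run_sequentially(calls))

print(concurrent_order)   # completion order when run concurrently
print(sequential_order)   # completion order when run one at a time
```

Results come back in call order either way, but the completion order differs, and that is exactly what matters for tools with side effects such as database writes.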

Step 9: Full Example — Content Research Agent

Putting it all together: a production-ready content research agent that accepts a topic, searches for information, and returns a typed ResearchReport.

import asyncio
import httpx
from dataclasses import dataclass
from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext
 
# --- Data models ---
 
@dataclass
class Deps:
    client: httpx.AsyncClient
    brave_api_key: str | None = None  # optional — falls back to mock
 
class ArticleSummary(BaseModel):
    title: str
    url: str
    key_points: list[str] = Field(min_length=1, max_length=5)
 
class ResearchReport(BaseModel):
    topic: str
    executive_summary: str = Field(min_length=50)
    articles: list[ArticleSummary] = Field(min_length=1, max_length=5)
    key_trends: list[str] = Field(min_length=3, max_length=10)
    recommended_actions: list[str] = Field(min_length=1, max_length=5)
    confidence_score: float = Field(ge=0.0, le=1.0)
 
# --- Agent definition ---
 
research_agent = Agent(
    'anthropic:claude-sonnet-4-6',
    deps_type=Deps,
    output_type=ResearchReport,
    instructions="""
    You are an expert research analyst. When given a topic:
    1. Use the search tool to find recent information (2-3 searches)
    2. Use fetch_article to get full content from the most relevant results
    3. Synthesize findings into a structured ResearchReport
    Be thorough but concise. Focus on actionable insights.
    """,
)
 
# --- Tools ---
 
@research_agent.tool
async def search(ctx: RunContext[Deps], query: str) -> list[dict]:
    """Search the web for recent information on a topic."""
    if ctx.deps.brave_api_key:
        resp = await ctx.deps.client.get(
            'https://api.search.brave.com/res/v1/web/search',
            headers={'Accept': 'application/json', 'X-Subscription-Token': ctx.deps.brave_api_key},
            params={'q': query, 'count': 5, 'freshness': 'pw'},
        )
        resp.raise_for_status()
        data = resp.json()
        return [
            {'title': r['title'], 'url': r['url'], 'description': r.get('description', '')}
            for r in data.get('web', {}).get('results', [])
        ]
    # Mock response for demo purposes
    return [
        {'title': f'Article about {query}', 'url': f'https://example.com/{query.replace(" ", "-")}', 'description': f'Comprehensive coverage of {query}'},
    ]
 
@research_agent.tool
async def fetch_article(ctx: RunContext[Deps], url: str) -> str:
    """Fetch and return the text content of an article."""
    try:
        resp = await ctx.deps.client.get(url, timeout=10.0, follow_redirects=True)
        resp.raise_for_status()
        # In production, use a proper HTML parser (beautifulsoup4, trafilatura)
        text = resp.text
        # Return first 3000 chars to stay within context limits
        return text[:3000]
    except Exception as e:
        return f'Could not fetch article: {e}'
 
# --- Runner ---
 
async def research(topic: str) -> ResearchReport:
    async with httpx.AsyncClient(timeout=30.0) as client:
        deps = Deps(client=client)
        result = await research_agent.run(
            f'Research this topic in depth: {topic}',
            deps=deps,
        )
        return result.output
 
async def main():
    report = await research('AI agent frameworks in Python 2026')
 
    print(f'Topic: {report.topic}')
    print(f'Confidence: {report.confidence_score:.0%}')
    print(f'\nExecutive Summary:\n{report.executive_summary}')
    print(f'\nKey Trends:')
    for trend in report.key_trends:
        print(f'  - {trend}')
    print(f'\nArticles reviewed: {len(report.articles)}')
    print(f'\nRecommended Actions:')
    for action in report.recommended_actions:
        print(f'  - {action}')
 
if __name__ == '__main__':
    asyncio.run(main())
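The fetch_article tool above returns raw HTML, so much of the 3000-character budget goes to markup. As a stopgap before adopting a proper parser, here is a minimal standard-library sketch that keeps only visible text; html_to_text is a hypothetical helper and the sample page is made up.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return ' '.join(self._chunks)


def html_to_text(html: str, limit: int = 3000) -> str:
    """Hypothetical helper: strip tags, then truncate to the character budget."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()[:limit]


sample = (
    '<html><head><style>p{color:red}</style><script>var x = 1;</script></head>'
    '<body><h1>Title</h1><p>Hello <b>world</b></p></body></html>'
)
clean = html_to_text(sample)
print(clean)
```

Inside fetch_article, this would mean returning html_to_text(resp.text) instead of resp.text[:3000].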

Step 10: Multi-Model Fallback

For MENA markets where latency or service availability varies, configure a fallback chain using multiple providers:

from pydantic_ai import Agent
from pydantic_ai.models.fallback import FallbackModel
 
# Try Claude first, fall back to GPT-4o, then Gemini
fallback_model = FallbackModel(
    'anthropic:claude-sonnet-4-6',
    'openai:gpt-4o',
    'google-gla:gemini-2.5-flash',
)
 
agent = Agent(fallback_model, output_type=ResearchReport)
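Conceptually, a fallback chain tries each provider in order and returns the first answer that does not fail. The standard-library sketch below illustrates only that idea; it is not how FallbackModel is implemented, and ProviderUnavailable, call_with_fallback, and the two fake providers are hypothetical names.

```python
import asyncio


class ProviderUnavailable(Exception):
    """Hypothetical stand-in for a provider-side failure."""


async def call_with_fallback(providers):
    """Try each (name, coroutine function) pair in order; return the first success."""
    failures = []
    for name, call in providers:
        try:
            return name, await call()
        except ProviderUnavailable as exc:
            failures.append(f'{name}: {exc}')
    raise RuntimeError('all providers failed: ' + '; '.join(failures))


async def primary():
    raise ProviderUnavailable('503 from primary')


async def secondary():
    return 'answer from secondary'


used, answer = asyncio.run(
    call_with_fallback([('primary', primary), ('secondary', secondary)])
)
print(used, answer)
```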

Step 11: Observability with Logfire

Pydantic AI integrates natively with Logfire for production monitoring:

pip install logfire
logfire auth

import logfire
from pydantic_ai import Agent
 
logfire.configure()
logfire.instrument_pydantic_ai()  # auto-instruments all agents
 
agent = Agent('openai:gpt-4o')
result = agent.run_sync('Hello!')
# Every model call, tool execution, and validation is traced automatically

You can view traces in the Logfire dashboard, including token usage, latency, tool call sequences, and validation failures.

Testing Your Agent

Pydantic AI provides a TestModel and FunctionModel for unit testing without calling a real LLM:

import pytest
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel
 
def test_research_agent_structure():
    # By default TestModel calls every registered tool; call_tools=[] skips
    # them, so the placeholder client below is never used
    with research_agent.override(model=TestModel(call_tools=[])):
        result = research_agent.run_sync(
            'Test topic',
            deps=Deps(client=None),
        )
    assert isinstance(result.output, ResearchReport)

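For tests that need full control over the model's replies, use FunctionModel. The function receives the message history plus an AgentInfo object and must return a ModelResponse; for a structured output_type, that means calling the agent's output tool with a payload that satisfies the schema:

from pydantic_ai.messages import ModelMessage, ModelResponse, ToolCallPart
from pydantic_ai.models.function import AgentInfo, FunctionModel

def my_model(messages: list[ModelMessage], info: AgentInfo) -> ModelResponse:
    report = {
        'topic': 'Test',
        'executive_summary': 'A' * 50,
        # ResearchReport requires at least one article
        'articles': [
            {'title': 'Stub', 'url': 'https://example.com', 'key_points': ['point']},
        ],
        'key_trends': ['trend1', 'trend2', 'trend3'],
        'recommended_actions': ['action1'],
        'confidence_score': 0.9,
    }
    # Respond by calling the output tool, as a real model would
    return ModelResponse(parts=[ToolCallPart(info.output_tools[0].name, report)])

with research_agent.override(model=FunctionModel(my_model)):
    result = research_agent.run_sync('test', deps=Deps(client=None))
    assert result.output.confidence_score == 0.9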

Troubleshooting

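ValidationError loops: If the model keeps returning invalid data, add stricter instructions or simplify your Pydantic model. Validation retries are governed by the agent's retries setting (default 1), so raise it for complex schemas. UsageLimits.request_limit (default 50) is a separate cap on the total number of model requests per run.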

Tool not being called: Make sure the docstring clearly describes when the tool should be used. The LLM decides when to call tools based on the description.

Provider errors: Pydantic AI raises ModelHTTPError for API failures. Wrap agent.run() in a try/except and implement exponential backoff for production use.

Async in sync code: agent.run_sync() blocks the calling thread until the run finishes, so keep it for scripts and notebooks. In web frameworks (FastAPI, Django async views), always use await agent.run().
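The exponential backoff suggested under "Provider errors" can be sketched with the standard library alone. retry_with_backoff, FakeProviderError, and flaky_run are hypothetical names; in a real application you would pass ModelHTTPError as retry_on and wrap agent.run(...) in the callable.

```python
import asyncio
import random


async def retry_with_backoff(call, *, retry_on, attempts=4, base_delay=0.5, max_delay=8.0):
    """Await call(); on retry_on, wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(attempts):
        try:
            return await call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from many clients
            await asyncio.sleep(delay + random.uniform(0, delay / 2))


class FakeProviderError(Exception):
    """Hypothetical stand-in for a provider failure."""


attempt_log: list[int] = []


async def flaky_run() -> str:
    """Fails twice, then succeeds, to exercise the retry loop."""
    attempt_log.append(len(attempt_log) + 1)
    if len(attempt_log) < 3:
        raise FakeProviderError('503 Service Unavailable')
    return 'ok'


outcome = asyncio.run(
    retry_with_backoff(flaky_run, retry_on=FakeProviderError, base_delay=0.01)
)
print(outcome, attempt_log)
```

Retrying only on a specific exception type keeps programming errors, such as a bad prompt template or a missing dependency, from being silently retried.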

Next Steps

Explore multi-agent workflows where one agent delegates to specialist sub-agents
Add Logfire tracing for production observability
Integrate Pydantic AI with FastAPI for type-safe LLM API endpoints
Use model-graded evals to test agent quality at scale
Try streaming with structured output for interactive research tools

Conclusion

Pydantic AI removes the boilerplate that makes LLM applications fragile and hard to maintain. By treating validation, tool use, and provider abstraction as first-class citizens, it lets you focus on what your agent should do — not on parsing strings and praying the LLM followed your schema. Whether you are building a research assistant, a document processor, or a customer support bot, Pydantic AI gives you the confidence to ship type-safe AI agents to production.