Building a single AI agent is easy. Building a system of agents that plan, delegate, remember, and run reliably in production is where most projects stall. Agno — the framework formerly known as Phidata — was built precisely for that gap. Agents instantiate in microseconds, use a fraction of the memory of heavier frameworks, and ship with batteries included: memory, knowledge bases, structured output, and a production runtime called AgentOS that wraps everything behind a FastAPI server.

In this tutorial you'll build a research assistant that grows from a single agent into a coordinated team, then into a multi-step workflow, and finally into a deployable API. By the end you'll understand the four core Agno primitives — Agent, Team, Workflow, and AgentOS — and how they compose.

Prerequisites

Before starting, ensure you have:

Python 3.10+ installed (python --version)
Basic familiarity with Python and async/await
An OpenAI API key (or Anthropic — Agno is model-agnostic)
A terminal and a code editor (VS Code recommended)

You should be comfortable with virtual environments and reading Pydantic models. No prior agent-framework experience is required.

What You'll Build

A progressively richer research assistant:

A single agent that searches the web and answers questions with sources.
A team where a leader delegates to specialist agents (web research + analysis).
A workflow that runs research and analysis in parallel, loops until it has enough material, then writes a report.
An AgentOS deployment exposing all of the above as a REST API with persistent sessions.

Step 1: Project Setup

Create a project folder and a virtual environment:

mkdir agno-research && cd agno-research
python -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate

Install Agno along with a model provider and the tools we'll use:

pip install -U agno openai ddgs lancedb tantivy

A quick note on what each package does:

agno — the framework itself
openai — the model provider client
ddgs — powers WebSearchTools (DuckDuckGo search, no API key needed)
lancedb and tantivy — the embedded vector database for knowledge bases (Step 5)

Export your API key:

export OPENAI_API_KEY="sk-..."

Tip: Agno is genuinely model-agnostic. To use Claude instead, run pip install anthropic, set ANTHROPIC_API_KEY, and swap OpenAIChat(id="gpt-4o-mini") for Claude(id="claude-sonnet-4-0"). Everything else in this tutorial stays identical.

Step 2: Your First Agent

An Agno Agent is the atomic unit: a model, a set of instructions, and optionally some tools. Create agent.py:

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.websearch import WebSearchTools
 
web_agent = Agent(
    name="Web Researcher",
    model=OpenAIChat(id="gpt-4o-mini"),
    tools=[WebSearchTools()],
    instructions=[
        "You are a meticulous research assistant.",
        "Always search the web before answering factual questions.",
        "Cite your sources as a bulleted list at the end.",
    ],
    add_datetime_to_context=True,
    markdown=True,
)
 
web_agent.print_response(
    "What were the biggest open-source AI agent frameworks released in 2026?",
    stream=True,
)

Run it:

python agent.py

You'll see the agent reason, call the search tool, and stream a formatted, sourced answer to your terminal. A few things worth highlighting:

instructions is a list of short directives. Agno assembles them into the system prompt — keep them imperative and specific.
tools is just a list of tool objects. The model decides when to call them; you never write the orchestration logic by hand.
add_datetime_to_context=True injects the current date, which keeps "latest" queries grounded.
print_response(..., stream=True) is the quickest way to see output during development.

Running agents programmatically

print_response is for humans. In real code, use run() to get a response object back:

response = web_agent.run("Summarize the Agno framework in two sentences.")
print(response.content)        # the text answer

For high-throughput services, every method has an async twin — arun() and aprint_response() — so you can await many agents concurrently.

Step 3: Structured Output with Pydantic

LLM text is hard to consume programmatically. Agno can force an agent to return a typed object by passing a Pydantic model as output_schema. Create structured.py:

from typing import List
from pydantic import BaseModel, Field
 
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.websearch import WebSearchTools
 
 
class TopicBriefing(BaseModel):
    topic: str = Field(..., description="The subject being researched")
    summary: str = Field(..., description="A two-sentence overview")
    key_points: List[str] = Field(..., description="3-5 essential takeaways")
    sources: List[str] = Field(..., description="URLs used for the research")
 
 
briefing_agent = Agent(
    model=OpenAIChat(id="gpt-4o-mini"),
    tools=[WebSearchTools()],
    output_schema=TopicBriefing,
)
 
response = briefing_agent.run("Research the rise of small language models in 2026")
briefing: TopicBriefing = response.content
 
print(briefing.topic)
for point in briefing.key_points:
    print(f"- {point}")
print("Sources:", ", ".join(briefing.sources))

Because response.content is now a real TopicBriefing instance, you get autocomplete, validation, and zero brittle string parsing. This single feature makes Agno agents safe to embed inside larger applications.

Step 4: Coordinating a Team

One agent is limited. A Team lets a leader model delegate to specialist members, each with its own role and tools. The leader decides who handles what and synthesizes the final answer.

Create team.py:

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.team import Team
from agno.tools.websearch import WebSearchTools
from agno.tools.hackernews import HackerNewsTools
 
web_agent = Agent(
    name="Web Agent",
    role="Search the broad web for background and context",
    model=OpenAIChat(id="gpt-4o-mini"),
    tools=[WebSearchTools()],
    instructions="Always include sources.",
)
 
tech_pulse_agent = Agent(
    name="Tech Pulse Agent",
    role="Gauge developer sentiment from Hacker News discussions",
    model=OpenAIChat(id="gpt-4o-mini"),
    tools=[HackerNewsTools()],
    instructions="Summarize what practitioners actually think, not just headlines.",
)
 
research_team = Team(
    name="Research Team",
    model=OpenAIChat(id="gpt-4o"),
    members=[web_agent, tech_pulse_agent],
    instructions=[
        "You coordinate a research team.",
        "Delegate broad context to the Web Agent.",
        "Delegate community sentiment to the Tech Pulse Agent.",
        "Combine both into a balanced briefing with a clear verdict.",
    ],
    show_members_responses=True,
    markdown=True,
)
 
research_team.print_response(
    "Should a startup adopt Agno for its agent stack in 2026?",
    stream=True,
)

Key ideas:

Each member has a role — a one-line description the leader uses to route work. Write roles as job descriptions, not instructions.
The leader (research_team) usually gets a stronger model than the members, since coordination and synthesis are harder than individual lookups.
show_members_responses=True surfaces each member's contribution, which is invaluable while debugging delegation.

Run it and watch the leader fan out to both specialists, then merge their findings into a single verdict. You wrote zero routing code — the team leader handles delegation through tool calls under the hood.

Step 5: Adding Memory and Knowledge

Real assistants remember past conversations and can ground answers in your own documents. Agno handles both through a database (for memory and sessions) and a Knowledge base (for retrieval).

Create memory_knowledge.py:

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.db.sqlite import SqliteDb
from agno.knowledge.knowledge import Knowledge
from agno.knowledge.embedder.openai import OpenAIEmbedder
from agno.vectordb.lancedb import LanceDb, SearchType
 
# Persistent store for sessions and long-term user memory
db = SqliteDb(db_file="tmp/research.db")
 
# A knowledge base backed by an embedded LanceDB vector store
knowledge = Knowledge(
    vector_db=LanceDb(
        uri="tmp/lancedb",
        table_name="research_docs",
        search_type=SearchType.hybrid,
        embedder=OpenAIEmbedder(id="text-embedding-3-small"),
    ),
)
 
# Ingest a document once; it is chunked, embedded, and indexed automatically
knowledge.insert(url="https://www.paulgraham.com/read.html")
 
assistant = Agent(
    name="Grounded Assistant",
    model=OpenAIChat(id="gpt-4o-mini"),
    db=db,
    knowledge=knowledge,
    search_knowledge=True,          # let the agent retrieve from the knowledge base
    add_history_to_context=True,    # include recent turns in the prompt
    num_history_runs=3,             # how many past exchanges to include
    update_memory_on_run=True,      # learn durable facts about the user
    markdown=True,
)
 
# Pass a stable user_id and session_id to thread memory across runs
assistant.print_response(
    "According to the essay I added, why does reading matter?",
    user_id="anis@example.com",
    session_id="session-1",
)

What each piece does:

SqliteDb persists sessions and memories to a local file. In production you'd swap it for PostgresDb — the agent code is unchanged.
Knowledge + LanceDb gives you hybrid (vector + keyword) search over ingested content. search_knowledge=True tells the agent it may retrieve from it.
add_history_to_context and num_history_runs control conversational memory within a session.
update_memory_on_run lets the agent extract durable facts ("the user prefers concise answers") and recall them in future sessions tied to the same user_id.

This is the moment a demo becomes an assistant: it remembers who it's talking to and can cite your documents.

Step 6: Orchestrating a Workflow

Teams are great when you want the model to decide who does what. But sometimes you need deterministic control flow — run these steps in parallel, loop until a quality bar is met, branch on a condition. That's what a Workflow is for.

Create workflow.py:

from typing import List
 
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.websearch import WebSearchTools
from agno.tools.hackernews import HackerNewsTools
from agno.workflow import Loop, Parallel, Step, Workflow
from agno.workflow.types import StepOutput
 
# --- Agents ---
web_researcher = Agent(
    name="Web Researcher",
    model=OpenAIChat(id="gpt-4o-mini"),
    tools=[WebSearchTools()],
    instructions="Research the topic thoroughly from web sources.",
)
hn_researcher = Agent(
    name="HN Researcher",
    model=OpenAIChat(id="gpt-4o-mini"),
    tools=[HackerNewsTools()],
    instructions="Surface what developers are saying about the topic.",
)
writer = Agent(
    name="Report Writer",
    model=OpenAIChat(id="gpt-4o"),
    instructions="Write a concise, well-structured report from the research provided.",
    markdown=True,
)
 
# --- Steps ---
research_web = Step(name="Research Web", agent=web_researcher)
research_hn = Step(name="Research HN", agent=hn_researcher)
write_report = Step(name="Write Report", agent=writer)
 
 
# --- Loop exit condition: keep researching until we have enough material ---
def enough_research(outputs: List[StepOutput]) -> bool:
    total = sum(len(o.content or "") for o in outputs)
    return total > 1500
 
 
# --- Workflow: parallel research in a loop, then write ---
workflow = Workflow(
    name="Deep Research Workflow",
    description="Research a topic in parallel until sufficient, then write a report",
    steps=[
        Loop(
            name="Research Loop",
            steps=[Parallel(research_web, research_hn, name="Parallel Research")],
            end_condition=enough_research,
            max_iterations=3,
        ),
        write_report,
    ],
)
 
if __name__ == "__main__":
    workflow.print_response(
        input="The state of TypeScript-first AI agent frameworks in 2026",
        stream=True,
    )

The building blocks Agno gives you for workflows:

Primitive	Purpose
`Step`	Run one agent (or function) as a unit
`Parallel`	Execute several steps concurrently
`Loop`	Repeat steps until `end_condition` is true or `max_iterations` is hit
`Router`	Pick a branch dynamically via a selector function
`Condition`	Run steps only when an expression is true (with optional `else_steps`)

Here the two researchers run at the same time inside a Loop that keeps going until the combined output crosses 1500 characters (or three iterations pass), after which the writer produces the final report. Unlike a Team, the control flow is fully deterministic and testable — you can unit-test enough_research() in isolation.

Branching with Condition

For routing, Agno even supports expression-based conditions:

from agno.workflow import Condition, Step, Workflow
 
workflow = Workflow(
    name="Classify and Route",
    steps=[
        Step(name="Classify", agent=classifier),
        Condition(
            name="Route by Type",
            evaluator='previous_step_content.contains("TECHNICAL")',
            steps=[Step(name="Technical Help", agent=technical_agent)],
            else_steps=[Step(name="General Help", agent=general_agent)],
        ),
    ],
)

The classifier labels the request, and the Condition sends it down the right branch — no glue code in between.

Step 7: Deploying with AgentOS

A working script isn't a product. AgentOS wraps your agents, teams, and workflows in a pre-built FastAPI application — complete with session storage, conversation history, and monitoring endpoints — that you run in your own infrastructure.

Create serve.py:

from agno.agent import Agent
from agno.team import Team
from agno.workflow import Step, Workflow
from agno.models.openai import OpenAIChat
from agno.db.sqlite import SqliteDb
from agno.tools.websearch import WebSearchTools
from agno.os import AgentOS
 
db = SqliteDb(db_file="tmp/agentos.db")
 
researcher = Agent(
    name="Researcher",
    model=OpenAIChat(id="gpt-4o-mini"),
    db=db,
    tools=[WebSearchTools()],
    instructions="Research thoroughly and cite sources.",
    add_history_to_context=True,
    markdown=True,
)
 
research_team = Team(
    name="Research Team",
    model=OpenAIChat(id="gpt-4o"),
    db=db,
    members=[researcher],
    instructions="Coordinate research and deliver a clear briefing.",
)
 
qa_workflow = Workflow(
    name="QA Workflow",
    description="Answer a question using the researcher agent",
    db=db,
    steps=[Step(name="Answer", agent=researcher)],
)
 
agent_os = AgentOS(
    description="Research AgentOS",
    agents=[researcher],
    teams=[research_team],
    workflows=[qa_workflow],
)
 
# FastAPI app instance — uvicorn looks for `app`
app = agent_os.get_app()
 
if __name__ == "__main__":
    agent_os.serve(app="serve:app", reload=True)

Start the server:

python serve.py

AgentOS boots a FastAPI server (default http://localhost:7777) with auto-generated REST endpoints for every agent, team, and workflow you registered, plus interactive docs at /docs. Because each component shares the same db, sessions and memory are persisted automatically — your agents remember conversations across HTTP requests.

The runtime is stateless and horizontally scalable: run multiple instances behind a load balancer and point them at a shared Postgres database. This is the "missing piece" Agno is built around — the bridge from a working prototype to a deployed product.

Testing Your Implementation

Verify each layer in order:

Agent — python agent.py streams a sourced answer.
Structured output — python structured.py prints typed fields without errors.
Team — python team.py shows both members contributing, then a merged verdict.
Memory — run memory_knowledge.py twice with the same user_id; the second run should recall context.
Workflow — python workflow.py runs parallel research inside a loop before writing.
AgentOS — open http://localhost:7777/docs and call an endpoint; confirm the response and that a session row appears in tmp/agentos.db.

Troubleshooting

ModuleNotFoundError: No module named 'ddgs' — WebSearchTools needs the search backend. Run pip install ddgs.

Empty or refused tool calls — your model may be too small to reason about tools reliably. Move the leader/orchestrator to a stronger model (gpt-4o, claude-sonnet-4-0) and keep the cheap model only for simple member roles.

Knowledge base returns nothing — confirm search_knowledge=True on the agent and that knowledge.insert(...) ran successfully before the query. Hybrid search also needs tantivy installed.

Memory doesn't persist — make sure you pass a stable user_id and session_id, that the agent has a db, and that update_memory_on_run=True and/or add_history_to_context=True are set.

AgentOS won't start with serve() — the app= string must be "<module>:app" matching your filename. In serve.py it's "serve:app".

Next Steps

Swap SqliteDb for PostgresDb and deploy AgentOS behind a load balancer.
Add Router steps to choose research strategies dynamically per topic.
Connect external tools via MCP using Agno's MCPTools class.
Add evaluation with Agno's ReliabilityEval to catch regressions in agent behavior.
Explore related tutorials on our site: CrewAI multi-agent systems, Pydantic AI type-safe agents, and smolagents code-first agents.

Conclusion

You've taken a research assistant from a single tool-using agent all the way to a horizontally scalable API. Along the way you met the four Agno primitives that compose into almost any agentic system: the Agent for individual capability, the Team for model-driven delegation, the Workflow for deterministic orchestration, and AgentOS for production deployment. Agno's defining strengths — microsecond instantiation, low memory footprint, and a batteries-included runtime — mean the system you prototyped this afternoon is the same one you ship. Start small with one agent, and reach for teams and workflows only when the problem genuinely demands them.