Tutorial · May 23, 2026 · 30 min read

CrewAI Tutorial 2026: Build Production Multi-Agent AI Systems in Python

Learn how to design role-based AI agent crews that collaborate on real research, writing, and analysis tasks. This hands-on tutorial walks you through installing CrewAI, wiring tools, orchestrating sequential and hierarchical processes, adding memory, and deploying a production-grade crew with observability.

Single-agent loops are great until your problem actually has multiple specialists in it. Research, writing, code review, financial analysis — these are team sports, not solo acts. CrewAI takes that intuition and turns it into a programmable abstraction: you define a crew of role-based agents, hand each one a goal and a backstory, give them tools, and let them collaborate sequentially or hierarchically toward a shared objective.

Unlike graph-based frameworks like LangGraph or single-agent runners like the OpenAI Agents SDK, CrewAI treats orchestration as an organizational chart. In this tutorial you will install CrewAI from scratch, build a four-agent market research crew, give it browsing and file tools, switch between sequential and hierarchical processes, add long-term memory, and instrument the whole thing for production use.

Prerequisites

Before starting, make sure you have:

  • Python 3.10 or newer installed
  • An OpenAI or Anthropic API key (or a local Ollama install)
  • A free Serper.dev key for web search (optional but recommended)
  • Basic familiarity with Python classes, decorators, and async/await
  • A terminal and a code editor (VS Code with the Python extension recommended)

What You Will Build

By the end of this tutorial, you will have:

  1. A CrewAI project scaffolded with uv and the official CLI
  2. A four-agent crew — Researcher, Analyst, Writer, Editor — that produces a polished market brief
  3. Web search and file-writing tools wired into the agents
  4. A sequential pipeline and a hierarchical manager variant of the same crew
  5. Short-term and long-term memory enabled with embeddings
  6. Custom callbacks that stream agent thoughts and tool calls
  7. A FastAPI endpoint that triggers the crew on demand
  8. Observability through CrewAI Plus telemetry and structured logging

Step 1: Install CrewAI

CrewAI ships as a single package with optional extras for tools, embeddings, and deployment helpers. Create an isolated environment first — uv is the fastest option in 2026.

mkdir market-crew && cd market-crew
uv venv --python 3.11
source .venv/bin/activate
uv pip install "crewai[tools]" python-dotenv fastapi uvicorn

If you prefer plain pip:

python -m venv .venv
source .venv/bin/activate
pip install "crewai[tools]" python-dotenv fastapi uvicorn

Verify the install:

crewai version
# CrewAI version: 0.86.x

Step 2: Scaffold the Project

CrewAI ships a CLI that mirrors the layout the framework expects. Run the generator from the parent directory.

cd .. && crewai create crew market-crew

When prompted, pick openai as the provider and gpt-4o-mini as the model. The CLI lays down this structure:

market-crew/
├── src/
│   └── market_crew/
│       ├── config/
│       │   ├── agents.yaml
│       │   └── tasks.yaml
│       ├── crew.py
│       ├── main.py
│       └── tools/
├── .env
├── pyproject.toml
└── README.md

The two YAML files are where most of your work will live. agents.yaml defines roles and goals; tasks.yaml defines what each agent has to produce.

Add your secrets to .env:

OPENAI_API_KEY=sk-...
SERPER_API_KEY=...      # from serper.dev, optional
MODEL=gpt-4o-mini

Step 3: Define the Agents

Open src/market_crew/config/agents.yaml and replace the placeholder content with four specialist roles.

researcher:
  role: >
    Senior Market Researcher specializing in {industry}
  goal: >
    Gather the most recent, credible facts, statistics, and quotes about
    {topic} from public web sources published in the last 12 months.
  backstory: >
    You spent a decade at McKinsey running primary and secondary research.
    You triangulate every claim across at least two independent sources
    and you tag everything with its publication date and URL.
  verbose: true
  allow_delegation: false
 
analyst:
  role: >
    Strategic Analyst for {industry}
  goal: >
    Turn the researcher's raw notes into a SWOT, a competitive matrix,
    and three actionable recommendations for {target_audience}.
  backstory: >
    You write the kind of two-page memos that get read by CEOs. You
    refuse to include any claim that is not backed by a cited source
    from the researcher's notes.
  verbose: true
 
writer:
  role: >
    Senior Editorial Writer
  goal: >
    Produce a 1200-word executive brief on {topic} that an investor
    could read in under six minutes and forward to their partners.
  backstory: >
    You used to write the Stratechery weekly. You favor short sentences,
    concrete examples, and exactly one strong thesis per piece.
  verbose: true
 
editor:
  role: >
    Chief Editor
  goal: >
    Fact-check the brief against the researcher's source list, fix
    weak prose, and ensure every claim cites its source.
  backstory: >
    You ran the standards desk at The Economist for eight years. You
    do not let a single uncited number pass.
  verbose: true

CrewAI interpolates the {industry}, {topic}, and {target_audience} placeholders at runtime from the inputs you pass to the crew.
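To make the substitution concrete, here is a minimal sketch of the idea in plain Python. It is not CrewAI's implementation: `interpolate` is a hypothetical helper, and it assumes every placeholder is a simple `{name}` token.

```python
import re

# Hypothetical helper: fill {name} placeholders from an inputs dict.
# This only illustrates the idea; it is not CrewAI's own code.
PLACEHOLDER = re.compile(r"\{(\w+)\}")

def interpolate(template: str, inputs: dict[str, str]) -> str:
    def fill(match: re.Match) -> str:
        key = match.group(1)
        if key not in inputs:
            # Fail loudly instead of leaving a raw placeholder in a prompt.
            raise KeyError(f"missing input for placeholder {{{key}}}")
        return str(inputs[key])
    return PLACEHOLDER.sub(fill, template)

role = interpolate(
    "Senior Market Researcher specializing in {industry}",
    {"industry": "developer tools"},
)
print(role)
```

The practical takeaway is that every placeholder used in the YAML needs a matching key in the inputs you pass at kickoff.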

Step 4: Define the Tasks

Edit src/market_crew/config/tasks.yaml:

research_task:
  description: >
    Collect at least 12 high-quality findings about {topic} in the
    {industry} sector. For each finding, capture: the claim, the
    source URL, the publication date, and a one-sentence summary.
  expected_output: >
    A markdown list of 12 or more bulleted findings. Each bullet
    must include the source URL and date. No paraphrasing without
    a citation.
  agent: researcher
 
analysis_task:
  description: >
    Using the researcher's findings, produce a SWOT analysis for a
    {target_audience} entering the {topic} market, a 5-row competitive
    matrix, and exactly three recommendations ranked by impact.
  expected_output: >
    A markdown document with three sections: SWOT, Competitive Matrix,
    Recommendations. Every numerical claim must cite a finding by URL.
  agent: analyst
  context:
    - research_task
 
writing_task:
  description: >
    Write a 1200-word executive brief on {topic} aimed at a
    {target_audience}. Open with a single-sentence thesis. Use the
    analyst's SWOT and recommendations. End with a "What to watch"
    section listing three forward indicators.
  expected_output: >
    A polished markdown brief, 1100 to 1300 words, with H2 section
    headings and inline citations as markdown links.
  agent: writer
  context:
    - research_task
    - analysis_task
 
editing_task:
  description: >
    Edit the brief for clarity and accuracy. Every numeric claim must
    cite a source from the researcher's list. Flag any unsupported
    claim by replacing it with a TODO comment.
  expected_output: >
    The final markdown brief, saved to `output/brief.md`.
  agent: editor
  context:
    - research_task
    - analysis_task
    - writing_task
  output_file: output/brief.md

The context field is where CrewAI's collaboration model lives. Each task receives the outputs of its dependencies as input — no manual prompt stitching required.
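If it helps to see the mechanics, the toy runner below models context passing in plain Python. It is a sketch under simplifying assumptions (tasks are dicts, outputs are strings, `run_sequential` is a made-up name), not how CrewAI is implemented internally.

```python
# Toy model of context passing: each task sees the outputs of the tasks
# named in its "context" list. Names and structure are illustrative only.

def run_sequential(tasks: list[dict]) -> dict[str, str]:
    outputs: dict[str, str] = {}
    for task in tasks:
        # Join upstream outputs in declared order. A typo, or a dependency
        # that has not run yet, raises KeyError here.
        upstream = "\n\n".join(outputs[name] for name in task.get("context", []))
        outputs[task["name"]] = task["run"](upstream)
    return outputs

tasks = [
    {"name": "research_task", "run": lambda ctx: "finding A"},
    {
        "name": "analysis_task",
        "context": ["research_task"],
        "run": lambda ctx: f"SWOT based on: {ctx}",
    },
]
results = run_sequential(tasks)
print(results["analysis_task"])
```

Note that a task can only depend on tasks that run before it, which is why the YAML lists context entries in pipeline order.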

Step 5: Wire the Crew

Open src/market_crew/crew.py. The CLI generated a skeleton; replace it with the version below.

from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
from crewai_tools import SerperDevTool, FileWriterTool
 
@CrewBase
class MarketCrew:
    """Market research crew with four specialists."""
 
    agents_config = "config/agents.yaml"
    tasks_config = "config/tasks.yaml"
 
    @agent
    def researcher(self) -> Agent:
        return Agent(
            config=self.agents_config["researcher"],
            tools=[SerperDevTool()],
        )
 
    @agent
    def analyst(self) -> Agent:
        return Agent(config=self.agents_config["analyst"])
 
    @agent
    def writer(self) -> Agent:
        return Agent(config=self.agents_config["writer"])
 
    @agent
    def editor(self) -> Agent:
        return Agent(
            config=self.agents_config["editor"],
            tools=[FileWriterTool()],
        )
 
    @task
    def research_task(self) -> Task:
        return Task(config=self.tasks_config["research_task"])
 
    @task
    def analysis_task(self) -> Task:
        return Task(config=self.tasks_config["analysis_task"])
 
    @task
    def writing_task(self) -> Task:
        return Task(config=self.tasks_config["writing_task"])
 
    @task
    def editing_task(self) -> Task:
        return Task(config=self.tasks_config["editing_task"])
 
    @crew
    def crew(self) -> Crew:
        return Crew(
            agents=self.agents,
            tasks=self.tasks,
            process=Process.sequential,
            verbose=True,
        )

A few details worth noting:

  • @CrewBase wires the YAML files into Python objects at construction time.
  • @agent and @task decorate factory methods; in sequential mode, tasks run in the order their @task methods are defined in the class.
  • Only the researcher gets web search and only the editor gets file-writing. Keeping tool surfaces narrow is the single biggest reliability win in multi-agent systems.
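One way to keep that tool surface from drifting is a small allowlist check you can run in CI. This is a sketch, not a CrewAI feature: `ALLOWED_TOOLS` and `unexpected_tools` are hypothetical names, and tools are compared by class name only.

```python
# Hypothetical allowlist check; agent and tool names mirror this tutorial.
ALLOWED_TOOLS = {
    "researcher": {"SerperDevTool"},
    "analyst": set(),
    "writer": set(),
    "editor": {"FileWriterTool"},
}

def unexpected_tools(assigned: dict[str, list[str]]) -> dict[str, set[str]]:
    """Return, per agent, any tool that is not on its allowlist."""
    extras: dict[str, set[str]] = {}
    for agent_name, tools in assigned.items():
        extra = set(tools) - ALLOWED_TOOLS.get(agent_name, set())
        if extra:
            extras[agent_name] = extra
    return extras

# The writer should not be able to write files in this crew.
print(unexpected_tools({
    "researcher": ["SerperDevTool"],
    "writer": ["FileWriterTool"],
}))
```

An empty dict means every agent is inside its allowlist; anything else is worth failing the build over.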

Step 6: Run the Crew

Edit src/market_crew/main.py to pass the templated inputs.

from market_crew.crew import MarketCrew
 
def run():
    inputs = {
        "topic": "AI coding agents for enterprise developer platforms",
        "industry": "developer tools",
        "target_audience": "Series B engineering leaders",
    }
    MarketCrew().crew().kickoff(inputs=inputs)
 
if __name__ == "__main__":
    run()

Run it:

crewai run

You will see each agent announce its role, search the web, draft, and hand off to the next. The final brief lands in output/brief.md. On gpt-4o-mini the full run costs about ten cents and takes around four minutes.

Step 7: Switch to a Hierarchical Process

Sequential is fine for a fixed pipeline. When you want a manager to decide who works on what, switch to Process.hierarchical. Replace the crew method:

from crewai import Agent
 
@crew
def crew(self) -> Crew:
    manager = Agent(
        role="Editorial Director",
        goal="Decide task order and delegate to the right specialist",
        backstory="A former Bloomberg bureau chief who runs tight newsrooms.",
        allow_delegation=True,
        verbose=True,
    )
    return Crew(
        agents=self.agents,
        tasks=self.tasks,
        process=Process.hierarchical,
        manager_agent=manager,
        verbose=True,
    )

In hierarchical mode the manager reads each task description, picks an agent, evaluates the output, and can loop back. It costs more tokens but handles ambiguous workflows much better than a fixed pipeline.
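The pick, evaluate, and loop-back cycle can be pictured with a toy loop like the one below. It is only an illustration: in CrewAI the manager is an LLM-driven agent, whereas here `pick` and `accept` are plain functions and `manage` is a hypothetical name.

```python
from typing import Callable

def manage(
    task: str,
    workers: dict[str, Callable[[str], str]],
    pick: Callable[[str], str],
    accept: Callable[[str], bool],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Toy manager loop: pick a worker, review the result, retry if rejected."""
    output = ""
    for round_no in range(1, max_rounds + 1):
        worker = workers[pick(task)]
        output = worker(task)
        if accept(output):
            return output, round_no
        # Feed the rejected draft back so the next attempt can improve on it.
        task = f"{task}\nPrevious attempt was rejected: {output}"
    return output, max_rounds

# A scripted worker that gets it right on the second try.
drafts = iter(["too short", "a sufficiently detailed draft"])
result, rounds = manage(
    "Write the brief",
    workers={"writer": lambda t: next(drafts)},
    pick=lambda t: "writer",
    accept=lambda out: len(out) > 10,
)
print(result, rounds)
```

Every extra round is another set of model calls, which is where the additional token cost of hierarchical mode comes from, and why a hard cap on rounds matters.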

Step 8: Add Memory

By default each crew run starts fresh. To accumulate knowledge across runs, enable memory.

from crewai import Crew, Process
 
@crew
def crew(self) -> Crew:
    return Crew(
        agents=self.agents,
        tasks=self.tasks,
        process=Process.sequential,
        memory=True,
        embedder={
            "provider": "openai",
            "config": {"model": "text-embedding-3-small"},
        },
        verbose=True,
    )

CrewAI now keeps three memory stores:

  • Short-term: shared scratchpad across the current run
  • Long-term: persistent vector store across runs (SQLite + Chroma on disk by default)
  • Entity: extracted entities and their relationships

Reset everything with crewai reset-memories --all, or pass a narrower flag such as --long to clear a single store. For team-scale deployments, swap the embedder to bedrock or vertex and point the long-term store at a managed Postgres+pgvector instance.
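To build intuition for what an embedding-backed memory does, the sketch below stores text and retrieves the closest match for a query. It swaps the real embedding model for a bag-of-words vector so it runs with the standard library alone; `ToyMemory`, `embed`, and `cosine` are illustrative names, not CrewAI classes.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class ToyMemory:
    def __init__(self) -> None:
        self.items: list[tuple[Counter, str]] = []

    def save(self, text: str) -> None:
        self.items.append((embed(text), text))

    def search(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]

memory = ToyMemory()
memory.save("enterprise adoption of coding agents grew in 2025")
memory.save("the editor prefers short sentences")
print(memory.search("coding agents adoption"))
```

A real embedder matches on meaning rather than shared words, but the save-then-rank-by-similarity shape is the same.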

Step 9: Stream Agent Thoughts with Callbacks

For UI integration or debugging you usually want every step in real time. CrewAI exposes per-task and per-step callbacks.

from crewai.tasks.task_output import TaskOutput
 
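def on_task_complete(output: TaskOutput) -> None:
    # TaskOutput carries the task description directly.
    print(f"[task:{output.description[:60]}] -> {len(output.raw)} chars")
 
def on_step(step) -> None:
    # Steps typically arrive as objects rather than dicts, and only
    # tool-using steps expose `tool` / `tool_input`, so read defensively.
    tool = getattr(step, "tool", None)
    if tool:
        print(f"  tool call: {tool} args={getattr(step, 'tool_input', None)}")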
 
@task
def research_task(self) -> Task:
    return Task(
        config=self.tasks_config["research_task"],
        callback=on_task_complete,
    )

For step-level streaming, pass step_callback=on_step when constructing each Agent. Pipe those callbacks into Server-Sent Events and you have a live agent feed in your app.
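On the wire, a Server-Sent Events message is just a few text lines followed by a blank line. The helper below is a minimal sketch of that framing; `sse_frame` and the event names are hypothetical, and the transport (for example a streaming HTTP response) is left out.

```python
import json

def sse_frame(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame: event name, JSON data, blank line."""
    # json.dumps emits a single line by default, so one "data:" field is enough.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

frame = sse_frame("tool_call", {"tool": "search", "args": {"q": "AI agents"}})
print(frame, end="")
```

A callback would build a frame like this for each step and push it onto whatever queue your streaming endpoint reads from.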

Step 10: Expose the Crew as an API

CrewAI plays nicely with FastAPI. Create src/market_crew/api.py:

from fastapi import FastAPI
from pydantic import BaseModel
from market_crew.crew import MarketCrew
 
app = FastAPI()
 
class BriefRequest(BaseModel):
    topic: str
    industry: str
    target_audience: str
 
@app.post("/brief")
def generate_brief(req: BriefRequest):
    result = MarketCrew().crew().kickoff(inputs=req.model_dump())
    return {"output": result.raw, "token_usage": result.token_usage}

Run it:

uv run uvicorn market_crew.api:app --reload --port 8000

Hit it with curl:

curl -X POST http://localhost:8000/brief \
  -H "content-type: application/json" \
  -d '{"topic":"AI coding agents","industry":"developer tools","target_audience":"CTOs"}'

For production, run the API behind Gunicorn with a Uvicorn worker class, set a request timeout of around five minutes for crew runs, and offload kickoffs to a Celery or Hatchet queue if you expect concurrent traffic.
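The shape of the offloading pattern is "return a job id immediately, poll for the result later". The sketch below shows it with an in-process thread pool standing in for Celery or Hatchet; `submit_job`, `job_status`, and the in-memory `jobs` dict are assumptions for illustration, and a real deployment would use a durable queue and store.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

# Stand-in for Celery or Hatchet: a small in-process worker pool.
executor = ThreadPoolExecutor(max_workers=2)
jobs: dict[str, Future] = {}

def submit_job(run_crew: Callable[[dict], str], inputs: dict) -> str:
    """Start the crew in the background and return a job id right away."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = executor.submit(run_crew, inputs)
    return job_id

def job_status(job_id: str) -> dict:
    future = jobs[job_id]
    if not future.done():
        return {"status": "running"}
    if future.exception() is not None:
        return {"status": "failed", "error": str(future.exception())}
    return {"status": "done", "output": future.result()}

# A fake crew keeps the example free of API calls.
def fake_crew(inputs: dict) -> str:
    return f"brief about {inputs['topic']}"

job = submit_job(fake_crew, {"topic": "AI coding agents"})
jobs[job].result(timeout=5)  # block here only for the demo
print(job_status(job))
```

In the API, the POST handler would call something like submit_job and a separate GET route would expose job_status, so no HTTP request has to stay open for the full crew run.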

Step 11: Add Observability

CrewAI ships first-class telemetry through CrewAI Plus, but you can also use OpenTelemetry directly. The free path is structured logs.

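import json
import logging
 
class CrewLogFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record):
        payload = {
            # record.created is when the event was logged, in epoch seconds
            "ts": int(record.created * 1000),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        return json.dumps(payload)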
 
handler = logging.StreamHandler()
handler.setFormatter(CrewLogFormatter())
logging.getLogger("crewai").addHandler(handler)
logging.getLogger("crewai").setLevel(logging.INFO)

Pipe stdout to Loki, Datadog, or a self-hosted ClickHouse and you can answer "which agent burned the most tokens last week" without a second project. For span-level traces, install crewai[telemetry] and point OTEL_EXPORTER_OTLP_ENDPOINT at your collector.
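Once logs are JSON lines, that kind of question becomes a short aggregation. The sketch below assumes each relevant record carries `agent` and `tokens` fields; those fields are an assumption for illustration, so you would have to add them to your own log payloads first.

```python
import json
from collections import Counter

def tokens_by_agent(lines: list[str]) -> Counter:
    """Sum the hypothetical `tokens` field per `agent` across JSON log lines."""
    totals: Counter = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON noise on stdout
        if isinstance(record, dict) and "agent" in record and "tokens" in record:
            totals[record["agent"]] += int(record["tokens"])
    return totals

sample = [
    '{"agent": "researcher", "tokens": 1200}',
    '{"agent": "writer", "tokens": 900}',
    '{"agent": "researcher", "tokens": 300}',
    "plain text line that is not JSON",
]
totals = tokens_by_agent(sample)
print(totals.most_common(1))
```

The same grouping is a one-line query in Loki, Datadog, or ClickHouse once the fields exist.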

Step 12: Test the Crew Deterministically

Multi-agent runs are non-deterministic by default. For CI, swap the LLM for a fake.

from crewai.llm import LLM
from market_crew.crew import MarketCrew
 
class FakeLLM(LLM):
    def __init__(self):
        super().__init__(model="fake")
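        # Each reply uses the "Final Answer:" marker that CrewAI's agent
        # output parser looks for; bare text may be rejected and retried.
        self.replies = iter([
            "Thought: done\nFinal Answer: - finding 1 (https://example.com 2026-01-01)",
            "Thought: done\nFinal Answer: ## SWOT\n- strength: x",
            "Thought: done\nFinal Answer: Brief draft...",
            "Thought: done\nFinal Answer: Edited brief.",
        ])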
 
    def call(self, messages, **kwargs):
        return next(self.replies)
 
def test_crew_smoke(monkeypatch):
    monkeypatch.setattr("crewai.agent.LLM", FakeLLM)
    out = MarketCrew().crew().kickoff(inputs={
        "topic": "x", "industry": "y", "target_audience": "z",
    })
    assert "Edited brief." in out.raw

Install pytest first (uv pip install pytest), then run uv run pytest. The whole suite finishes in under a second and gives you regression coverage on prompt and tool wiring without burning API credits.

Testing Your Implementation

Verify the end-to-end behavior:

  1. crewai run produces output/brief.md with at least 1100 words
  2. The brief contains at least eight inline citations
  3. crewai run a second time pulls long-term memory into the researcher's prompt (check the verbose log)
  4. The FastAPI endpoint returns a JSON payload with output and token_usage
  5. The pytest fake-LLM test passes in under a second
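The first two checks are easy to automate. The sketch below counts whitespace-separated words and inline markdown links; `check_brief` is a hypothetical helper, and the word count is approximate because markup tokens are counted too.

```python
import re

# Matches inline markdown links such as [label](https://example.com/page).
LINK = re.compile(r"\[[^\]]+\]\(https?://[^)\s]+\)")

def check_brief(markdown: str, min_words: int = 1100, min_citations: int = 8) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    words = len(markdown.split())  # whitespace tokens, so markup counts too
    citations = len(LINK.findall(markdown))
    if words < min_words:
        problems.append(f"only {words} words, expected at least {min_words}")
    if citations < min_citations:
        problems.append(f"only {citations} citations, expected at least {min_citations}")
    return problems

print(check_brief("A short draft with [one source](https://example.com/a)."))
```

To run it against the real output, read output/brief.md into a string and pass it to check_brief.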

Troubleshooting

"AuthenticationError: Missing OPENAI_API_KEY" — make sure the process actually loaded .env. Neither plain python nor uv run reads the file by default; uv run does so only when you pass --env-file .env or set UV_ENV_FILE. The reliable fix is to add from dotenv import load_dotenv; load_dotenv() at the top of main.py.

Agents hallucinate sources — narrow the researcher's prompt. Add an explicit rule: "If you cannot find a source, write NO_SOURCE instead of inventing one." Then have the editor task fail the run if any NO_SOURCE token appears.

Hierarchical runs loop forever — set max_iter=15 on the manager agent. The default is 25, which is too generous for most workflows.

Memory bleeds across topics — run crewai reset-memories --all between unrelated runs, or namespace long-term memory by setting CREWAI_STORAGE_DIR per project.

Tool calls return raw HTML — wrap SerperDevTool with a markdown converter or use ScrapeWebsiteTool from crewai_tools to clean output before the agent sees it.
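For the hallucinated-sources fix above, the "fail the run" step does not have to rely on the editor agent noticing the marker. A plain Python guard applied to the final text is one option; `fail_on_no_source` and `UnsupportedClaimError` are hypothetical names in this sketch.

```python
class UnsupportedClaimError(ValueError):
    """Raised when the brief still contains an unsourced-claim marker."""

def fail_on_no_source(brief: str, marker: str = "NO_SOURCE") -> str:
    # Collect 1-based line numbers so the error points at the offending text.
    hits = [n for n, line in enumerate(brief.splitlines(), start=1) if marker in line]
    if hits:
        raise UnsupportedClaimError(f"{marker} found on line(s): {hits}")
    return brief

print(fail_on_no_source("Every claim in this brief is cited."))
```

Calling this on the crew's final output before saving or returning it turns a silent quality problem into a hard failure you can alert on.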

Next Steps

  • Add a Pydantic AI typed output layer in front of the crew for type-safe brief schemas
  • Wire a LangGraph supervisor on top for branching workflows
  • Schedule the crew on a Hatchet cron and post the brief to Slack every Monday morning
  • Replace the writer with a Claude Agent SDK subprocess for longer reasoning budgets

Conclusion

CrewAI's strength is that it forces you to think about your AI workflow the same way you would think about staffing a team. Each agent has a job, a deliverable, and a narrow toolbox. Once the YAML is right, the Python code stays under one hundred lines and the upgrade path from a single sequential pipeline to a hierarchical, memory-backed, observability-instrumented crew is mostly configuration.

If your problem genuinely has multiple specialists in it, stop trying to make one agent do everything. Hire a crew.