Single-agent loops are great until your problem actually has multiple specialists in it. Research, writing, code review, financial analysis — these are team sports, not solo acts. CrewAI takes that intuition and turns it into a programmable abstraction: you define a crew of role-based agents, hand each one a goal and a backstory, give them tools, and let them collaborate sequentially or hierarchically toward a shared objective.
Unlike graph-based frameworks like LangGraph or single-agent runners like the OpenAI Agents SDK, CrewAI treats orchestration as an organizational chart. In this tutorial you will install CrewAI from scratch, build a four-agent market research crew, give it browsing and file tools, switch between sequential and hierarchical processes, add long-term memory, and instrument the whole thing for production use.
Prerequisites
Before starting, make sure you have:
- Python 3.10 or newer installed
- An OpenAI or Anthropic API key (or a local Ollama install)
- A free Serper.dev key for web search (optional but recommended)
- Basic familiarity with Python classes, decorators, and
async/await - A terminal and a code editor (VS Code with the Python extension recommended)
What You Will Build
By the end of this tutorial, you will have:
- A CrewAI project scaffolded with
uvand the official CLI - A four-agent crew — Researcher, Analyst, Writer, Editor — that produces a polished market brief
- Web search and file-writing tools wired into the agents
- A sequential pipeline and a hierarchical manager variant of the same crew
- Short-term and long-term memory enabled with embeddings
- Custom callbacks that stream agent thoughts and tool calls
- A FastAPI endpoint that triggers the crew on demand
- Observability through CrewAI Plus telemetry and structured logging
Step 1: Install CrewAI
CrewAI ships as a single package with optional extras for tools, embeddings, and deployment helpers. Create an isolated environment first — uv is the fastest option in 2026.
mkdir market-crew && cd market-crew
uv venv --python 3.11
source .venv/bin/activate
uv pip install "crewai[tools]" python-dotenv fastapi uvicornIf you prefer plain pip:
python -m venv .venv
source .venv/bin/activate
pip install "crewai[tools]" python-dotenv fastapi uvicornVerify the install:
crewai version
# CrewAI version: 0.86.xStep 2: Scaffold the Project
CrewAI ships a CLI that mirrors the layout the framework expects. Run the generator from the parent directory.
cd .. && crewai create crew market-crewWhen prompted, pick openai as the provider and gpt-4o-mini as the model. The CLI lays down this structure:
market-crew/
├── src/
│ └── market_crew/
│ ├── config/
│ │ ├── agents.yaml
│ │ └── tasks.yaml
│ ├── crew.py
│ ├── main.py
│ └── tools/
├── .env
├── pyproject.toml
└── README.md
The two YAML files are where most of your work will live. agents.yaml defines roles and goals; tasks.yaml defines what each agent has to produce.
Add your secrets to .env:
OPENAI_API_KEY=sk-...
SERPER_API_KEY=... # from serper.dev, optional
MODEL=gpt-4o-miniStep 3: Define the Agents
Open src/market_crew/config/agents.yaml and replace the placeholder content with four specialist roles.
researcher:
role: >
Senior Market Researcher specializing in {industry}
goal: >
Gather the most recent, credible facts, statistics, and quotes about
{topic} from public web sources published in the last 12 months.
backstory: >
You spent a decade at McKinsey running primary and secondary research.
You triangulate every claim across at least two independent sources
and you tag everything with its publication date and URL.
verbose: true
allow_delegation: false
analyst:
role: >
Strategic Analyst for {industry}
goal: >
Turn the researcher's raw notes into a SWOT, a competitive matrix,
and three actionable recommendations for {target_audience}.
backstory: >
You write the kind of two-page memos that get read by CEOs. You
refuse to include any claim that is not backed by a cited source
from the researcher's notes.
verbose: true
writer:
role: >
Senior Editorial Writer
goal: >
Produce a 1200-word executive brief on {topic} that an investor
could read in under six minutes and forward to their partners.
backstory: >
You used to write the Stratechery weekly. You favor short sentences,
concrete examples, and exactly one strong thesis per piece.
verbose: true
editor:
role: >
Chief Editor
goal: >
Fact-check the brief against the researcher's source list, fix
weak prose, and ensure every claim cites its source.
backstory: >
You ran the standards desk at The Economist for eight years. You
do not let a single uncited number pass.
verbose: trueCrewAI interpolates the {industry}, {topic}, and {target_audience} placeholders at runtime from the inputs you pass to the crew.
Step 4: Define the Tasks
Edit src/market_crew/config/tasks.yaml:
research_task:
description: >
Collect at least 12 high-quality findings about {topic} in the
{industry} sector. For each finding, capture: the claim, the
source URL, the publication date, and a one-sentence summary.
expected_output: >
A markdown list of 12 or more bulleted findings. Each bullet
must include the source URL and date. No paraphrasing without
a citation.
agent: researcher
analysis_task:
description: >
Using the researcher's findings, produce a SWOT analysis for a
{target_audience} entering the {topic} market, a 5-row competitive
matrix, and exactly three recommendations ranked by impact.
expected_output: >
A markdown document with three sections: SWOT, Competitive Matrix,
Recommendations. Every numerical claim must cite a finding by URL.
agent: analyst
context:
- research_task
writing_task:
description: >
Write a 1200-word executive brief on {topic} aimed at a
{target_audience}. Open with a single-sentence thesis. Use the
analyst's SWOT and recommendations. End with a "What to watch"
section listing three forward indicators.
expected_output: >
A polished markdown brief, 1100 to 1300 words, with H2 section
headings and inline citations as markdown links.
agent: writer
context:
- research_task
- analysis_task
editing_task:
description: >
Edit the brief for clarity and accuracy. Every numeric claim must
cite a source from the researcher's list. Flag any unsupported
claim by replacing it with a TODO comment.
expected_output: >
The final markdown brief, saved to `output/brief.md`.
agent: editor
context:
- research_task
- analysis_task
- writing_task
output_file: output/brief.mdThe context field is where CrewAI's collaboration model lives. Each task receives the outputs of its dependencies as input — no manual prompt stitching required.
Step 5: Wire the Crew
Open src/market_crew/crew.py. The CLI generated a skeleton; replace it with the version below.
from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
from crewai_tools import SerperDevTool, FileWriterTool
@CrewBase
class MarketCrew:
"""Market research crew with four specialists."""
agents_config = "config/agents.yaml"
tasks_config = "config/tasks.yaml"
@agent
def researcher(self) -> Agent:
return Agent(
config=self.agents_config["researcher"],
tools=[SerperDevTool()],
)
@agent
def analyst(self) -> Agent:
return Agent(config=self.agents_config["analyst"])
@agent
def writer(self) -> Agent:
return Agent(config=self.agents_config["writer"])
@agent
def editor(self) -> Agent:
return Agent(
config=self.agents_config["editor"],
tools=[FileWriterTool()],
)
@task
def research_task(self) -> Task:
return Task(config=self.tasks_config["research_task"])
@task
def analysis_task(self) -> Task:
return Task(config=self.tasks_config["analysis_task"])
@task
def writing_task(self) -> Task:
return Task(config=self.tasks_config["writing_task"])
@task
def editing_task(self) -> Task:
return Task(config=self.tasks_config["editing_task"])
@crew
def crew(self) -> Crew:
return Crew(
agents=self.agents,
tasks=self.tasks,
process=Process.sequential,
verbose=True,
)A few details worth noting:
@CrewBasewires the YAML files into Python objects at construction time.@agentand@taskdecorate factory methods; the order of@taskdecorators determines execution order in sequential mode.- Only the researcher gets web search and only the editor gets file-writing. Keeping tool surfaces narrow is the single biggest reliability win in multi-agent systems.
Step 6: Run the Crew
Edit src/market_crew/main.py to pass the templated inputs.
from market_crew.crew import MarketCrew
def run():
inputs = {
"topic": "AI coding agents for enterprise developer platforms",
"industry": "developer tools",
"target_audience": "Series B engineering leaders",
}
MarketCrew().crew().kickoff(inputs=inputs)
if __name__ == "__main__":
run()Run it:
crewai runYou will see each agent announce its role, search the web, draft, and hand off to the next. The final brief lands in output/brief.md. On gpt-4o-mini the full run costs about ten cents and takes around four minutes.
Step 7: Switch to a Hierarchical Process
Sequential is fine for a fixed pipeline. When you want a manager to decide who works on what, switch to Process.hierarchical. Replace the crew method:
from crewai import Agent
@crew
def crew(self) -> Crew:
manager = Agent(
role="Editorial Director",
goal="Decide task order and delegate to the right specialist",
backstory="A former Bloomberg bureau chief who runs tight newsrooms.",
allow_delegation=True,
verbose=True,
)
return Crew(
agents=self.agents,
tasks=self.tasks,
process=Process.hierarchical,
manager_agent=manager,
verbose=True,
)In hierarchical mode the manager reads each task description, picks an agent, evaluates the output, and can loop back. It costs more tokens but handles ambiguous workflows much better than a fixed pipeline.
Step 8: Add Memory
By default each crew run starts fresh. To accumulate knowledge across runs, enable memory.
from crewai import Crew, Process
@crew
def crew(self) -> Crew:
return Crew(
agents=self.agents,
tasks=self.tasks,
process=Process.sequential,
memory=True,
embedder={
"provider": "openai",
"config": {"model": "text-embedding-3-small"},
},
verbose=True,
)CrewAI now keeps three memory stores:
- Short-term: shared scratchpad across the current run
- Long-term: persistent vector store across runs (SQLite + Chroma on disk by default)
- Entity: extracted entities and their relationships
Reset memory with crewai reset-memories. For team-scale deployments, swap the embedder to bedrock or vertex and point the long-term store at a managed Postgres+pgvector instance.
Step 9: Stream Agent Thoughts with Callbacks
For UI integration or debugging you usually want every step in real time. CrewAI exposes per-task and per-step callbacks.
from crewai.tasks.task_output import TaskOutput
def on_task_complete(output: TaskOutput) -> None:
print(f"[task:{output.task.description[:60]}] -> {len(output.raw)} chars")
def on_step(step: dict) -> None:
if step.get("tool"):
print(f" tool call: {step['tool']} args={step['tool_input']}")
@task
def research_task(self) -> Task:
return Task(
config=self.tasks_config["research_task"],
callback=on_task_complete,
)For step-level streaming, pass step_callback=on_step when constructing each Agent. Pipe those callbacks into Server-Sent Events and you have a live agent feed in your app.
Step 10: Expose the Crew as an API
CrewAI plays nicely with FastAPI. Create src/market_crew/api.py:
from fastapi import FastAPI
from pydantic import BaseModel
from market_crew.crew import MarketCrew
app = FastAPI()
class BriefRequest(BaseModel):
topic: str
industry: str
target_audience: str
@app.post("/brief")
def generate_brief(req: BriefRequest):
result = MarketCrew().crew().kickoff(inputs=req.model_dump())
return {"output": result.raw, "token_usage": result.token_usage}Run it:
uv run uvicorn market_crew.api:app --reload --port 8000Hit it with curl:
curl -X POST http://localhost:8000/brief \
-H "content-type: application/json" \
-d '{"topic":"AI coding agents","industry":"developer tools","target_audience":"CTOs"}'For production, run the API behind Gunicorn with a Uvicorn worker class, set a request timeout of around five minutes for crew runs, and offload kickoffs to a Celery or Hatchet queue if you expect concurrent traffic.
Step 11: Add Observability
CrewAI ships first-class telemetry through CrewAI Plus, but you can also use OpenTelemetry directly. The free path is structured logs.
import logging, json, time
class CrewLogFormatter(logging.Formatter):
def format(self, record):
payload = {
"ts": int(time.time() * 1000),
"level": record.levelname,
"msg": record.getMessage(),
}
return json.dumps(payload)
handler = logging.StreamHandler()
handler.setFormatter(CrewLogFormatter())
logging.getLogger("crewai").addHandler(handler)
logging.getLogger("crewai").setLevel(logging.INFO)Pipe stdout to Loki, Datadog, or a self-hosted ClickHouse and you can answer "which agent burned the most tokens last week" without a second project. For span-level traces, install crewai[telemetry] and point OTEL_EXPORTER_OTLP_ENDPOINT at your collector.
Step 12: Test the Crew Deterministically
Multi-agent runs are non-deterministic by default. For CI, swap the LLM for a fake.
from crewai.llm import LLM
from market_crew.crew import MarketCrew
class FakeLLM(LLM):
def __init__(self):
super().__init__(model="fake")
self.replies = iter([
"- finding 1 (https://example.com 2026-01-01)",
"## SWOT\n- strength: x",
"Brief draft...",
"Edited brief.",
])
def call(self, messages, **kwargs):
return next(self.replies)
def test_crew_smoke(monkeypatch):
monkeypatch.setattr("crewai.agent.LLM", FakeLLM)
out = MarketCrew().crew().kickoff(inputs={
"topic": "x", "industry": "y", "target_audience": "z",
})
assert "Edited brief." in out.rawRun with uv run pytest. The whole suite finishes in under a second and gives you regression coverage on prompt and tool wiring without burning API credits.
Testing Your Implementation
Verify the end-to-end behavior:
crewai runproducesoutput/brief.mdwith at least 1100 words- The brief contains at least eight inline citations
crewai runa second time pulls long-term memory into the researcher's prompt (check the verbose log)- The FastAPI endpoint returns a JSON payload with
outputandtoken_usage - The pytest fake-LLM test passes in under a second
Troubleshooting
"AuthenticationError: Missing OPENAI_API_KEY" — make sure your shell loaded .env. uv run does this automatically; plain python does not. Add from dotenv import load_dotenv; load_dotenv() at the top of main.py.
Agents hallucinate sources — narrow the researcher's prompt. Add an explicit rule: "If you cannot find a source, write NO_SOURCE instead of inventing one." Then have the editor task fail the run if any NO_SOURCE token appears.
Hierarchical runs loop forever — set max_iter=15 on the manager agent. The default is 25, which is too generous for most workflows.
Memory bleeds across topics — crewai reset-memories --all between unrelated runs, or namespace long-term memory by setting CREWAI_STORAGE_DIR per project.
Tool calls return raw HTML — wrap SerperDevTool with a markdown converter or use ScrapeWebsiteTool from crewai_tools to clean output before the agent sees it.
Next Steps
- Add a Pydantic AI typed output layer in front of the crew for type-safe brief schemas
- Wire a LangGraph supervisor on top for branching workflows
- Schedule the crew on a Hatchet cron and post the brief to Slack every Monday morning
- Replace the writer with a Claude Agent SDK subprocess for longer reasoning budgets
Conclusion
CrewAI's strength is that it forces you to think about your AI workflow the same way you would think about staffing a team. Each agent has a job, a deliverable, and a narrow toolbox. Once the YAML is right, the Python code stays under one hundred lines and the upgrade path from a single sequential pipeline to a hierarchical, memory-backed, observability-instrumented crew is mostly configuration.
If your problem genuinely has multiple specialists in it, stop trying to make one agent do everything. Hire a crew.