smolagents: Build AI Agents That Write Code

Almost every AI agent framework you have read about works the same way: the model emits a JSON blob describing which tool to call and with what arguments, a runtime parses it, runs the function, and feeds the result back. LangGraph, CrewAI, the OpenAI Agents SDK, Pydantic AI — all of them speak JSON tool calls.

Hugging Face's smolagents takes a different bet. Instead of asking the model to fill in a JSON form, it asks the model to write a snippet of Python code, then executes that code in a sandbox. This pattern is called CodeAct, and once you see it work, the JSON approach starts to feel like a workaround.

This guide explains why code-writing agents matter, how smolagents implements them, and how to run the whole thing on your own infrastructure — a real concern for teams in regulated MENA markets.

Why agents that write code beat JSON tool calls

Consider a task: "Find the three most-downloaded models for text classification, then average their download counts."

A JSON tool-calling agent needs several round-trips for this: call the search tool, read the results, possibly call it again to fill in anything missing, read those results, and then somehow add and divide. JSON tool calls cannot express arithmetic, so that last step needs yet another tool or another model turn.

A code agent writes one block:

models = search_models("text-classification", limit=3)
counts = [m.downloads for m in models]
average = sum(counts) / len(counts)
final_answer(average)

That is the whole point. Code is a universal composition layer. Loops, conditionals, variables, function nesting, and arithmetic are all free — they are already in the language. A model that expresses its actions as code can combine tools the way a programmer would, instead of being forced through one rigid JSON call per step.
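To make the four-line action above concrete, here is a minimal self-contained sketch. Both search_models and final_answer are stand-ins written for this example (they are not smolagents tools), and the download counts are made up.

```python
from dataclasses import dataclass


@dataclass
class Model:
    """Hypothetical search result; a real hub result carries more fields."""
    name: str
    downloads: int


# Made-up catalogue, used only for illustration.
_CATALOGUE = [
    Model("model-a", 900),
    Model("model-b", 600),
    Model("model-c", 300),
    Model("model-d", 100),
]


def search_models(task: str, limit: int) -> list[Model]:
    """Stand-in tool: return the `limit` most-downloaded models."""
    ranked = sorted(_CATALOGUE, key=lambda m: m.downloads, reverse=True)
    return ranked[:limit]


def final_answer(value):
    """Stand-in for the agent's terminating call; here it just returns the value."""
    return value


# The single action a code agent might emit: one tool call, then plain Python.
models = search_models("text-classification", limit=3)
counts = [m.downloads for m in models]
average = sum(counts) / len(counts)
result = final_answer(average)
```

The search, the list comprehension, and the division all happen inside one action, so no extra model turn is spent on arithmetic.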

Hugging Face reports that this cuts the number of steps, and therefore LLM calls, by roughly 30% on multi-step tasks, because the model batches related work into a single executable action. Fewer round-trips mean lower latency and lower token cost.

Installing smolagents

The core agent logic is deliberately small: Hugging Face describes it as fitting in roughly a thousand lines of Python, which you can read in an afternoon. That makes it auditable and hackable, which matters when an agent is executing code on your behalf.

pip install smolagents
# For the default toolbox (web search, etc.)
pip install 'smolagents[toolkit]'

Your first CodeAgent

A CodeAgent needs two things: a model and a list of tools. Here is a minimal agent backed by a model from the Hugging Face Inference API.

from smolagents import CodeAgent, InferenceClientModel
 
model = InferenceClientModel(model_id="meta-llama/Llama-3.3-70B-Instruct")
 
agent = CodeAgent(tools=[], model=model, add_base_tools=True)
 
agent.run("Could you give me the 118th number in the Fibonacci sequence?")

add_base_tools=True loads a small set of built-in tools. The agent reasons, writes Python, the runtime executes it, and the loop continues until the model calls final_answer().
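The reason, write, execute, repeat loop described above can be sketched in a few lines. This is a toy illustration and not smolagents' implementation: run_agent, scripted_model, and the FinalAnswer exception are all names invented here, the "model" is a function returning canned code, and exec runs with no sandbox at all.

```python
import contextlib
import io


class FinalAnswer(Exception):
    """Raised by final_answer() to stop the loop and carry the result out."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def final_answer(value):
    raise FinalAnswer(value)


def run_agent(generate_code, max_steps=5):
    """Toy loop: ask for code, execute it, feed printed output back, repeat."""
    namespace = {"final_answer": final_answer}  # variables persist across steps
    observations = []
    for _ in range(max_steps):
        code = generate_code(observations)
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, namespace)  # NOT sandboxed; illustration only
        except FinalAnswer as done:
            return done.value
        observations.append(buffer.getvalue())
    raise RuntimeError("max_steps reached without a final answer")


def scripted_model(observations):
    """Stands in for the LLM: returns canned code for each step."""
    if not observations:
        return "a, b = 0, 1\nfor _ in range(10):\n    a, b = b, a + b\nprint(a)"
    return "final_answer(a)"


answer = run_agent(scripted_model)
```

The first step computes and prints a Fibonacci number, the printed output becomes an observation, and the second step reuses the variable a from the shared namespace to finish.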

smolagents ships two agent classes, both inheriting from MultiStepAgent:

CodeAgent — the default; writes its actions as Python code.
ToolCallingAgent — writes its actions as JSON, for cases where you specifically want the classic pattern or your model is weak at code.

Defining your own tools

A tool is just a function the agent is allowed to call from inside its code. The simplest way to make one is the @tool decorator. Give it a clear name, type hints, and a docstring with an Args: section — the model reads all of that to decide when and how to use the tool.

from smolagents import tool
 
@tool
def get_exchange_rate(base: str, quote: str) -> float:
    """
    Returns the current exchange rate between two currencies.
 
    Args:
        base: The base currency code, e.g. "USD".
        quote: The quote currency code, e.g. "TND".
    """
    # fetch_rates is a placeholder for your own rate lookup; it is not part of smolagents.
    rates = fetch_rates(base)
    return rates[quote]

When you need more control — heavy attributes loaded once, external clients held open — subclass Tool instead:

from smolagents import Tool
 
class ExchangeRateTool(Tool):
    name = "get_exchange_rate"
    description = "Returns the current exchange rate between two currencies."
    inputs = {
        "base": {"type": "string", "description": "Base currency code, e.g. USD."},
        "quote": {"type": "string", "description": "Quote currency code, e.g. TND."},
    }
    output_type = "number"
 
    def forward(self, base: str, quote: str) -> float:
        # fetch_rates is a placeholder for your own rate lookup; it is not part of smolagents.
        return fetch_rates(base)[quote]
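Both tool examples call fetch_rates, which is never defined. To try them without a live data source, one option is a stand-in along these lines. The function body and the rate table are assumptions made for this sketch, and the figures are placeholders rather than real market rates.

```python
# Hypothetical stand-in for the undefined fetch_rates helper.
# The figures are placeholders, not real market rates.
_PLACEHOLDER_RATES = {
    "USD": {"TND": 3.0, "EUR": 0.9},
    "EUR": {"TND": 3.3, "USD": 1.1},
}


def fetch_rates(base: str) -> dict[str, float]:
    """Return a mapping of quote currency -> rate for the given base currency."""
    try:
        return _PLACEHOLDER_RATES[base.upper()]
    except KeyError:
        # Raise a readable error instead of a bare KeyError.
        raise ValueError(f"No rates available for base currency {base!r}") from None


rate = fetch_rates("usd")["TND"]
```

Raising a descriptive error is a deliberate choice: a clear message is more useful than a bare KeyError to whoever, or whatever, reads the failure.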

The built-in WebSearchTool is a good example of a ready-made tool you can drop straight into the tools list.

from smolagents import CodeAgent, WebSearchTool
 
agent = CodeAgent(tools=[WebSearchTool()], model=model)
agent.run("What changed in the latest Python release?")

Sandboxing: the rule you cannot skip

Here is the uncomfortable truth about code agents: a model writing and running Python on your machine is a security risk if that code runs unsandboxed. An agent that can import os and touch the filesystem is one bad generation away from damage.

smolagents handles this with the executor_type argument. It supports Docker, E2B, Modal, and Blaxel out of the box, and the agent works as a context manager so the sandbox is cleaned up automatically.

from smolagents import CodeAgent, InferenceClientModel
 
with CodeAgent(model=InferenceClientModel(), tools=[], executor_type="docker") as agent:
    agent.run("Can you give me the 100th Fibonacci number?")

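Swap "docker" for "e2b", "modal", or "blaxel" depending on whether you want local isolation or a managed cloud sandbox. For local execution, you also control which imports are permitted through additional_authorized_imports, so the agent can only reach the libraries you explicitly trust. An import allowlist narrows what generated code can do, but the local interpreter is not a true security boundary, so keep untrusted workloads in one of the sandboxes above: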

agent = CodeAgent(
    tools=[WebSearchTool()],
    model=model,
    additional_authorized_imports=["pandas", "numpy"],
    max_steps=10,
)
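To give a feel for what an import allowlist involves, here is a small static check built on Python's ast module. It is an illustration of the idea only, not how smolagents enforces additional_authorized_imports, and find_disallowed_imports is a name invented for this sketch.

```python
import ast


def find_disallowed_imports(code: str, allowed: set[str]) -> list[str]:
    """Return top-level module names imported by `code` that are not in `allowed`."""
    disallowed = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have module=None; treat them as an empty name,
            # which is never in the allowlist and is therefore flagged.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # "os.path" -> "os"
            if root not in allowed:
                disallowed.append(root)
    return disallowed


allowed = {"math", "pandas", "numpy"}
safe = find_disallowed_imports("import numpy as np\nfrom math import sqrt", allowed)
unsafe = find_disallowed_imports("import os\nfrom subprocess import run", allowed)
```

A check like this only sees import statements. It does not catch dynamic tricks such as __import__ or importlib, which is one reason an allowlist alone should not be treated as isolation.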

Treat sandboxing as mandatory, not optional. The Docker executor is the simplest path to running code agents on your own servers without exposing the host.

Run it on your own infrastructure

This is where smolagents becomes interesting for MENA teams working under INPDP and PDPL data-residency rules. The framework is model-agnostic, so nothing forces you to send prompts to a US cloud.

Through LiteLLMModel you can point the agent at a local Ollama server:

from smolagents import CodeAgent, LiteLLMModel
 
model = LiteLLMModel(
    model_id="ollama_chat/llama3.2",
    api_base="http://localhost:11434",
    num_ctx=8192,
)
 
agent = CodeAgent(tools=[], model=model, add_base_tools=True)
agent.run("Could you give me the 118th number in the Fibonacci sequence?")

Or target any OpenAI-compatible endpoint — a self-hosted vLLM server, for example — with OpenAIModel:

from smolagents import OpenAIModel
 
model = OpenAIModel(
    model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
    api_base="http://your-vllm-host:8000/v1",
    api_key="not-needed-for-local",
)

A code agent running open weights on a vLLM box, executing inside a Docker sandbox on the same network, does not need to send prompts or customer data off-premises, provided the tools you give it do not call external services themselves. That is a clean self-hosting story for sovereignty-conscious deployments.
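If keeping traffic on-premises is a hard requirement, it can help to check the api_base value before constructing a model. The helper below is a hypothetical guard written for this article, using only the standard library. It accepts localhost and loopback or private IP literals, and it deliberately refuses DNS names because it does not resolve them.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(api_base: str) -> bool:
    """True if the URL's host is localhost or a loopback/private IP literal."""
    host = urlparse(api_base).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: this sketch does not resolve it, so it cannot vouch for it.
        return False
    return address.is_loopback or address.is_private


ollama_ok = is_local_endpoint("http://localhost:11434")
lan_ok = is_local_endpoint("http://10.0.0.5:8000/v1")
public_ok = is_local_endpoint("https://api.openai.com/v1")
```

A production version would likely resolve internal hostnames against your own DNS and compare the result with an approved network range. That step is left out here to keep the sketch free of network calls.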

Multi-agent teams

For larger workloads you can build a hierarchy: a manager agent that delegates to specialised worker agents. Any agent you want to be managed must have a name and a description so the manager knows what it does and when to call it.

from smolagents import CodeAgent, ToolCallingAgent, WebSearchTool, InferenceClientModel
 
model = InferenceClientModel()
 
web_agent = ToolCallingAgent(
    tools=[WebSearchTool()],
    model=model,
    name="web_search_agent",
    description="Searches the web and returns relevant findings for a query.",
)
 
manager = CodeAgent(
    tools=[],
    model=model,
    managed_agents=[web_agent],
)
 
manager.run("Research recent open-source agent frameworks and summarise the trade-offs.")

The manager calls the web agent exactly as if it were a function, passing a task description. This keeps web-search context out of the manager's own reasoning and lets each agent specialise.
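The "agent as a function" idea can be shown with a toy stand-in. StubWorker and build_namespace are invented for this sketch and are not smolagents classes. The assumption being illustrated is that each managed agent is exposed to the manager's generated code under its name and is called with a task string.

```python
class StubWorker:
    """Hypothetical worker: has a name, a description, and is callable with a task."""

    def __init__(self, name: str, description: str):
        self.name = name
        self.description = description

    def __call__(self, task: str) -> str:
        # A real worker would run its own agent loop; this one returns a canned report.
        return f"[{self.name}] findings for: {task}"


def build_namespace(workers):
    """Expose each worker under its name, the way tools are exposed to generated code."""
    return {worker.name: worker for worker in workers}


web_agent = StubWorker("web_search_agent", "Searches the web and returns findings.")
namespace = build_namespace([web_agent])

# Code the manager model might write: the worker looks like an ordinary function.
manager_code = 'report = web_search_agent(task="open-source agent frameworks")'
exec(manager_code, namespace)  # unsandboxed; illustration only
report = namespace["report"]
```

Only the worker's returned report enters the manager's namespace. The worker's intermediate search results stay inside the worker, which is what keeps the manager's context small.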

When to reach for smolagents

smolagents is the right tool when your tasks are genuinely computational — data wrangling, multi-step research, anything where the model benefits from composing operations rather than firing single tool calls. Its tiny codebase and model-agnostic design make it ideal for self-hosted, cost-sensitive, or compliance-bound deployments.

It is less suited to rigid, audited workflows where you want every action constrained to a fixed schema — there a JSON tool-calling framework or a ToolCallingAgent gives you tighter control.

The broader shift is the one worth internalising: the most capable agents are starting to treat code as the action space itself. JSON tool calls were a bridge. Letting models write and run real code, safely sandboxed, is where a large part of the agent ecosystem is heading — and smolagents is the cleanest place to start learning the pattern today.

If you are weighing this against JSON-first stacks, our comparison of LangGraph, CrewAI, and the OpenAI Agents SDK is a useful companion, as is our guide to self-hosting LLMs with Ollama.