Stop emailing CSVs at midnight. This tutorial walks through a complete KoboToolbox dashboard pipeline for NGOs: Python ETL, Postgres warehouse, an AI layer for anomaly detection and multilingual transcript summarisation, and donor PDFs generated in English, French and Arabic from the same dataset.
The KoboToolbox Export Problem
If you are an M&E officer at a Tunisian, Moroccan or Lebanese NGO running a donor-funded programme, your week probably looks like this:
- Monday morning, you download a CSV export from KoboToolbox.
- The field team in Sidi Bouzid uploads thirty more submissions Monday afternoon. Your CSV is already stale.
- The USAID reporting deadline is Friday. The donor wants a PDF in English, the steering committee wants the same numbers in French, and the local partner needs them in Arabic.
- You spend Friday night in Excel reconciling three exports, then in PowerPoint copy-pasting numbers into a donor template.
This is not a tooling problem — KoboToolbox is excellent at what it does. It is a pipeline problem. Kobo is a form runtime, not a reporting platform. The fix is to treat each Kobo submission as a fact in a proper data warehouse, then layer reporting and AI on top.
This tutorial is the practical companion to our pillar piece on AI M&E dashboards for MENA NGOs. Read that for the architectural reasoning; come here when you want to actually build it.
What You Will Build
By the end of this tutorial, you will have:
- A Python ETL job that pulls KoboToolbox submissions through the v2 REST API on a nightly schedule
- A Postgres warehouse with three tables: raw submissions, denormalised indicators, and agent flags
- An AI agent layer that flags anomalies and summarises Arabic/French voice notes into English for donor reports
- A PDF reporting engine using WeasyPrint that produces the same dataset in EN, FR and AR from a single Jinja2 template
- A total operating cost under $30/month for a programme of up to 50,000 submissions
Everything is open source. The only paid component is the LLM API calls (a few dollars a month at NGO volumes).
Prerequisites
Before you start:
- Python 3.11+ and pip
- A KoboToolbox account with at least one deployed form. The community server (kf.kobotoolbox.org) is fine.
- A Postgres 15+ database. For testing, a local Docker container is enough (see the snippet below). For production, a $6/month VPS instance from Hetzner or DigitalOcean works.
- An API key from an LLM provider (we use Claude via Anthropic, but the code is provider-agnostic)
- Familiarity with requests, pandas, and basic SQL
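If you do not have a test database handy, a throwaway Docker container works. The credentials below are placeholders — match them to whatever you put in DATABASE_URL:
docker run -d --name ngo-warehouse \
  -e POSTGRES_PASSWORD=devpassword -e POSTGRES_DB=ngo \
  -p 5432:5432 postgres:15
export DATABASE_URL="postgresql://postgres:devpassword@localhost:5432/ngo"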
Pipeline Architecture
Here is the end-to-end pipeline you will build. Each box is independently replaceable.
flowchart LR
A[Field Teams<br/>Sidi Bouzid, Tataouine,<br/>Mafraq, Casablanca] -->|XLSForm offline| B[KoboToolbox<br/>Server]
B -->|v2 REST API| C[Python ETL<br/>nightly cron]
C -->|append-only| D[(Postgres<br/>warehouse)]
D --> E[AI Agent Layer<br/>Anomaly + Summary]
E -->|writes flags| D
D --> F[WeasyPrint<br/>PDF Renderer]
F --> G[Donor PDF<br/>EN/FR/AR]
D --> H[Web Dashboard<br/>internal team]
The two design rules that matter:
- The warehouse is the source of truth. Every downstream consumer (PDFs, dashboards, agents) reads from Postgres. Nothing reads from Kobo directly except the ETL.
- AI writes flags, never overwrites data. Agents can mark a submission as suspicious or attach a summary. They cannot mutate the underlying indicator values.
Step 1: KoboToolbox API Basics
KoboToolbox exposes a REST API at https://kf.kobotoolbox.org/api/v2/. You authenticate with a long-lived token, which you generate from your account settings page.
Three endpoints matter for our pipeline:
- GET /api/v2/assets/{asset_uid}/ — form metadata, schema, deployment URL
- GET /api/v2/assets/{asset_uid}/data/ — paginated submission data as JSON
- GET /api/v2/assets/{asset_uid}/data/{submission_id}/attachments/{attachment_id}/ — file attachments (images, audio)
The asset_uid is the short ID in your form URL — looks like aJk8Zq3wY7nF2cQv9pXt. Rate limit is roughly 60 requests/minute. Submissions are paginated with ?start=N&limit=M query parameters.
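Once you have a token (next snippet), pulling the first page of 100 submissions looks like this — the token and asset UID below are placeholders:
curl -H "Authorization: Token YOUR_TOKEN" \
  "https://kf.kobotoolbox.org/api/v2/assets/YOUR_ASSET_UID/data/?format=json&start=0&limit=100"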
Get your token (do this once, store it as an environment variable):
curl -X POST https://kf.kobotoolbox.org/token/ \
-d "username=YOUR_USER&password=YOUR_PASS"
# returns: {"token":"a1b2c3..."}
export KOBO_TOKEN="a1b2c3..."
export KOBO_ASSET="aJk8Zq3wY7nF2cQv9pXt"
Step 2: Python ETL — Pulling Submissions
Here is the core ETL. It is intentionally boring: pull JSON, upsert into Postgres, log what you did. Save as etl_kobo.py.
import os
import json
import logging
from datetime import datetime, timezone
import requests
import pandas as pd
from sqlalchemy import create_engine, text
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger(__name__)
KOBO_BASE = "https://kf.kobotoolbox.org/api/v2"
KOBO_TOKEN = os.environ["KOBO_TOKEN"]
KOBO_ASSET = os.environ["KOBO_ASSET"]
DB_URL = os.environ["DATABASE_URL"] # postgresql://user:pass@host/db
HEADERS = {"Authorization": f"Token {KOBO_TOKEN}"}
engine = create_engine(DB_URL, pool_pre_ping=True)
def fetch_submissions(asset_uid: str, page_size: int = 30000) -> list[dict]:
"""Pull all submissions for a deployed form, paginated."""
url = f"{KOBO_BASE}/assets/{asset_uid}/data/"
out, start = [], 0
while True:
r = requests.get(
url,
headers=HEADERS,
params={"start": start, "limit": page_size, "format": "json"},
timeout=60,
)
r.raise_for_status()
batch = r.json().get("results", [])
if not batch:
break
out.extend(batch)
log.info("fetched %d (total %d)", len(batch), len(out))
if len(batch) < page_size:
break
start += page_size
return out
def upsert_raw(submissions: list[dict]) -> int:
"""Append raw payloads to submissions table. Idempotent on _uuid."""
if not submissions:
return 0
rows = [
{
"uuid": s["_uuid"],
"submitted_at": s["_submission_time"],
"asset_uid": KOBO_ASSET,
"payload": json.dumps(s, ensure_ascii=False),
"ingested_at": datetime.now(timezone.utc).isoformat(),
}
for s in submissions
]
with engine.begin() as conn:
conn.execute(
text("""
INSERT INTO submissions (uuid, submitted_at, asset_uid, payload, ingested_at)
VALUES (:uuid, :submitted_at, :asset_uid, :payload, :ingested_at)
ON CONFLICT (uuid) DO NOTHING
"""),
rows,
)
return len(rows)
if __name__ == "__main__":
subs = fetch_submissions(KOBO_ASSET)
n = upsert_raw(subs)
log.info("ingested %d submissions (duplicates skipped)", n)The corresponding schema migration:
CREATE TABLE IF NOT EXISTS submissions (
uuid TEXT PRIMARY KEY,
submitted_at TIMESTAMPTZ NOT NULL,
asset_uid TEXT NOT NULL,
payload JSONB NOT NULL,
ingested_at TIMESTAMPTZ NOT NULL
);
CREATE INDEX IF NOT EXISTS submissions_submitted_at_idx ON submissions(submitted_at);
CREATE INDEX IF NOT EXISTS submissions_asset_uid_idx ON submissions(asset_uid);
Run it nightly with cron:
30 2 * * * cd /opt/ngo-etl && /usr/bin/python3 etl_kobo.py >> /var/log/kobo-etl.log 2>&1
That is the whole ingestion layer. About sixty lines of Python, no Zapier subscription, fully reproducible.
Step 3: Denormalise into Indicators
The raw payload JSONB is your audit trail. For dashboards and donor PDFs you want a flat indicator table. Define indicators in a YAML file (one per donor template) and run a denormalisation pass after ingestion.
# indicators.py — runs after etl_kobo.py
from sqlalchemy import text
from etl_kobo import engine  # reuse the ETL's SQLAlchemy engine
INDICATOR_SQL = """
INSERT INTO indicators (uuid, region, beneficiary_count, indicator_code, value, captured_at)
SELECT
s.uuid,
s.payload->>'region' AS region,
(s.payload->>'num_beneficiaries')::int AS beneficiary_count,
'OUT-1.2-cash-distribution' AS indicator_code,
(s.payload->>'amount_tnd')::numeric AS value,
s.submitted_at
FROM submissions s
WHERE s.payload->>'form_type' = 'cash_distribution'
ON CONFLICT (uuid, indicator_code) DO UPDATE
SET value = EXCLUDED.value, captured_at = EXCLUDED.captured_at;
"""
with engine.begin() as conn:
    conn.execute(text(INDICATOR_SQL))
Why store indicators denormalised? Because donors ask "show me the calculation" in audits, and a single SQL view that everyone can read is a better answer than a Jupyter notebook nobody can find.
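The ON CONFLICT clause above assumes an indicators table with a composite key on (uuid, indicator_code) — otherwise there is nothing for it to match on. A minimal schema sketch, with column types inferred from the casts in the query:
CREATE TABLE IF NOT EXISTS indicators (
    uuid              TEXT NOT NULL REFERENCES submissions(uuid),
    region            TEXT,
    beneficiary_count INTEGER,
    indicator_code    TEXT NOT NULL,
    value             NUMERIC,
    captured_at       TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (uuid, indicator_code)
);
CREATE INDEX IF NOT EXISTS indicators_captured_at_idx ON indicators(captured_at);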
Step 4: The AI Layer — Anomalies + Voice Note Summaries
Two agents earn their keep at NGO scale: an anomaly flagger that runs after each ETL and a transcript summariser that turns Arabic/French voice notes into English bullet points for donor narratives.
# agents.py
import os
import anthropic
import pandas as pd
from openai import OpenAI  # only used for Whisper transcription below
from sqlalchemy import text
from etl_kobo import engine  # reuse the ETL's SQLAlchemy engine
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
ANOMALY_PROMPT = """You are an M&E data quality assistant for a humanitarian NGO.
Below are this week's indicator values vs the rolling 8-week baseline.
Flag any value that is more than 2 standard deviations from the baseline.
Return a JSON list of {{indicator, region, value, baseline_mean, severity}}.
Data:
{rows}
"""
def flag_anomalies():
with engine.begin() as conn:
df = pd.read_sql(
"SELECT region, indicator_code, value, captured_at "
"FROM indicators WHERE captured_at > now() - interval '8 weeks'",
conn,
)
msg = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=1024,
messages=[{"role": "user", "content": ANOMALY_PROMPT.format(rows=df.to_csv(index=False))}],
)
flags = msg.content[0].text
with engine.begin() as conn:
conn.execute(text("INSERT INTO agent_flags(kind, payload, created_at) "
"VALUES ('anomaly', :p, now())"), {"p": flags})
def summarise_voice_note(audio_path: str, lang_hint: str = "ar") -> str:
    """Whisper transcription + LLM summary in English.

    Anthropic's API does not transcribe audio, so this step assumes an
    OpenAI Whisper endpoint is available (set OPENAI_API_KEY).
    """
    whisper = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(audio_path, "rb") as f:
        transcript = whisper.audio.transcriptions.create(
            model="whisper-1", file=f, language=lang_hint
        ).text
summary = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=300,
messages=[{
"role": "user",
"content": f"Summarise this field interview in 3 English bullets for a donor report:\n\n{transcript}",
}],
).content[0].text
    return summary
The agent never writes to indicators. It writes to agent_flags, a separate table that the dashboard renders as a sidebar. Humans decide what to do with the flags.
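The agent_flags table is not part of the migration shown earlier, so here is a minimal sketch of it — the reviewed column is an assumption; add whatever triage fields your team needs:
CREATE TABLE IF NOT EXISTS agent_flags (
    id         BIGSERIAL PRIMARY KEY,
    kind       TEXT NOT NULL,                 -- 'anomaly', 'summary', ...
    payload    TEXT NOT NULL,                 -- raw LLM output, kept verbatim for review
    reviewed   BOOLEAN NOT NULL DEFAULT FALSE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);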
For multilingual transcript summarisation, this is where the magic happens for francophone Maghreb NGOs. A field team in Tataouine records a 90-second voice note in Tunisian darija explaining why a distribution was delayed. The pipeline transcribes it (Whisper handles Arabic and French well), then asks the LLM to produce three English bullets the M&E officer can paste into the USAID narrative. End-to-end cost: about $0.02 per voice note.
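Wiring that up end to end is mostly plumbing: pull the audio through the attachments endpoint from Step 1, write it to a temporary file, and hand it to summarise_voice_note. A sketch, assuming you already have the submission and attachment IDs from the stored payload (the filename and .mp3 extension are illustrative):
# fetch_and_summarise.py — glue between Kobo attachments and agents.summarise_voice_note
import requests
from etl_kobo import KOBO_BASE, KOBO_ASSET, HEADERS
from agents import summarise_voice_note

def summarise_attachment(submission_id: int, attachment_id: int, lang_hint: str = "ar") -> str:
    url = f"{KOBO_BASE}/assets/{KOBO_ASSET}/data/{submission_id}/attachments/{attachment_id}/"
    r = requests.get(url, headers=HEADERS, timeout=120)
    r.raise_for_status()
    path = f"/tmp/voice_note_{submission_id}_{attachment_id}.mp3"
    with open(path, "wb") as f:
        f.write(r.content)  # save the audio locally for transcription
    return summarise_voice_note(path, lang_hint=lang_hint)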
For more on building agent layers like this, see our companion piece on GitLab AI PM/QA dashboards — same pattern, different domain.
Step 5: Donor PDFs in EN / FR / AR
The donor does not log in to your dashboard. The donor wants a PDF in their template, in their language, on the deadline. WeasyPrint turns HTML + CSS into print-quality PDFs and handles Arabic RTL natively.
# render_pdf.py
from jinja2 import Environment, FileSystemLoader
from weasyprint import HTML
import pandas as pd
from sqlalchemy import text
from etl_kobo import engine  # reuse the ETL's SQLAlchemy engine
env = Environment(loader=FileSystemLoader("templates"))
def render_report(lang: str, period: str, output: str):
    tmpl = env.get_template("donor_report.html.j2")  # one template; lang drives direction and fonts
with engine.begin() as conn:
df = pd.read_sql(
text("SELECT indicator_code, region, sum(value) AS total "
"FROM indicators WHERE captured_at >= :start "
"GROUP BY indicator_code, region"),
conn, params={"start": period},
)
html = tmpl.render(rows=df.to_dict(orient="records"), period=period, lang=lang)
HTML(string=html).write_pdf(output)
# One template, three renders:
for lang in ("en", "fr", "ar"):
    render_report(lang, "2026-04-01", f"reports/Q2-2026-{lang}.pdf")
The Jinja2 template uses dir="rtl" when lang == "ar", a different font stack (Noto Naskh Arabic) for the AR version, and the donor's actual logframe table structure. Build one template per donor, reuse it forever.
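A trimmed skeleton of that template — the table structure and styling here are illustrative, not any particular donor's layout:
<!-- templates/donor_report.html.j2 (trimmed) -->
<html lang="{{ lang }}" dir="{{ 'rtl' if lang == 'ar' else 'ltr' }}">
<head>
  <style>
    body { font-family: {% if lang == 'ar' %}'Noto Naskh Arabic', serif{% else %}'Noto Sans', sans-serif{% endif %}; }
    table { width: 100%; border-collapse: collapse; }
    th, td { border: 1px solid #888; padding: 4px 8px; }
  </style>
</head>
<body>
  <h1>{{ period }}</h1>
  <table>
    <tr><th>Indicator</th><th>Region</th><th>Total</th></tr>
    {% for row in rows %}
    <tr><td>{{ row.indicator_code }}</td><td>{{ row.region }}</td><td>{{ row.total }}</td></tr>
    {% endfor %}
  </table>
</body>
</html>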
Cost Comparison
For a typical INGO programme with one M&E officer, three field staff, two desk officers and two donor reps, here is the monthly bill:
| Component | Open-source pipeline | Power BI Pro stack |
|---|---|---|
| VPS (Hetzner CX22) | $6 | — |
| Postgres (on the VPS) | $0 | — |
| LLM API (Claude/OpenAI) | ~$15 | — |
| WeasyPrint / Python | $0 | — |
| Power BI Pro × 8 users | — | $80 |
| Power BI Premium per user | — | from $200 |
| Consultant retainer | — | from $1,500 |
| Monthly total | ~$30 | $280 – $1,800+ |
The open-source pipeline also wins on portability. When the donor programme ends in eighteen months and a new donor wants the same data on a different schedule, your Postgres database moves with you. Your Power BI semantic model does not.
Production Checklist
Before you point this at a real donor programme:
- Service account for the Kobo token, not a personal account
- Backups of Postgres to S3-compatible storage (Hetzner Storage Box is $4/month for 1 TB)
- Cron job heartbeat — we use the healthchecks.io free tier
- Row-level security in Postgres so the dashboard role cannot read PII columns (see the sketch after this list)
- A pyproject.toml pinning every dependency (requests, pandas, sqlalchemy, weasyprint, anthropic)
- A dry-run flag on agents.py so you can review LLM outputs before they hit the database
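For the PII point, a sketch of what that can look like in Postgres — the role name, password, and column list are assumptions to adapt to your own schema:
-- read-only role for the dashboard, limited to non-PII columns
CREATE ROLE dashboard_ro LOGIN PASSWORD 'change-me';
GRANT SELECT (uuid, region, indicator_code, value, captured_at)
    ON indicators TO dashboard_ro;
-- submissions.payload can hold names and phone numbers; with RLS enabled
-- and no policy defined, non-owner roles get zero rows even if someone
-- later grants them SELECT
ALTER TABLE submissions ENABLE ROW LEVEL SECURITY;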
Where to Go Next
Once the pipeline is live, the obvious extensions are:
- An MCP server on top of Postgres so the country director can ask "how many beneficiaries did we reach in Tataouine last week?" from Claude Desktop in Arabic
- A web dashboard for the M&E team (we ship a Next.js + Tremor setup; SvelteKit + ECharts works equally well)
- Programmatic donor template ingestion — map a new donor's logframe into your indicator codes via YAML
Want Us to Build This for You?
If you are running a donor-funded programme and the pipeline above is what you wish you had, tell us about your programme. We typically deliver a working KoboToolbox dashboard with multilingual donor PDFs in two weeks, from kickoff to live data. From around $4,500 for a single-donor programme; less if you already have your XLSForms stable.
Read the strategic context in our pillar piece: AI M&E dashboards for MENA NGOs.