Cognee v1.0: Add Persistent Memory to AI Agents

The Problem: Agents That Forget

Most AI agents today share the same fundamental limitation: they forget everything the moment a session ends. A 1M-token context window raises the ceiling on how much an agent can hold, but it is still a ceiling, and every query pays again for the full context, so cost grows linearly with the number of queries.

Cognee v1.0, released on June 26, 2026, targets this gap. Instead of scaling the context window, Cognee builds a self-updating knowledge graph that persists between sessions, compresses what agents know, and retrieves only what a query needs, so agents do not spend tokens re-reading information they have already seen.

What Is Cognee?

Cognee is an open-source AI memory platform for agents. It ingests raw data in many formats (text, PDFs, code, structured records) and continuously builds a knowledge graph that gives AI agents persistent long-term memory across sessions.

Version 1.0 introduces a memory-native API centered on four operations:

remember — store new information
recall — retrieve contextual information
improve — re-weight memory based on corrections and feedback
forget — remove data cleanly when no longer needed
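
To make the four verbs concrete, here is a toy, in-memory sketch; it is not Cognee's implementation. The `ToyMemory` class, its keyword-overlap scoring, and the multiply-by-1.5-or-0.5 feedback rule in `improve` (which here takes a memory id, unlike Cognee's own call) are hypothetical choices made only to show how each operation changes stored state.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    text: str
    weight: float = 1.0  # hypothetical relevance weight, adjusted by improve()


@dataclass
class ToyMemory:
    """Hypothetical in-memory model of the four verbs; not Cognee's API."""

    items: dict = field(default_factory=dict)  # memory id -> Memory
    next_id: int = 0

    def remember(self, text: str) -> int:
        # Store a new memory and return its id.
        mem_id = self.next_id
        self.items[mem_id] = Memory(text)
        self.next_id += 1
        return mem_id

    def recall(self, query: str, k: int = 1) -> list:
        # Score each memory by words shared with the query, scaled by its weight.
        words = set(query.lower().split())
        scored = []
        for mem in self.items.values():
            score = len(words & set(mem.text.lower().split())) * mem.weight
            if score > 0:
                scored.append((score, mem.text))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]

    def improve(self, mem_id: int, helpful: bool) -> None:
        # Feedback re-weights a memory instead of deleting it.
        self.items[mem_id].weight *= 1.5 if helpful else 0.5

    def forget(self, mem_id: int) -> None:
        # Remove the memory entirely; later recalls can no longer return it.
        self.items.pop(mem_id, None)


mem = ToyMemory()
concise = mem.remember("user prefers concise responses")
detailed = mem.remember("user prefers detailed responses")
mem.improve(detailed, helpful=False)  # a correction: this memory was wrong
print(mem.recall("what responses does the user prefer"))  # ['user prefers concise responses']
```

The point of the sketch is the division of labor: `improve` changes how strongly a memory competes at recall time, while `forget` is the only operation that actually removes data.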

More than 100 companies run Cognee in production today, generating 6 million memories per month. The project reached 17.5K GitHub stars after the v1.0 launch and is backed by a $7.5M seed round from Pebblebed, a fund whose founders come from OpenAI and Facebook AI Research.

Architecture: Three Storage Layers, One Memory Engine

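Cognee's key architectural insight is that relying on a single storage type, as vector-only RAG does, leaves too much on the table: embeddings capture semantic similarity but not explicit relationships or where a fact came from. Version 1.0 unifies three storage layers: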

Layer	What it stores	Default backend
Graph store	Entities, relationships, concepts	PostgreSQL + pgvector
Vector store	Semantic embeddings	pgvector / Qdrant / ChromaDB
Relational store	Documents, provenance, metadata	PostgreSQL

All three are queried together at retrieval time. The result: Cognee can answer "what did this user tell me three weeks ago?" with the precision of a graph traversal and the semantic flexibility of vector search.
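
A toy version of querying all three layers together might look like the sketch below. The three dictionaries are hypothetical stand-ins for the graph, vector, and relational stores, the 3-dimensional vectors are hand-made rather than real embeddings, and the way results are combined is an illustration, not Cognee's retrieval algorithm.

```python
import math

# Hypothetical stand-ins for the three layers; no real backend is involved.
GRAPH = {  # entity -> list of (relation, target entity)
    "alice": [("works_on", "crm_project")],
    "crm_project": [("based_in", "tunis")],
}
VECTORS = {  # memory id -> hand-made 3-d "embedding"
    "m1": [0.9, 0.1, 0.0],
    "m2": [0.1, 0.9, 0.0],
}
RELATIONAL = {  # memory id -> text plus provenance metadata
    "m1": {"text": "Alice is building a CRM.", "source": "chat", "date": "2026-06-01"},
    "m2": {"text": "Alice prefers short answers.", "source": "chat", "date": "2026-06-03"},
}


def cosine(a, b):
    # Cosine similarity of two non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def neighbors(entity, depth=2):
    # Breadth-first traversal collecting (source, relation, target) facts up to `depth` hops.
    facts, frontier = [], [entity]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for relation, target in GRAPH.get(node, []):
                facts.append((node, relation, target))
                next_frontier.append(target)
        frontier = next_frontier
    return facts


def hybrid_query(query_vec, entity):
    # Vector layer picks the closest memory, relational layer supplies its
    # provenance, and the graph layer adds facts connected to the entity.
    best = max(VECTORS, key=lambda mem_id: cosine(query_vec, VECTORS[mem_id]))
    return {"memory": RELATIONAL[best], "facts": neighbors(entity)}


result = hybrid_query([1.0, 0.0, 0.0], "alice")
print(result["memory"]["date"], result["facts"])
```

Vector search finds the memory closest in meaning, the relational record supplies provenance such as the date (which is what makes "three weeks ago" answerable), and graph traversal adds connected facts that similarity search alone would miss.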

The architecture also enables auto-routing: Cognee automatically selects a retrieval strategy (vector similarity, graph traversal, relational lookup, or a hybrid) based on the query type.
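
The article does not describe how Cognee classifies queries, so the router below is a hypothetical keyword heuristic that only illustrates the idea of dispatching each query to one of the four strategies. The cue words are assumptions chosen for the example; a production router might use an LLM or a trained classifier instead.

```python
import re
from enum import Enum


class Strategy(Enum):
    VECTOR = "vector similarity"
    GRAPH = "graph traversal"
    RELATIONAL = "relational lookup"
    HYBRID = "hybrid"


# Hypothetical cue words; real routing logic is not described in the source.
RELATIONAL_CUES = re.compile(r"\b(when|ago|date|source|which document)\b", re.IGNORECASE)
GRAPH_CUES = re.compile(r"\b(who|related|connected|depends on|between)\b", re.IGNORECASE)


def route(query: str) -> Strategy:
    # Time/provenance cues favor relational lookup, relationship cues favor
    # graph traversal, both together mean hybrid, and anything else falls back
    # to plain semantic similarity.
    wants_relational = RELATIONAL_CUES.search(query) is not None
    wants_graph = GRAPH_CUES.search(query) is not None
    if wants_relational and wants_graph:
        return Strategy.HYBRID
    if wants_relational:
        return Strategy.RELATIONAL
    if wants_graph:
        return Strategy.GRAPH
    return Strategy.VECTOR


for q in ["Which services are connected to billing?",
          "What did this user tell me three weeks ago?",
          "Who raised the billing bug, and when?",
          "Tips for writing concise answers"]:
    print(f"{q!r} -> {route(q).value}")
```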

Quick Start

Installation is a single command:

pip install cognee

For PostgreSQL-backed storage (recommended for production):

pip install "cognee[postgres]"

Here is a minimal agent memory loop:

import asyncio
import cognee

async def main():
    # Store information from the current session
    await cognee.remember("User prefers concise responses, no bullet points.")
    await cognee.remember(
        "Project context: building a CRM for a Tunis-based SaaS.",
        session_id="proj_123",
    )

    # Retrieve across sessions
    results = await cognee.recall("What does the user prefer?")
    project_ctx = await cognee.recall(
        "What project are we working on?",
        session_id="proj_123",
    )
    print(results)
    print(project_ctx)

    # Update memory based on feedback
    await cognee.improve()

    # Clean up when done
    await cognee.forget(dataset="main_dataset")

asyncio.run(main())

The session_id parameter scopes memories to a specific conversation or project, giving you both persistent and ephemeral memory in the same API.
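
One plausible reading of session scoping is sketched below under stated assumptions: memories stored without a `session_id` are visible to every recall, memories stored with one are visible only when the same id is passed, and ending a session drops its memories. `ScopedStore` and `end_session` are hypothetical names, and Cognee's actual visibility and expiry rules may differ.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class ScopedStore:
    """Hypothetical model of session scoping; not Cognee's actual semantics."""

    def __init__(self) -> None:
        self._global: List[str] = []  # persistent: visible in every scope
        self._sessions: Dict[str, List[str]] = defaultdict(list)  # ephemeral, per session

    def remember(self, text: str, session_id: Optional[str] = None) -> None:
        if session_id is None:
            self._global.append(text)
        else:
            self._sessions[session_id].append(text)

    def visible(self, session_id: Optional[str] = None) -> List[str]:
        # Memories a recall in this scope may draw on: global ones plus the session's own.
        scoped = self._sessions.get(session_id, []) if session_id is not None else []
        return self._global + scoped

    def end_session(self, session_id: str) -> None:
        # Hypothetical expiry: session memories are dropped, global ones persist.
        self._sessions.pop(session_id, None)


store = ScopedStore()
store.remember("User prefers concise responses, no bullet points.")
store.remember("Project context: building a CRM for a Tunis-based SaaS.", session_id="proj_123")
print(store.visible())            # only the global preference
print(store.visible("proj_123"))  # preference plus project context
```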

CLI Usage

Cognee also ships a CLI for quick testing without writing Python:

cognee-cli remember "Your information here"
cognee-cli recall "Your query here"
cognee-cli forget --all
cognee-cli -ui   # Launch the web UI

MCP Integration

Cognee v1.0 ships with built-in MCP (Model Context Protocol) support. You can expose Cognee as an MCP server that any MCP-compatible agent can read from and write to:

docker pull cognee/cognee-mcp:main
docker run -p 8001:8001 cognee/cognee-mcp:main

The MCP server supports HTTP, SSE, and stdio transports. Claude Desktop, Claude Code, Cursor, Windsurf, Cline, and any other MCP-aware client can use Cognee as a shared memory backend — one knowledge graph accessible to every agent simultaneously.
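
Under the hood, MCP messages are JSON-RPC 2.0. The sketch below builds the opening handshake an MCP client sends (an `initialize` request, the `notifications/initialized` notification, then `tools/list` to discover what the server offers) and frames it for the stdio transport, which carries one JSON message per line. It assumes no Cognee tool names, since those come back in the `tools/list` response; the client name and version are placeholders, and `2025-03-26` is one published MCP protocol revision.

```python
import json


def frame_stdio(message: dict) -> bytes:
    # MCP's stdio transport sends one JSON-RPC message per line, UTF-8 encoded.
    # json.dumps without indent escapes newlines inside strings, so each
    # message stays on a single line.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},  # placeholder
    },
}
initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}  # notification: no id
list_tools = {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}

for msg in (initialize, initialized, list_tools):
    print(frame_stdio(msg).decode("utf-8"), end="")
```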

For Claude Code specifically, the Cognee plugin is available directly in the marketplace.

Storage Backends

Cognee supports a wide range of backends so you can match your existing infrastructure:

Graph stores: PostgreSQL (default), Neo4j, Amazon Neptune, KuzuDB

Vector stores: pgvector (default), Qdrant, ChromaDB, Weaviate, Milvus, LanceDB

Session stores: Redis

Local development: SQLite, KuzuDB — no server required

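For MENA teams with data residency requirements under INPDP (Tunisia) or PDPL (Saudi Arabia), self-hosting with PostgreSQL on your own infrastructure keeps all stored memories on-premises. Building the graph also sends text to an LLM and an embedding model, so to keep data from leaving the region entirely, those models must run on your own or in-region infrastructure as well.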

Deployment Options

Cognee v1.0 ships four deployment paths:

Managed Cloud — hosted at cognee.ai, API key access
Self-hosted — single PostgreSQL instance, full data ownership
Edge / on-device — Rust SDK for resource-constrained environments
Node-based workflows — TypeScript SDK for JavaScript and Next.js stacks

Cloud deployment via Python is a one-liner (the call is awaited, so run it inside an async function such as the quick start's main()):

await cognee.serve(url="https://your-instance.cognee.ai", api_key="ck_...")

Benchmarks: BEAM Results

Cognee benchmarks against the BEAM (Benchmark for Evaluation of Agent Memory) suite:

Corpus size	Cognee	Previous SOTA
100K tokens	79%	73.4%
10M tokens	67%	64.1%

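Cognee's token usage per query stays roughly flat as the corpus grows, because each query pulls in only the relevant slice of memory. Pure long-context approaches, by contrast, place the whole corpus in the prompt, so their token cost grows linearly with corpus size.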

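Break-even point: building the graph costs tokens up front, and that investment pays off only once enough queries reuse it. For fewer than 23–26 repeated queries, a large context window is often cheaper; beyond that threshold, Cognee's persistent memory structure consistently wins on both cost and accuracy.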
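
A simple cost model makes the amortization arithmetic explicit. Assume long context re-sends a corpus of C tokens on every query, while memory spends B tokens once to build the graph and then r tokens per query on retrieval. Memory is strictly cheaper once B + n·r < n·C, that is, for n > B / (C − r). The inputs below are hypothetical, not Cognee's measured figures, and the model ignores differences between input and output token prices.

```python
def total_tokens_long_context(n_queries: int, corpus_tokens: int) -> int:
    # Every query re-sends the whole corpus, so cost is linear in n_queries.
    return n_queries * corpus_tokens


def total_tokens_memory(n_queries: int, build_tokens: int, retrieved_tokens: int) -> int:
    # One-time graph construction plus a small, roughly constant retrieval per query.
    return build_tokens + n_queries * retrieved_tokens


def break_even_queries(corpus_tokens: int, build_tokens: int, retrieved_tokens: int) -> int:
    """Smallest n for which memory is strictly cheaper: n > B / (C - r)."""
    if retrieved_tokens >= corpus_tokens:
        raise ValueError("memory never wins if retrieval costs as much as the full corpus")
    # Integer floor division, then +1, gives the first n strictly past the tie point.
    return build_tokens // (corpus_tokens - retrieved_tokens) + 1


# Hypothetical inputs, not measured Cognee figures.
C, B, r = 50_000, 1_000_000, 2_000
print(break_even_queries(C, B, r))  # 21
```

With these illustrative inputs, memory overtakes long context at the 21st query; the article's 23–26 figure comes from Cognee's own measurements, which this toy model does not reproduce.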

When Not to Use Persistent Memory

Cognee is infrastructure — it adds operational overhead. Consider skipping it when:

Your agent runs a single session with no expectation of continuity
Your corpus is under 50K tokens and fits comfortably in one context window
You need the lowest possible latency (graph retrieval adds a network round-trip that context already in the prompt does not)

For stateless one-shot tasks, long context still wins. For anything that builds persistent user understanding, project context, or cross-session reasoning, Cognee fills a gap that RAG alone cannot.

Ecosystem and Migration

Cognee v1.0 supports migrating from existing memory solutions including Mem0, Zep, and Letta. Data portability is handled via the open COGX format, so you are never locked in.

The platform integrates with LangGraph, smolagents, and any framework that supports tool-calling or MCP. For teams building on the Claude Agent SDK, Cognee supplies the persistence layer that lets agents stay useful as teammates across long-running projects.

Conclusion

Cognee v1.0 makes the case that agent memory deserves its own infrastructure layer — not a clever prompt trick or a larger context window, but a proper graph-native engine with open semantics.

With pip install cognee, four simple API methods, built-in MCP support, flexible self-hosting, and clear migration paths from existing solutions, it is now straightforward to give any Python or TypeScript agent the kind of long-term memory that makes it genuinely useful across sessions.

The memory layer is becoming the third critical infrastructure component for production agents — after the model and the execution harness. Cognee v1.0 is the most credible open-source attempt yet to fill that role.