The Problem: Agents That Forget
Most AI agents today suffer from the same fundamental limitation: they forget everything the moment a session ends. You can give an agent a 1M-token context window, but that still caps how much it can hold, and because the full context is reprocessed on every query, total cost grows linearly with the number of queries.
This is the gap targeted by Cognee v1.0, released on June 26, 2026. Instead of scaling context windows, Cognee builds a self-updating knowledge graph that persists between sessions, compresses what agents know, and retrieves exactly what they need, without burning tokens on information already seen.
What Is Cognee?
Cognee is an open-source AI memory platform for agents. It ingests raw data in any format — text, PDFs, code, structured records — and continuously builds a knowledge graph that gives AI agents persistent long-term memory across sessions.
Version 1.0 introduces a memory-native API centered on four operations:
- remember — store new information
- recall — retrieve contextual information
- improve — re-weight memory based on corrections and feedback
- forget — remove data cleanly when no longer needed
More than 100 companies run Cognee in production today, generating 6 million memories per month. The project has 17.5K GitHub stars after the v1.0 launch and is backed by a $7.5M seed round from Pebblebed (with founders from OpenAI and Facebook AI Research).
Architecture: Three Storage Layers, One Memory Engine
Cognee's key architectural insight is that a single storage type — vector-only RAG — leaves too much on the table. V1.0 unifies three storage layers:
| Layer | What it stores | Default backend |
|---|---|---|
| Graph store | Entities, relationships, concepts | PostgreSQL + pgvector |
| Vector store | Semantic embeddings | pgvector / Qdrant / ChromaDB |
| Relational store | Documents, provenance, metadata | PostgreSQL |
All three are queried together at retrieval time. The result: Cognee can answer "what did this user tell me three weeks ago?" with the precision of a graph traversal and the semantic flexibility of vector search.
The architecture also enables auto-routing: Cognee automatically selects the retrieval strategy (vector similarity, graph traversal, relational lookup, or a hybrid) that fits the query type.
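To make the idea of auto-routing concrete, here is a toy sketch of a query router. It is not Cognee's implementation: the `Strategy` enum, the `pick_strategy` function, and the keyword cues are all hypothetical names invented for illustration, and a production router would more plausibly rely on a classifier or an LLM than on substring matching.

```python
from enum import Enum


class Strategy(Enum):
    VECTOR = "vector similarity"
    GRAPH = "graph traversal"
    RELATIONAL = "relational lookup"
    HYBRID = "hybrid"


# Hypothetical surface cues, chosen only for this illustration.
GRAPH_CUES = ("related to", "connected to", "depends on")
RELATIONAL_CUES = ("which document", "source of", "uploaded")


def pick_strategy(query: str) -> Strategy:
    """Pick a retrieval strategy from surface cues in the query (toy heuristic)."""
    q = query.lower()
    wants_graph = any(cue in q for cue in GRAPH_CUES)
    wants_relational = any(cue in q for cue in RELATIONAL_CUES)
    if wants_graph and wants_relational:
        return Strategy.HYBRID
    if wants_graph:
        return Strategy.GRAPH
    if wants_relational:
        return Strategy.RELATIONAL
    # Open-ended questions fall back to semantic (vector) search.
    return Strategy.VECTOR
```

The point of the sketch is the shape of the decision, one query in and one strategy out, rather than the specific cues used to make it.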
Quick Start
Installation is a single command:

    pip install cognee

For PostgreSQL-backed storage (recommended for production):

    pip install "cognee[postgres]"

Here is a minimal agent memory loop:

    import asyncio

    import cognee


    async def main():
        # Store information from the current session
        await cognee.remember("User prefers concise responses, no bullet points.")
        await cognee.remember(
            "Project context: building a CRM for a Tunis-based SaaS.",
            session_id="proj_123",
        )

        # Retrieve across sessions
        results = await cognee.recall("What does the user prefer?")
        project_ctx = await cognee.recall(
            "What project are we working on?",
            session_id="proj_123",
        )

        # Update memory based on feedback
        await cognee.improve()

        # Clean up when done
        await cognee.forget(dataset="main_dataset")


    asyncio.run(main())

The session_id parameter scopes memories to a specific conversation or project, giving you both persistent and ephemeral memory in the same API.
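As a mental model for session scoping, the following in-memory sketch may help. It is an assumption-laden toy, not Cognee's code: the `ScopedMemory` class, the `GLOBAL` bucket, and the `end_session` method are hypothetical, and the visibility rule used here (a scoped recall searches the global bucket plus that session's bucket) is a guess that may differ from Cognee's actual behavior.

```python
from collections import defaultdict
from typing import Optional

GLOBAL = "__global__"  # hypothetical bucket for memories stored without a session_id


class ScopedMemory:
    """Toy store: one list of memories per session, plus a global list."""

    def __init__(self) -> None:
        self._buckets = defaultdict(list)

    def remember(self, text: str, session_id: Optional[str] = None) -> None:
        self._buckets[session_id or GLOBAL].append(text)

    def recall(self, keyword: str, session_id: Optional[str] = None) -> list[str]:
        # Assumed rule: global memories are always visible, and session
        # memories are visible only when that session_id is supplied.
        candidates = list(self._buckets[GLOBAL])
        if session_id is not None:
            candidates += self._buckets[session_id]
        return [m for m in candidates if keyword.lower() in m.lower()]

    def end_session(self, session_id: str) -> None:
        # Dropping a bucket models the ephemeral side of session memory.
        self._buckets.pop(session_id, None)
```

In this model, a memory stored with `session_id="proj_123"` is invisible to an unscoped `recall`, while memories stored without a session remain available everywhere.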
CLI Usage
Cognee also ships a CLI for quick testing without writing Python:

    cognee-cli remember "Your information here"
    cognee-cli recall "Your query here"
    cognee-cli forget --all
    cognee-cli -ui  # Launch the web UI

MCP Integration
Cognee v1.0 ships with built-in MCP (Model Context Protocol) support. You can expose Cognee as an MCP server that any MCP-compatible agent can read from and write to:

    docker pull cognee/cognee-mcp:main
    docker run -p 8001:8001 cognee/cognee-mcp:main

The MCP server supports HTTP, SSE, and stdio transports. Claude Desktop, Claude Code, Cursor, Windsurf, Cline, and any other MCP-aware client can use Cognee as a shared memory backend: one knowledge graph accessible to every agent simultaneously.
For Claude Code specifically, the Cognee plugin is available directly in the marketplace.
Storage Backends
Cognee supports a wide range of backends so you can match your existing infrastructure:
- Graph stores: PostgreSQL (default), Neo4j, Amazon Neptune, KuzuDB
- Vector stores: pgvector (default), Qdrant, ChromaDB, Weaviate, Milvus, LanceDB
- Session stores: Redis
- Local development: SQLite, KuzuDB (no server required)
For MENA teams with data residency requirements under INPDP (Tunisia) or PDPL (Saudi Arabia), self-hosting with PostgreSQL on your own infrastructure keeps all memories on-premises, with no data leaving the region.
Deployment Options
Cognee v1.0 ships four deployment paths:
- Managed Cloud — hosted at cognee.ai, API key access
- Self-hosted — single PostgreSQL instance, full data ownership
- Edge / on-device — Rust SDK for resource-constrained environments
- Node-based workflows — TypeScript SDK for JavaScript and Next.js stacks
Cloud deployment via Python is a one-liner:

    await cognee.serve(url="https://your-instance.cognee.ai", api_key="ck_...")

Benchmarks: BEAM Results
Cognee benchmarks against the BEAM (Benchmark for Evaluation of Agent Memory) suite:
| Corpus size | Cognee | Previous SOTA |
|---|---|---|
| 100K tokens | 79% | 73.4% |
| 10M tokens | 67% | 64.1% |
Token usage stays flat as corpus size grows. In contrast, pure long-context approaches see token cost grow linearly with corpus size.
Break-even point: For fewer than 23–26 repeated queries, a large context window is often cheaper. Beyond that threshold, Cognee's persistent memory structure consistently wins on both cost and accuracy.
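The break-even logic can be reproduced with a simple cost model. The sketch below assumes that a long-context approach re-sends the whole corpus on every query, while a memory approach pays a one-time ingestion cost plus a small fixed retrieval payload per query. The function names and all three numbers are illustrative assumptions, not measurements from Cognee or BEAM; the values were picked only so that the result lands near the range quoted above.

```python
def long_context_cost(corpus_tokens: int, queries: int) -> int:
    """Tokens consumed when the whole corpus is re-sent with every query."""
    return corpus_tokens * queries


def memory_cost(ingest_tokens: int, retrieved_tokens: int, queries: int) -> int:
    """One-time ingestion cost plus a fixed retrieval payload per query."""
    return ingest_tokens + retrieved_tokens * queries


def break_even_queries(corpus_tokens: int, ingest_tokens: int, retrieved_tokens: int) -> int:
    """Smallest query count at which memory costs no more than long context."""
    saving_per_query = corpus_tokens - retrieved_tokens
    if saving_per_query <= 0:
        raise ValueError("retrieval payload must be smaller than the corpus")
    return -(-ingest_tokens // saving_per_query)  # ceiling division


# Illustrative values only (assumptions, not benchmark data).
CORPUS = 100_000       # tokens re-sent per query in the long-context approach
INGEST = 2_400_000     # one-time tokens spent building the memory
RETRIEVED = 4_000      # tokens retrieved per query from memory

n = break_even_queries(CORPUS, INGEST, RETRIEVED)
print(n)  # 25
```

Under these assumptions each query saves 96,000 tokens, so the 2.4M-token ingestion cost is recovered at the 25th query. Changing the ingestion cost or the retrieval payload moves the threshold accordingly.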
When Not to Use Persistent Memory
Cognee is infrastructure — it adds operational overhead. Consider skipping it when:
- Your agent runs a single session with no expectation of continuity
- Your corpus is under 50K tokens and fits comfortably in one context window
- You need minimum latency (graph retrieval adds a network round-trip vs in-memory context)
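The three criteria above can be written down as a small decision helper. This is only a restatement of the list in code; the function name and parameters are hypothetical, and the 50K-token default simply mirrors the figure in the text.

```python
def persistent_memory_recommended(
    needs_continuity: bool,
    corpus_tokens: int,
    latency_critical: bool,
    small_corpus_limit: int = 50_000,
) -> bool:
    """Return False when any of the skip criteria from the list applies."""
    if not needs_continuity:
        return False  # single session, no expectation of continuity
    if corpus_tokens < small_corpus_limit:
        return False  # corpus fits comfortably in one context window
    if latency_critical:
        return False  # retrieval adds a network round-trip
    return True
```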
For stateless one-shot tasks, long context still wins. For anything building persistent user understanding, project context, or cross-session reasoning, Cognee closes the gap that RAG alone cannot.
Ecosystem and Migration
Cognee v1.0 supports migrating from existing memory solutions including Mem0, Zep, and Letta. Data portability is handled via the open COGX format, so you are never locked in.
The platform integrates with LangGraph, smolagents, and any framework that supports tool-calling or MCP. For teams building on the Claude Agent SDK, Cognee provides exactly the memory persistence layer that makes persistent teammates genuinely useful across long-running projects.
Conclusion
Cognee v1.0 makes the case that agent memory deserves its own infrastructure layer — not a clever prompt trick or a larger context window, but a proper graph-native engine with open semantics.
With pip install cognee, four simple API methods, built-in MCP support, flexible self-hosting, and clear migration paths from existing solutions, it is now straightforward to give any Python or TypeScript agent the kind of long-term memory that makes it genuinely useful across sessions.
The memory layer is becoming the third critical infrastructure component for production agents — after the model and the execution harness. Cognee v1.0 is the most credible open-source attempt yet to fill that role.