Graph Memory for AI Agents: Running Mem0 Entirely Local
Last week I wrote about hybrid search with LanceDB — BM25 plus vector embeddings, fused with RRF. That covers retrieval: given a query, find the most relevant facts fast.
But there’s a layer above retrieval that vector stores don’t handle: relationships between facts.
A vector store can tell you that “the inference stack runs on port 8081” and “the inference stack uses Vulkan GPU” are both relevant to a query about inference. What it can’t tell you is that those two facts are about the same thing, or that a third fact (“the model router sits in front of it”) forms a chain. That relationship lives only in the LLM’s ability to reason across separate retrievals — not in the memory system itself.
Graph memory fixes this. And Mem0 is the most practical way I’ve found to add it to a local setup.
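To make the difference concrete, here's a toy sketch in plain Python, with the hypothetical facts from above stored as (subject, relation, object) triples. A flat vector store scores each fact against a query independently; a graph can walk from one entity to everything connected to it:

```python
# Three facts stored as triples. A vector store would rank each against the
# query in isolation; the graph recovers the chain in one hop.
triples = [
    ("inference stack", "runs_on_port", "8081"),
    ("inference stack", "uses", "Vulkan GPU"),
    ("model router", "sits_in_front_of", "inference stack"),
]

def connected_facts(entity: str) -> list[tuple[str, str, str]]:
    """Return every triple that touches the entity, in either direction."""
    return [t for t in triples if entity in (t[0], t[2])]

# One hop from "inference stack" recovers all three facts, including the
# router fact, which mentions the entity only as an object.
for s, r, o in connected_facts("inference stack"):
    print(s, r, o)
```

This is the whole pitch in twelve lines: the router fact never mentions "inference" in its subject, but the graph still surfaces it.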
Why Not Just LanceDB?
LanceDB is excellent for what it does: fast approximate nearest-neighbor search with BM25 fusion. I’m keeping it. But the two systems solve different problems:
- LanceDB: “Find me the most relevant stored facts for this query” — retrieval speed, semantic similarity
- Mem0: “Build a knowledge graph of how facts relate to each other” — entity extraction, relationship traversal
They’re complementary. LanceDB handles the retrieval layer; Mem0 adds the relationship layer on top. In practice: LanceDB answers “what do I know about X?” and Mem0 answers “how does X connect to Y and Z?”
What Mem0 Actually Does
Mem0 isn’t just a vector database with a nicer API. It processes memories through an LLM before storing them, extracting structured facts and building a graph of relationships alongside the vector embeddings. When you search, you get both: semantically similar memories (from the vector index) and related nodes (from the graph).
The result: an agent that doesn’t just find the most similar thing you told it, but understands how facts connect.
A memory about “service A runs on port X” and a memory about “service A is the primary inference stack” can be traversed as a graph path. Ask about inference and you get both nodes, with the relationship that links them.
The Stack
Running Mem0 entirely locally requires three components:
PostgreSQL + pgvector — stores the vector embeddings. pgvector extends Postgres with a vector type and ANN index. Mem0 uses 768-dimensional vectors (matching nomic-embed-text).
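For intuition, this is the arithmetic behind pgvector's cosine-distance operator (`<=>`), shown in plain Python. The short made-up vectors stand in for real 768-dimensional embeddings:

```python
import math

# What pgvector's <=> operator computes: cosine distance = 1 - cosine similarity.
def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

query = [1.0, 0.0, 0.5]
memory_a = [0.9, 0.1, 0.4]   # points in nearly the same direction
memory_b = [0.0, 1.0, 0.0]   # orthogonal, i.e. unrelated

assert cosine_distance(query, memory_a) < cosine_distance(query, memory_b)
```

The ANN index just makes this comparison fast over millions of rows instead of a Python loop.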
Neo4j — the graph database. Mem0 creates nodes for entities and edges for relationships as it processes each memory. When an agent learns about “server X”, Neo4j creates a node; when it learns that server X runs service Y, that becomes an edge connecting the two nodes.
Mem0 API — FastAPI service that orchestrates both. It receives a natural language memory, calls an LLM to extract entities and relationships, calls an embedding model to vectorize, then writes to both databases. Query side does the same in reverse: vector search + graph traversal + merge.
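The write/read flow can be sketched like this. Function names and storage shapes are illustrative stand-ins, not Mem0's actual internals, and the "vector search" is reduced to substring matching to keep the sketch self-contained:

```python
# Dual-write on add, dual-read plus merge on search.

def extract_triples(text: str) -> list[tuple[str, str, str]]:
    # In the real service this is an LLM call; hardcoded for the sketch.
    return [("server", "runs_on", "Vulkan")] if "Vulkan" in text else []

def embed(text: str) -> list[float]:
    # Stand-in for the embedding model (nomic-embed-text, 768 dims).
    return [float(ord(c) % 7) for c in text[:4]]

vector_index: list[tuple[list[float], str]] = []  # plays the pgvector role
graph: list[tuple[str, str, str]] = []            # plays the Neo4j role

def add_memory(text: str) -> None:
    vector_index.append((embed(text), text))      # write 1: vector store
    graph.extend(extract_triples(text))           # write 2: graph store

def search(query: str) -> dict:
    # Read side: "vector search" (simplified to substring match here)
    # plus graph traversal, merged into one response.
    hits = [t for _, t in vector_index if query.lower() in t.lower()]
    related = [tr for tr in graph
               if any(query.lower() in part.lower() for part in tr)]
    return {"memories": hits, "relations": related}

add_memory("the server runs on Vulkan")
print(search("vulkan"))
```

The point of the sketch is the shape: one incoming sentence fans out to two stores, and one query fans in from both.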
The Models
This is where most guides fall apart: they assume you’re using OpenAI.
My setup uses:
- LLM for memory extraction: a local ~7B instruction model — this processes “the server runs on Vulkan” and extracts `{entity: "server", relationship: "runs_on", value: "Vulkan"}`
- Embeddings: nomic-embed-text-v1.5 (768 dimensions) — converts text to vectors for semantic search
Both run on local hardware. Everything stays local.
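What the extraction call looks like in practice, with the model's response hardcoded rather than fetched. The prompt wording and JSON shape here are my assumptions, not Mem0's exact internals:

```python
import json

# The kind of structured-output prompt a ~7B instruction model handles well.
PROMPT = (
    "Extract (entity, relationship, value) triples from the text below. "
    "Respond with a JSON list only.\n\n"
    "Text: the server runs on Vulkan"
)

# A typical local-model response to the prompt above (hardcoded here
# instead of calling the endpoint).
raw_response = '[{"entity": "server", "relationship": "runs_on", "value": "Vulkan"}]'

triples = json.loads(raw_response)
assert triples[0]["relationship"] == "runs_on"
print(triples)
```

The fragility mentioned later lives entirely in this step: if the model emits malformed JSON or a vague relation name, the graph inherits it.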
What It Looks Like Running
After some time in operation, here’s the kind of knowledge graph that builds up:
- Infrastructure facts: which services run where, what depends on what
- Configuration: port assignments, model names, hardware specs
- Preferences: communication style, tool choices, workflow patterns
- Projects: relationships between codebases, their purposes, their status
The graph connects these. A project links to its tech stack, which links to the hardware it runs on, which links to the configuration decisions made for it. A flat vector store would retrieve each of these independently; the graph traverses the chain.
The Reality Check
It’s genuinely useful — but with caveats.
What works well: Entity extraction, relationship building, basic graph traversal. When memories share entities (the same service, project, person), the graph correctly connects them.
What’s fragile: The extraction quality depends entirely on the LLM. A 7B model does a decent job but occasionally misses relationships or creates redundant nodes. Larger models extract better graphs. This is a compute/quality tradeoff.
What’s missing: Temporal awareness. Mem0 stores when memories were created but doesn’t model the aging of information or flag when facts might be outdated. Two conflicting facts about the same entity coexist without conflict resolution.
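If conflicting facts bite you, a naive check is easy to run outside Mem0: group triples by (entity, relationship) and flag keys with more than one distinct value. A sketch:

```python
from collections import defaultdict

# Conflict detection Mem0 doesn't do for you: the same (entity, relationship)
# key with two different values is a contradiction worth surfacing.
facts = [
    ("inference stack", "runs_on_port", "8081"),
    ("inference stack", "runs_on_port", "8082"),  # later, contradictory fact
    ("inference stack", "uses", "Vulkan GPU"),
]

by_key: dict[tuple[str, str], set[str]] = defaultdict(set)
for entity, rel, value in facts:
    by_key[(entity, rel)].add(value)

conflicts = {k: v for k, v in by_key.items() if len(v) > 1}
print(conflicts)  # which (entity, relationship) pairs disagree
```

Deciding which value wins still needs timestamps or a human, which is exactly the temporal modeling that's missing.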
Setup in 10 Minutes
```yaml
# docker-compose.yml (abbreviated)
services:
  mem0:
    image: mem0/mem0:latest
    ports: ["8888:8000"]
    environment:
      LLM_PROVIDER: openai  # yes, even for local — it's OpenAI-compatible
      LLM_BASE_URL: http://host.docker.internal:11434/v1  # your local LLM endpoint
      LLM_MODEL: your-model-name
      EMBEDDER_PROVIDER: openai
      EMBEDDER_BASE_URL: http://host.docker.internal:11434/v1
      EMBEDDER_MODEL: nomic-embed-text
      EMBEDDER_DIMS: "768"
  neo4j:
    image: neo4j:5.26.4
    environment:
      NEO4J_AUTH: neo4j/yourpassword
    ports: ["7474:7474", "7687:7687"]
  postgres:
    image: ankane/pgvector:v0.5.1
    environment:
      POSTGRES_PASSWORD: yourpassword
```
The key trick: everything uses host.docker.internal to reach services on the host. Mem0’s container doesn’t know it’s talking to local models instead of OpenAI — it’s the same API shape.
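You can see the "same API shape" point by building the request by hand. The endpoint URL and model name below are the placeholders from the compose file, nothing more:

```python
import json
import urllib.request

# An OpenAI-style chat request aimed at a local endpoint: only the base URL
# differs from a request to api.openai.com.
def chat_request(base_url: str, model: str, content: str) -> urllib.request.Request:
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request(
    "http://host.docker.internal:11434/v1",
    "your-model-name",
    "Extract triples from: the server runs on Vulkan",
)
print(req.full_url)

# To actually send it (needs the local endpoint running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```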
After `docker compose up -d`, add memories:

```shell
curl -X POST http://localhost:8888/memories \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "I prefer dark mode in all tools"}], "user_id": "user1"}'
```
Search them:

```shell
curl -X POST http://localhost:8888/search \
  -H "Content-Type: application/json" \
  -d '{"query": "UI preferences", "user_id": "user1"}'
```
Is It Worth It?
For a personal AI setup: yes, if you care about relationship-aware memory. The graph layer adds real value for any domain where facts connect — personal context, infrastructure knowledge, project tracking.
For production agentic systems: not yet. The extraction quality and temporal modeling aren’t there. You’d want this as one layer in a hybrid system (alongside something like LanceDB for retrieval), not the whole thing.
The privacy argument is stronger than the capability argument right now. Running this stack means your agent’s complete memory — everything it knows about you — stays on your hardware. No API call carrying your personal context to a cloud embedding endpoint. No usage data. No training signal.
That’s worth something even before the graph beats the vector.
Stack: Mem0 + Neo4j 5.26.4 + PostgreSQL/pgvector + nomic-embed-text-v1.5 + local LLM. Docker Compose. 100% local.