Advanced memory management in OpenClaw: QMD, graphs, mem0

If you've been running OpenClaw for a few weeks and started noticing that it "forgets" things it should know, you're not hitting a bug. You're hitting a design constraint. The default memory system is deliberately simple: Markdown files on disk, a local vector index for retrieval, and a context window that compacts older content away when it gets too full.

That's fine when you're starting out. It gets frustrating the moment you have real history to work with and the agent can't connect the dots.

This article covers the three main ways to extend OpenClaw's memory beyond the defaults: QMD (hybrid retrieval that dramatically improves recall accuracy), Cognee (knowledge graph memory that understands relationships, not just similarity), and Mem0 (automatic fact extraction and long-term storage, with cloud and self-hosted modes).

If you haven't read how OpenClaw memory works yet, start there first, since this article assumes you already understand the Markdown layer, daily logs, MEMORY.md, and the basic embedding/retrieval flow.

Why the default system breaks down over time

Here's an honest assessment of what you're working with out of the box.

The built-in system splits memory files into roughly 400-token chunks with 80-token overlap, embeds each chunk, and stores them in a local SQLite-backed index. When you query, it does a semantic search over those chunks and returns the top results. For small amounts of memory this works well. As your workspace grows, a few real problems emerge.
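The chunking step can be pictured as a sliding window over the token stream. This is a minimal sketch of the idea; a real indexer would use the model's tokenizer rather than the pre-split token list assumed here:

```python
def chunk_tokens(tokens, size=400, overlap=80):
    """Split a token sequence into overlapping windows, as the default indexer does."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

# A 1000-token file yields three chunks; adjacent chunks share 80 tokens,
# so a fact straddling a boundary still appears whole in at least one chunk.
tokens = [f"t{i}" for i in range(1000)]
chunks = chunk_tokens(tokens)
```

The overlap is what keeps boundary-straddling sentences retrievable, at the cost of indexing some tokens twice.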

The biggest one is relational: if you wrote "Alice manages the auth team" on a Monday and then on Friday you ask "who handles auth permissions," a pure vector search might surface chunks about Alice and chunks about auth, but it has no way to connect those facts into a coherent answer. To borrow a phrase that comes up often in discussions of this limitation: the system remembers everything but understands none of it.

There's also a retrieval quality problem at scale. Pure vector search over hundreds of 400-token chunks produces diminishing returns as your memory grows. Chunks that share vocabulary but aren't conceptually relevant surface alongside the right answers. Chunks that use different terminology than your query don't surface at all, even when they're exactly what you need.

And then there's compaction. During long sessions, OpenClaw has to fit the active context into the model's token window. Older parts of the conversation get summarized or dropped. If you didn't explicitly write something to a memory file before compaction ran, it's gone. The agent behaves as if those exchanges never happened.

The three backends below each address a different part of this problem.

QMD: Hybrid retrieval for better recall

QMD (Query-Memory-Document) is an alternative memory backend that replaces the default search layer while keeping your existing Markdown files as-is. It doesn't require you to change anything about how you write memory. It just makes retrieval significantly better by running multiple search strategies in parallel and merging results.

How QMD works

Instead of a single vector search pass, QMD runs at least two retrieval channels simultaneously over your existing Markdown memory. A keyword channel (BM25 or similar lexical search) handles exact and near-exact term matches well. A vector channel handles semantic similarity. Then a re-ranking step scores the combined candidate set, balancing lexical precision with semantic recall to produce a final ranked list.

In practice, this means that a query like "gateway server setup" can match notes that talk about "running the gateway on the Mac Mini" even if "server" and "setup" don't appear verbatim in that note. It also means that when you search for something very specific, like a port number or a project name, the keyword channel ensures those exact matches aren't buried under semantically adjacent but less relevant results.
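The merge-and-rerank step can be sketched with reciprocal rank fusion, one common way to combine a lexical ranking with a vector ranking. This is an illustration of the technique, not a claim about QMD's actual re-ranker, and the document IDs are made up:

```python
def rrf_merge(rankings, k=60):
    """Reciprocal rank fusion: score each doc by summing 1/(k + rank) across channels."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["gateway-note", "ports-note", "misc-note"]    # lexical (BM25) channel
vector_hits  = ["mac-mini-note", "gateway-note", "dns-note"]  # embedding channel
merged = rrf_merge([keyword_hits, vector_hits])
# "gateway-note" wins: it ranks high in both channels.
```

Documents that appear in both channels accumulate score from each, which is exactly the behavior described above: exact matches aren't buried, and semantic-only matches still surface.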

Installing QMD

QMD runs as a local sidecar service alongside OpenClaw. The common install route is via Bun:

bun install -g https://github.com/tobi/qmd

Once installed, start it and leave it running alongside your OpenClaw instance. It listens on a local port and exposes a simple API that OpenClaw's memory backend calls on each retrieval. Everything runs locally, which means your memory files never leave the machine.

Configuring QMD in OpenClaw

Switch to the QMD backend in your agent config:

memory:
  backend: qmd
  citations: auto
  qmd:
    includeDefaultMemory: true
    update:
      interval: 5m
      debounceMs: 15000
      onBoot: true
      waitForBootSync: false
    limits:
      maxResults: 6
      maxSnippetChars: 700
      timeoutMs: 4000
    scope:
      default: deny
      rules:
        - action: allow
          match:
            chatType: direct

A few things worth understanding here. Setting includeDefaultMemory: true keeps all your existing Markdown sources in play, which is what you want. The update.interval and debounceMs settings control how often QMD re-indexes your memory files. A 5-minute interval with a 15-second debounce is a reasonable default, meaning rapid edits don't trigger constant re-embedding.
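The debounce behavior is easy to picture: re-indexing fires only once edits have gone quiet for the configured window. A minimal sketch of the pattern (not QMD's actual implementation):

```python
class Debouncer:
    """Fire at most once per quiet period: each new edit resets the timer."""
    def __init__(self, quiet_ms):
        self.quiet_ms = quiet_ms
        self.last_event = None

    def notify(self, now_ms):
        self.last_event = now_ms

    def should_fire(self, now_ms):
        return (self.last_event is not None
                and now_ms - self.last_event >= self.quiet_ms)

d = Debouncer(quiet_ms=15000)
d.notify(0)       # first edit
d.notify(10000)   # another edit 10 s later resets the quiet window
assert not d.should_fire(20000)  # only 10 s of quiet since the last edit
assert d.should_fire(25000)      # 15 s of quiet: safe to re-index
```

With a 15-second debounce, a burst of rapid saves costs one re-index instead of one per save.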

The scope block is important and easy to miss. With default: deny plus an explicit allow rule for direct chats, QMD only indexes and retrieves from direct message conversations. This keeps noisy group chats out of long-term memory, which is usually what you want unless you're explicitly building a shared memory setup. You can also control the search strategy directly:

memory:
  backend: "qmd"
  qmd:
    searchMode: "query"    # hybrid (default), "search" (keyword only), or "vsearch" (vector only)

query mode is the full hybrid pipeline. search and vsearch are useful when you want to isolate one channel for debugging or when a particular use case benefits from one approach over the other.

When QMD is worth it

If you're just starting out with a fresh workspace and small daily logs, the default SQLite memory is fine. QMD becomes noticeably better once you've accumulated several weeks of notes and start seeing gaps in recall where the agent clearly should have found something. That's the signal to switch.

Knowledge graph memory with Cognee

QMD improves how you find chunks of text. It doesn't help with relational reasoning. For that, you need something that builds an explicit model of entities and their connections. Cognee is an open-source memory engine that does exactly this: it reads your Markdown memory files, extracts entities and relationships, builds a graph, and exposes graph-based search modes that can answer questions by traversing connections rather than just matching vectors.

What a knowledge graph adds

Consider how this plays out with real memory content. You write "Alice manages the auth team" in a memory note on Monday. You write "the auth team owns the permissions service" on Wednesday. On Friday you ask "who's responsible for the permissions service?" A vector search might surface both notes individually but won't connect the chain: Alice → manages → Auth Team → owns → Permissions Service. A graph traversal does.
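That chain is a two-hop traversal. A toy sketch of the idea, with the graph hand-built here rather than extracted by an LLM as Cognee would do:

```python
from collections import deque

# Tiny directed fact graph; edges carry a relation label.
edges = {
    "Alice": [("manages", "Auth Team")],
    "Auth Team": [("owns", "Permissions Service")],
}

def find_chain(start, target, edges):
    """Breadth-first search for a relation chain linking start to target."""
    queue = deque([(start, [start])])
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for relation, neighbor in edges.get(node, []):
            queue.append((neighbor, path + [relation, neighbor]))
    return None

chain = find_chain("Alice", "Permissions Service", edges)
# ['Alice', 'manages', 'Auth Team', 'owns', 'Permissions Service']
```

No single note contains the answer; the traversal composes it from two facts written days apart, which is precisely what a chunk-similarity search cannot do.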

Cognee's integration with OpenClaw is designed so that Markdown remains your source of truth. The graph runs in the background as an additional retrieval layer. You don't change how you write memory files; Cognee reads them, extracts structure, and adds it to the graph automatically.

How the Cognee plugin works

The plugin operates in three phases that run around each agent session.

On startup, it scans MEMORY.md and the files under memory/*.md in your workspace. New files get added, changed files get updated (using hash-based change detection to avoid redundant processing), and unchanged files are skipped. A sync index lives at ~/.openclaw/memory/cognee/ to track what's already been indexed.

Before each agent run, the plugin sends the current prompt to Cognee. Cognee queries the graph using GRAPH_COMPLETION search (or another configured search type), finds related entities and relationships, and the plugin injects those results as structured context alongside the regular Markdown snippets the base memory system returns.

After each agent run, the plugin scans memory files again for any changes made during the session and updates the graph. New knowledge and relationships written during the conversation are reflected in future queries.
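Hash-based change detection of the kind described above is straightforward. This sketch illustrates the idea only; it makes no claim about the plugin's actual index format under ~/.openclaw/memory/cognee/:

```python
import hashlib
import json
import pathlib

def sync_changed(memory_dir, index_path):
    """Return memory files whose content hash differs from the stored sync index."""
    index_file = pathlib.Path(index_path)
    index = json.loads(index_file.read_text()) if index_file.exists() else {}
    changed = []
    for path in sorted(pathlib.Path(memory_dir).glob("*.md")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if index.get(str(path)) != digest:
            changed.append(path)        # new or modified since last sync
            index[str(path)] = digest
    index_file.write_text(json.dumps(index))
    return changed
```

Unchanged files cost one hash computation instead of a full re-extraction, which is what keeps the startup scan cheap on large memory directories.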

Setting up Cognee for OpenClaw

Cognee runs as a separate server, most commonly via Docker Compose. Once it's running locally, the plugin configuration in ~/.openclaw/config.yaml looks like this:

plugins:
  entries:
    memory-cognee:
      enabled: true
      config:
        baseUrl: "http://localhost:8000"
        apiKey: "${COGNEE_API_KEY}"
        datasetName: "my-project"
        searchType: "GRAPH_COMPLETION"
        autoRecall: true
        autoIndex: true

The datasetName field is important for anyone running multiple projects: use a distinct name per project or workspace so graphs don't mix entities from unrelated contexts. If Alice from one project and Alice from another are different people, you don't want the graph confusing them.

If you want tighter control over what goes into the graph, you can start with autoIndex: false and add documents explicitly. This is useful when you have a large existing memory directory and want to be deliberate about what gets processed first rather than indexing everything at once on boot.

Cognee vs QMD

They solve different problems and can be used together. QMD makes text retrieval better across all your Markdown. Cognee adds relational reasoning on top. If your memory use case is primarily "find notes that are relevant to this query," QMD is the right starting point. If you're running long-term projects where people, teams, systems, and their relationships matter for answers, Cognee is what makes that work.

Mem0: automatic fact extraction and long-term storage

QMD and Cognee both work on top of Markdown files you've already written. Mem0 takes a different approach: it watches conversations, automatically extracts structured facts from them, deduplicates those facts, and stores them in a vector database for later retrieval. You don't have to write anything to MEMORY.md. The system extracts and stores knowledge from the conversation itself.

What Mem0 does differently

After each exchange, Mem0 processes the full conversation transcript with an LLM. It identifies meaningful facts ("user prefers dark mode," "project deadline is March 15," "the API key for service X expires monthly"), deduplicates against what it already has, and stores the result as embeddings. Before the next response, it queries those stored facts for anything relevant to the current message and injects them into the prompt.
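The dedup step can be pictured with a cheap text-overlap measure standing in for embedding similarity. A real system would compare embeddings, and the 0.9 threshold here is an arbitrary illustration:

```python
def dedupe_facts(existing, candidates, similarity, threshold=0.9):
    """Keep only candidate facts that aren't near-duplicates of stored ones."""
    kept = []
    for fact in candidates:
        if all(similarity(fact, old) < threshold for old in existing + kept):
            kept.append(fact)
    return kept

def jaccard(a, b):
    """Word-overlap similarity: a cheap stand-in for cosine similarity on embeddings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

existing = ["user prefers dark mode"]
candidates = ["User prefers dark mode", "project deadline is March 15"]
new_facts = dedupe_facts(existing, candidates, jaccard)
# Only the deadline fact survives; the dark-mode fact is a near-duplicate.
```

Without this step, every session would re-store slight rephrasings of the same facts and retrieval quality would degrade over time.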

This is useful when you're using OpenClaw conversationally across many sessions and don't want to manually curate memory files. It's also useful for per-user memory in multi-user setups, since Mem0 namespaces memories by userId.

Mem0 cloud vs self-hosted

There are two deployment modes. The hosted version uses Mem0's cloud infrastructure. Install the plugin:

openclaw plugins install @mem0/openclaw-mem0

Get an API key from app.mem0.ai, then configure it:

"plugins": {
  "entries": {
    "@mem0/openclaw-mem0": {
      "enabled": true,
      "config": {
        "mem0Url": "https://api.mem0.ai",
        "apiKey": "your-key-here",
        "userId": "your-identifier",
        "autoRecall": true,
        "autoCapture": true
      }
    }
  }
}

If you'd rather keep your memory data local, the self-hosted option runs a FastAPI server backed by ChromaDB. Install the server dependencies:

pip install mem0ai fastapi uvicorn chromadb

Start the server (default port 8080):

python server.py

Then configure OpenClaw to use it instead, with the community self-hosted plugin:

"plugins": {
  "entries": {
    "openclaw-mem0-memory": {
      "enabled": true,
      "config": {
        "mem0Url": "http://localhost:8080",
        "userId": "openclaw_local",
        "autoRecall": true,
        "autoCapture": true,
        "maxRecallResults": 10,
        "profileFrequency": 50,
        "captureMode": "all",
        "debug": false
      }
    }
  }
}

The self-hosted server needs LLM and embedding credentials to do its extraction. Provide those via environment variables in ~/.openclaw/workspace/.env:

OPENAI_API_KEY=your-key
# or ANTHROPIC_API_KEY=your-key, depending on which model you're using for extraction

The profileFrequency setting controls how often Mem0 rebuilds a consolidated profile from stored memories (every 50 captures by default). Setting captureMode: "all" means every conversation exchange is processed, which is usually what you want, though you can restrict it if you're seeing too much noise being stored.

Commands and tools available with Mem0

The plugin exposes several agent tools and slash commands for explicit memory operations:

  • mem0_store, mem0_search, mem0_forget, mem0_profile as agent tools
  • /remember, /recall as slash commands in chat
  • openclaw mem0 status, openclaw mem0 search, openclaw mem0 wipe as CLI commands

openclaw mem0 wipe is worth knowing about. If auto-capture has stored a lot of irrelevant facts and you want to start fresh without destroying your Markdown memory files, this clears just the Mem0 store.

When Mem0 makes sense

Mem0 is most useful when you want hands-off, automatic long-term storage without maintaining MEMORY.md yourself. It's also the right choice when you need per-user memory namespaces across multiple agents or channels. The trade-off is that you're trusting an LLM extraction step to decide what's worth storing, which means occasionally irrelevant facts get stored and occasionally important ones get missed. The explicit /remember command exists for the cases where you want to be sure something gets captured.

Memory isolation, best practices, and backups

Separate memory per project

Whether you're using the default Markdown system, QMD, Cognee, or Mem0, keeping separate memory per project prevents cross-contamination of facts and improves retrieval precision. In practice this means separate workspaces or at least separate memory directories per major project. For Cognee, use a distinct datasetName per project. For Mem0, use distinct userId values per user or context. The confusion that results from mixing project memory is subtle and hard to debug, the kind where the agent gives answers that are slightly wrong in ways that take a while to trace back to a memory collision.

Control what gets indexed

Don't index everything blindly. In the Markdown system, use MEMORY.md for curated, stable knowledge and let daily logs carry noisy short-term detail. Prune or summarize daily files periodically rather than letting them accumulate indefinitely. In QMD, use scope rules so that group chats and noisy public channels don't end up in long-term memory. In Mem0, tune captureMode and autoCapture if you're seeing too much irrelevant content being stored.

Write important things before compaction

Context-window compaction drops information, and there's no way around the token limit. The practical response is to treat compaction as a forcing function: if something matters, write it to a memory file during the session rather than hoping it survives compaction. Some setups implement nightly "consolidation" workflows where a cron job reads recent session logs, extracts key decisions and facts, and writes summarized versions to MEMORY.md while pruning the raw logs. This mirrors what the Cognee plugin does automatically with autoIndex, but gives you more control over what gets preserved. The cron scheduler guide covers how to set up workflows like this.
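A consolidation job of this kind might look like the following sketch. The DECISION: line marker and the YYYY-MM-DD.md daily-log naming are assumptions for illustration; a real workflow would summarize logs with an LLM rather than grep for a marker:

```python
import datetime
import pathlib

def consolidate(memory_dir, keep_days=7, marker="DECISION:"):
    """Promote marked lines from aging daily logs into MEMORY.md, then prune the logs."""
    root = pathlib.Path(memory_dir)
    cutoff = datetime.date.today() - datetime.timedelta(days=keep_days)
    promoted = []
    for log in sorted(root.glob("20??-??-??.md")):
        if datetime.date.fromisoformat(log.stem) >= cutoff:
            continue  # recent logs stay as raw short-term memory
        for line in log.read_text().splitlines():
            if line.startswith(marker):
                promoted.append(f"- ({log.stem}) {line[len(marker):].strip()}")
        log.unlink()  # prune the raw log after extracting what matters
    if promoted:
        with (root / "MEMORY.md").open("a") as f:
            f.write("\n".join(promoted) + "\n")
    return promoted
```

Run nightly from cron, this keeps MEMORY.md growing with curated facts while the noisy daily files stay bounded.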

Backing up memory

Because Markdown is the primary source of truth in every system described here, the baseline backup is simple: snapshot ~/.openclaw/workspace/ regularly, including MEMORY.md, the memory/ directory, and any USER.md or similar files. For QMD, the indexes can usually be rebuilt from the Markdown source, but backing them up saves re-embedding time on large corpora. For Cognee, export the dataset from its database before migrating or cloning environments. For self-hosted Mem0, back up the ChromaDB directory alongside the config. The backup and export guide covers the mechanics in detail.
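The baseline snapshot can be as simple as a timestamped copy of the workspace; the paths here are illustrative, not a prescribed layout:

```python
import datetime
import pathlib
import shutil

def backup_workspace(workspace, backup_root):
    """Snapshot the Markdown source of truth; derived indexes can be rebuilt from it."""
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = pathlib.Path(backup_root) / f"workspace-{stamp}"
    shutil.copytree(workspace, dest)  # copies MEMORY.md, memory/, USER.md, etc.
    return dest
```

Anything fancier (rsync, restic, git) works just as well; the point is that the Markdown tree is the one thing you cannot regenerate.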

Privacy guardrails

Memory often contains sensitive information. A few defaults worth checking: set memorySearch.fallback = "none" if you want to prevent remote embedding backends from processing local documents. Keep Cognee and self-hosted Mem0 services on localhost or a private network; if you're using cloud APIs (Mem0 cloud, remote embeddings), make sure tokens are scoped correctly and secrets aren't stored in plaintext in your config file. OpenClaw's default of loading MEMORY.md only in private sessions, not group contexts, is a sensible baseline worth preserving.

Comparing the three backends

To be direct about when to use what: most people should start with QMD once they outgrow the defaults, since it requires the least setup change and produces the biggest improvement in recall quality for the widest range of use cases. Add Cognee if relational reasoning matters for your projects, specifically when you need the system to connect facts across different memory notes rather than just finding relevant chunks. Use Mem0 when you want automatic extraction from conversations and minimal manual curation, or when you need per-user memory namespaces for multi-user deployments.

None of these are mutually exclusive. QMD and Cognee can be layered. Mem0 can run alongside either. The Markdown files stay as the shared source of truth throughout.


FAQ

How do I know if QMD is actually improving my recall?

The easiest test: ask your agent something you know is in your memory files but phrased differently from how it was written. With default memory, queries that don't share vocabulary with the stored text often fail. With QMD hybrid search, they're much more likely to surface the right result. You can also enable debug logging on the memory plugin to see what chunks are being returned before and after switching backends.
