Hermes Agent memory: SOUL.md, MEMORY.md and state.db

Teodor Tudor

12/05/2026

One of the things that makes Hermes feel different from a stock LLM chat is that it remembers you. Not in a buzzy "AI with personality" way; in a flat, mechanical, file-on-disk way. Your persona lives in a markdown file. Your facts live in another markdown file. Your conversation history lives in a SQLite database. Each one is a thing you can read, edit and delete with normal command-line tools, which means the memory layer is debuggable in a way that "just trust the LLM" memory systems aren't.

This article walks through what each layer does, when each one fires, what it costs in tokens and how to edit them by hand without breaking things.

The three layers

Hermes splits memory across three places, each with a different lifetime, granularity and intended purpose.

Layer 1: persona and frozen facts. Markdown files at ~/.hermes/SOUL.md, ~/.hermes/memories/MEMORY.md and ~/.hermes/memories/USER.md. These get loaded into the system prompt at the start of every session. Stable, slow-changing, hand-curated. You are the main author; the agent only appends to MEMORY.md or USER.md through auto-write or the reflection pass, both covered below and both possible to switch off.

Tokens spent: typically 1000 to 5000 tokens across the three files, depending on how much you've put in. Worth being judicious because every session pays this cost.

Layer 2: skills. The ~/.hermes/skills/ directory. Each skill is a folder with a SKILL.md file describing when to invoke it and what to do. Hermes auto-creates skills via the learning loop (after a few successful runs of similar tasks) and the skills then get used automatically when the agent encounters similar tasks again.

Tokens spent: zero per session unless the skill is invoked, in which case you pay for the SKILL.md content as part of the active context for the duration of that skill's task.

Layer 3: session search. A SQLite database at ~/.hermes/state.db holding every message ever sent in any session, with FTS5 full-text search on top. The agent can call a session_search tool to find relevant past conversations.

Tokens spent: zero per session unless the agent decides to query it. Typical queries pull a few hundred to a few thousand tokens of relevant past context.

Layer 1, in detail

SOUL.md is the persona file. It defines how the agent talks: tone, register, idioms it uses, things it cares about. Edit it as plain markdown. The format is loose; you can write it as bullet points, prose, dialogue examples, whatever feels right. Hermes loads it whole into the system prompt.

A reasonable SOUL.md for a personal agent:

# Persona

You are Alice's assistant. You are concise, direct and slightly dry.
You don't apologise unless you've done something wrong.
You don't use emoji.
You match the user's tone: terse with terse messages, more conversational
when the user is being conversational.

# Boundaries

If the user asks for help with something destructive (deleting files,
sending money, posting publicly), you confirm before acting.
You don't pretend to be a human.

MEMORY.md is the agent's working knowledge. It's where the agent (or you) writes down stable facts that should be available every session. Things like project context, ongoing themes, preferences.

# Memory

## Projects
- Working on a Next.js SaaS app called BrightCart, deployed on Coolify.
- The codebase is at ~/code/brightcart, main branch is main.
- Stripe is the payment provider. Webhooks point at /api/stripe-webhook.

## Preferences
- Prefer Postgres over MySQL for new projects.
- Use ripgrep over grep for searching code.
- Replies should default to short unless I explicitly ask for detail.

USER.md is parallel to MEMORY.md but more strictly factual: things specifically about the user (your name, your timezone, your work hours, things the agent should know about you as a person rather than about your projects).

# User

Name: Alice Chen
Timezone: Europe/London
Work hours: Mon-Fri 09:00-18:00
Important: Has a 14-month-old, so don't suggest "let's catch up at 8am Saturday".

The split between MEMORY.md and USER.md is a soft convention; both get loaded in the same way. Use the split or don't, depending on how much you have to remember and how you like to organise it.
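To get a rough feel for what Layer 1 costs per session, a sketch like the following sums the size of the three files. The four-characters-per-token ratio is a common rule of thumb for English text, not Hermes's actual tokenizer, and the helper names are hypothetical.

```python
from pathlib import Path

# Rule-of-thumb ratio for English text; the real tokenizer will differ.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN


def layer1_cost(home: Path) -> dict[str, int]:
    """Estimate tokens for each Layer 1 file under a Hermes home directory."""
    files = [
        home / "SOUL.md",
        home / "memories" / "MEMORY.md",
        home / "memories" / "USER.md",
    ]
    # Files that don't exist are skipped rather than counted as zero.
    return {
        f.name: estimate_tokens(f.read_text(encoding="utf-8"))
        for f in files
        if f.exists()
    }


if __name__ == "__main__":
    costs = layer1_cost(Path.home() / ".hermes")
    for name, tokens in costs.items():
        print(f"{name}: ~{tokens} tokens")
    print(f"total: ~{sum(costs.values())} tokens")
```

Treat the output as an order-of-magnitude check. If the total drifts well past a few thousand tokens, that is a hint to prune.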

How auto-write works (and how to control it)

When you tell the agent something it should remember, it can write to MEMORY.md or USER.md on its own. The trigger phrase is anything like "remember that...", "make a note...", "next time you see X, do Y...". The agent appends to the relevant file with a timestamp comment.

If you don't want the agent writing to memory automatically:

hermes config set memory.auto_write false

Then the agent tells you what it would have written and you can copy-paste into the file by hand if you want. Useful when you want tighter control over what ends up in the persistent context.

If the agent writes too much (you turn around and the file has grown 100 lines in a week), prune. Open the file, delete entries that aren't useful any more, save. The agent picks up the pruned version on next session start.
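When deciding what to prune, it helps to see which sections have grown. The sketch below counts non-blank lines under each markdown heading. It assumes the heading layout shown in the example files above, and `section_sizes` is a hypothetical helper, not part of Hermes.

```python
from pathlib import Path


def section_sizes(markdown: str) -> dict[str, int]:
    """Count non-blank content lines under each heading."""
    sizes: dict[str, int] = {}
    current = "(no heading)"
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Any line starting with '#' is treated as a heading.
            current = stripped.lstrip("#").strip()
            sizes.setdefault(current, 0)
        elif stripped:
            sizes[current] = sizes.get(current, 0) + 1
    return sizes


if __name__ == "__main__":
    path = Path.home() / ".hermes" / "memories" / "MEMORY.md"
    if path.exists():
        sizes = section_sizes(path.read_text(encoding="utf-8"))
        # Largest sections first: those are the pruning candidates.
        for heading, count in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
            print(f"{count:4d}  {heading}")
```

The script only reads the file, so it is safe to run at any time. The pruning itself stays a manual edit.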

Layer 2, the skills system

Skills are the mid-grained memory layer. They capture procedures, not facts. "How do I deploy a Next.js app to Coolify" is a skill; "BrightCart deploys to Coolify" is a memory.

The ~/.hermes/skills/ directory has one folder per skill, each containing a SKILL.md plus optional supporting files (templates, scripts, lookup tables). The SKILL.md frontmatter declares when the skill applies; the body describes what to do.

The learning loop generates skills automatically from successful task patterns. After three or four successful runs of similar work, Hermes generates a skill that captures the procedure. You can read it, edit it, delete it. Editing a generated skill is fine, but Hermes may regenerate over your edits if it sees similar successful runs again. To prevent that, mark the skill as user-locked in the frontmatter; the skills article covers the lock pattern in detail.

Skills cost tokens only when invoked. The agent decides per task which skills are relevant; only relevant skills get loaded into the active context for that task.
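To see what the learning loop has produced so far, a short listing of the skills directory is often enough. This sketch relies only on the layout described above (one folder per skill, each with a SKILL.md). It does not parse the frontmatter, because the field names are not specified here, and `list_skills` is a hypothetical helper.

```python
from pathlib import Path


def list_skills(skills_dir: Path) -> list[tuple[str, int]]:
    """Return (skill name, SKILL.md size in bytes) for each skill folder."""
    if not skills_dir.is_dir():
        return []
    skills = []
    for folder in sorted(skills_dir.iterdir()):
        skill_file = folder / "SKILL.md"
        # Skip stray files and folders that have no SKILL.md.
        if folder.is_dir() and skill_file.is_file():
            skills.append((folder.name, skill_file.stat().st_size))
    return skills


if __name__ == "__main__":
    for name, size in list_skills(Path.home() / ".hermes" / "skills"):
        print(f"{name}: {size} bytes")
```

The byte size is a cheap proxy for how much context a skill adds when it is invoked.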

Layer 3, the session DB

state.db is where every message ever sent through Hermes lives. Each row is a message: who sent it, when, on what channel, with what content. An FTS5 full-text index over the content keeps search fast.

The agent doesn't load any of this by default. It queries when it decides search is useful: "did I ever discuss X" or "what did Alice say about Y last month" type prompts trigger the agent to call session_search internally, which queries state.db and returns matching rows.

You can query state.db directly with sqlite3:

sqlite3 ~/.hermes/state.db "SELECT created_at, channel, content FROM messages WHERE content MATCH 'BrightCart' ORDER BY created_at DESC LIMIT 10"

This returns the ten most recent messages mentioning BrightCart, regardless of which channel they came in on. Useful when you're trying to remember what you previously told the agent about a project.
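The same lookup from Python, as a sketch. It assumes, like the command above, that `messages` is an FTS5 table with `created_at`, `channel` and `content` columns. The real schema may differ, so check it with `.schema` in sqlite3 first. The search term is passed as a parameter, and the file is opened read-only so a stray statement can't modify the DB.

```python
import sqlite3
from pathlib import Path


def search_messages(conn: sqlite3.Connection, term: str, limit: int = 10):
    """Return the most recent messages whose content matches an FTS5 query."""
    # Assumed schema: an FTS5 table named `messages` (not confirmed by Hermes docs).
    sql = (
        "SELECT created_at, channel, content FROM messages "
        "WHERE content MATCH ? ORDER BY created_at DESC LIMIT ?"
    )
    return conn.execute(sql, (term, limit)).fetchall()


if __name__ == "__main__":
    db = Path.home() / ".hermes" / "state.db"
    if db.exists():
        # mode=ro opens the database read-only via a file: URI.
        conn = sqlite3.connect(f"{db.as_uri()}?mode=ro", uri=True)
        for created_at, channel, content in search_messages(conn, "BrightCart"):
            print(created_at, channel, content[:80])
        conn.close()
```

The term is interpreted as FTS5 query syntax, so characters such as hyphens or quotes may need to be wrapped in double quotes inside the string.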

The DB grows over time. A few KB per turn plus the FTS5 indexes; over six months of normal use, expect 20 to 100 MB. Manageable on any reasonable VPS. If you want to prune, the backups guide covers the SQL to delete old rows safely.

The reflective phase

Periodically, Hermes runs a reflection pass that synthesises across stored memory: it reads recent sessions, distills patterns and writes summaries back into MEMORY.md or generates new skills.

The trigger conditions are heuristic; the user-visible effect is that after a heavy day of conversation, your MEMORY.md may have a new entry capturing the gist of yesterday's work. The agent has had time to think about what mattered and what didn't.

If you don't want this:

hermes config set memory.reflection_enabled false

The agent skips reflection passes. Memories accumulate only when you (or the agent in active conversation) explicitly note them.

I leave it on. The reflection pass occasionally generates entries I find useful and the cost (a small batch of LLM calls per day) is low.

What the agent sees on each session start

When you open a fresh chat, Hermes builds the system prompt from:

The agent's core instructions (tool definitions, behavioural rules; ~10K tokens, fixed).

SOUL.md (~500 to 2000 tokens, your persona).

MEMORY.md and USER.md (~500 to 3000 tokens, your facts).

An auto-generated context block summarising recent sessions (~500 to 1500 tokens, generated each session).

Your current message.

The total fixed overhead per turn is around 12K to 16K tokens before the agent does anything. The token-cost article goes deeper on what eats tokens and how to trim.
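The overhead figure is just the sum of the ranges above, rounded. A quick check, written so you can swap in your own measured numbers (the dictionary keys are labels for this sketch, not Hermes identifiers):

```python
# (low, high) token ranges per component, taken from the list above.
COMPONENTS = {
    "core instructions": (10_000, 10_000),
    "SOUL.md": (500, 2_000),
    "MEMORY.md + USER.md": (500, 3_000),
    "recent-session summary": (500, 1_500),
}


def overhead_range(components: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low and high ends of every component range."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high


low, high = overhead_range(COMPONENTS)
print(f"fixed overhead: {low} to {high} tokens")
# prints "fixed overhead: 11500 to 16500 tokens"
```

The fixed 10K of core instructions dominates either end, so trimming SOUL.md or MEMORY.md moves the total by a few thousand tokens at most.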

Editing memory by hand without breaking things

Three rules.

Don't edit while the agent is mid-conversation. The agent has the file's content in its current context; editing the file doesn't change what the agent already loaded. Wait for an idle moment, edit, then start a new session.

Keep the format consistent. The agent doesn't strictly parse MEMORY.md or USER.md, so technically any markdown works, but heading structure helps both you and the agent navigate the file. Stick to # Top-level, ## Section, ### Sub-section.

Don't bury timestamps. The auto-write feature adds timestamp comments to entries; keep them. Useful for "what did I add to memory last week" and for the reflective phase, which uses timestamps to decide which entries are still relevant.

If you accidentally corrupt a memory file (deleted a section you didn't mean to, mangled the format), restore from your last backup. The backups guide covers selective file restore.
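A cheap safeguard is to take a timestamped copy before every hand edit. The sketch below is one way to do that. The backup directory is a hypothetical location chosen for the example, kept outside ~/.hermes/memories/ so the copies can't be mistaken for live memory files.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_file(path: Path, backup_dir: Path) -> Path:
    """Copy `path` into `backup_dir` under a timestamped name and return the copy."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{path.name}.{stamp}.bak"
    shutil.copy2(path, target)  # copy2 also preserves the modification time
    return target


if __name__ == "__main__":
    memory = Path.home() / ".hermes" / "memories" / "MEMORY.md"
    if memory.exists():
        # Hypothetical backup location; pick whatever suits your setup.
        copy = backup_file(memory, Path.home() / "hermes-memory-backups")
        print(f"backed up to {copy}")
```

Restoring is then a plain copy of the chosen .bak file back over the original, followed by a fresh session.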

Memory across migrations

If you migrate from OpenClaw, the migration tool brings SOUL.md, MEMORY.md and USER.md across cleanly. The session DB doesn't migrate (OpenClaw uses a different format), so your conversation history starts fresh on Hermes; the persona and facts carry over but the past chats don't. Practically, that's fine because most of what's useful from past chats has already been distilled into MEMORY.md anyway.

The migration walkthrough covers the full mapping.

Multi-user memory considerations

For a personal agent, MEMORY.md is your memory. For a multi-user agent (a team Slack bot), the agent needs to maintain separate memory per user. Hermes does this via per-user memory files under ~/.hermes/memories/users/<user-id>/USER.md; each user gets their own facts file, separated by user identity from the messaging gateway.

The shared MEMORY.md is still global (project context, team-wide facts), but USER.md splits per user. The agent picks the right user file based on which channel and identity the message came in on.
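The per-user lookup can be pictured as a simple path construction. This is a sketch of the layout described above, not Hermes's own code: `user_memory_path` is a hypothetical helper, and the format of the user id handed over by the messaging gateway is an assumption.

```python
from pathlib import Path


def user_memory_path(home: Path, user_id: str) -> Path:
    """Build the per-user USER.md path for a gateway user id."""
    # Reject ids that could escape the users/ directory.
    unsafe = not user_id or user_id in {".", ".."} or "/" in user_id or "\\" in user_id
    if unsafe:
        raise ValueError(f"unsafe user id: {user_id!r}")
    return home / "memories" / "users" / user_id / "USER.md"


if __name__ == "__main__":
    # "U123" is a made-up id for illustration.
    print(user_memory_path(Path.home() / ".hermes", "U123"))
```

Validating the id matters in a multi-user setup, because the value originates from an external messaging platform rather than from you.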

The shortcut

The LumaDock Hermes Agent VPS template creates the memory directory structure as part of install, with sensible defaults and example SOUL.md and MEMORY.md scaffolds you can edit. Saves the "where do these files live again" lookup the first time you set up.
