If your Hermes Agent gateway is fine for a day, then suddenly the box runs out of memory at midnight and the OOM killer takes the gateway down, you've hit the long-running gateway memory leak. It's a real bug in the upstream code, tracked in the Hermes issue tracker. Every VPS deployment that runs more than 24 hours straight will see it eventually.
Below: how to confirm you're hitting this specific leak (vs other memory issues), the mitigations that work today, and what to watch for upstream.
Let's go!
Confirm it's the gateway leak (not something else)
Different memory problems look similar from the outside. Check these three signs to make sure you're chasing the right one.
Sign 1: memory grows linearly with uptime
If you graph RSS over time, the gateway leak produces a steady upward trend. Not flat. Not spiky. It climbs steadily, roughly 30-100 MB per hour depending on conversation throughput.
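while true; do
  # -o picks the oldest match in case pgrep finds more than one process
  pid=$(pgrep -of "hermes gateway") || { sleep 300; continue; }
  # ps reports RSS in KiB; convert to MB and prefix a timestamp
  echo "$(date +%H:%M:%S) $(ps -o rss= -p "$pid" | awk '{printf "%.1f", $1/1024}')MB"
  sleep 300
done | tee gateway-memory.log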
Let that run for a few hours. Plot it. Linear-up means leak. Flat with occasional bumps means normal operation.
Sign 2: it survives provider switches
Some memory bloat is provider-side (caching response buffers). The gateway leak doesn't care what provider you use. If you switch from Anthropic to OpenRouter mid-session and memory keeps climbing at the same rate, that's the leak.
Sign 3: dmesg shows the OOM killer picking your gateway
dmesg | grep -i "killed process" | grep -i hermes
journalctl -u hermes-gateway --since "24 hours ago" | grep -i "oom\|killed"
If you see a kill entry naming the hermes gateway PID, that confirms a real OOM rather than something else taking it down.
Mitigation 1: scheduled restarts (the boring fix that works)
This is what I run in production. Restart the gateway every 12 hours and the leak never grows large enough to trigger the OOM killer. Users see at most a few seconds of downtime, and only if they happen to be messaging during the restart.
Restart through systemd
If your gateway runs under systemd (covered in our Hermes Agent systemd setup), add a timer that restarts it:
sudo systemctl edit --force --full hermes-gateway-restart.service
Content:
[Unit]
Description=Restart Hermes Gateway periodically
After=hermes-gateway.service
[Service]
Type=oneshot
ExecStart=/bin/systemctl restart hermes-gateway.service
Then the timer:
sudo systemctl edit --force --full hermes-gateway-restart.timer
[Unit]
Description=Restart Hermes Gateway every 12h
[Timer]
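# Wall-clock schedule: 04:00 and 16:00 local time, i.e. every 12 hours.
# (OnBootSec=/OnUnitActiveSec= count from boot, so they can't target a fixed hour.)
OnCalendar=*-*-* 04,16:00:00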
[Install]
WantedBy=timers.target
sudo systemctl enable --now hermes-gateway-restart.timer
sudo systemctl list-timers --all
You should see the restart timer scheduled. Pick a window that doesn't overlap your busiest hour (mine fires at 4 a.m. local and 4 p.m. local).
Why 12 hours and not 24
The leak rate varies. On a busy bot with messaging gateways running, the gateway can OOM in 20 hours. On a quiet personal install, it can go 60+ hours. 12 hours is a safe interval for most setups. Quiet boxes can stretch to 24 if you'd rather have fewer restart blips. Busy boxes need 6.
Mitigation 2: monitor and restart before OOM
If you'd rather not restart on a fixed schedule, monitor memory and restart only when usage crosses a threshold.
sudo tee /usr/local/bin/hermes-mem-watchdog.sh > /dev/null << 'EOF'
#!/bin/bash
# Restart the gateway once its resident memory crosses THRESHOLD (in MB).
GATEWAY_PID=$(pgrep -f "hermes gateway" | head -1)
# Nothing to do if the gateway is not running.
[ -z "$GATEWAY_PID" ] && exit 0
# ps reports RSS in KiB; convert to whole MB.
RSS_MB=$(ps -o rss= -p "$GATEWAY_PID" | awk '{print int($1/1024)}')
THRESHOLD=2048
if [ "${RSS_MB:-0}" -gt "$THRESHOLD" ]; then
  logger "Hermes gateway at ${RSS_MB}MB, restarting"
  systemctl restart hermes-gateway
fi
EOF
sudo chmod +x /usr/local/bin/hermes-mem-watchdog.sh
Run it every 5 minutes from cron or a systemd timer. Threshold 2048 MB is conservative for a 4 GB box. Tune to your total RAM.
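To tune the threshold to the box rather than hard-coding it, one option is to derive it from MemTotal in /proc/meminfo. The sketch below is an illustration only: the helper name suggest_threshold_mb is hypothetical, and the 50% default is simply the ratio implied by 2048 MB on a 4 GB box, not a recommendation from the Hermes project.

```shell
# suggest_threshold_mb [PERCENT] [MEMINFO_FILE]
# Prints PERCENT of total RAM in whole MB (default: 50% of /proc/meminfo's MemTotal).
suggest_threshold_mb() {
  local percent=${1:-50}
  local meminfo=${2:-/proc/meminfo}
  # MemTotal is reported in kB; convert to MB, then take the percentage.
  awk -v pct="$percent" \
    '/^MemTotal:/ { printf "%d\n", ($2 / 1024) * pct / 100; exit }' "$meminfo"
}
```

You could set THRESHOLD in the watchdog from this value. For the schedule itself, a file in /etc/cron.d/ containing the line "*/5 * * * * root /usr/local/bin/hermes-mem-watchdog.sh" runs the script every 5 minutes as root.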
The trade-off vs scheduled restart: less downtime on quiet days, slightly more risk of an unexpected restart in the middle of busy traffic.
Mitigation 3: bigger box (the lazy fix)
If the leak rate is 50 MB/hour and you have 16 GB of RAM available to the gateway, you can go a long time before hitting OOM (about 13 days at that rate). Not technically a fix, but it buys real time. Even at the upper end of the range, around 100 MB/hour, a 2 GB VPS hits OOM in less than a day while a 16 GB VPS lasts about a week.
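The arithmetic behind those estimates is just headroom divided by leak rate. Here is a rough sketch with a hypothetical helper, hours_to_oom. It assumes all of the stated memory is available to the leak and ignores the gateway's baseline footprint and everything else on the box, so real numbers will be somewhat lower.

```shell
# hours_to_oom AVAILABLE_MB LEAK_MB_PER_HOUR
# Prints the whole number of hours until the leak consumes AVAILABLE_MB.
hours_to_oom() {
  awk -v avail="$1" -v rate="$2" 'BEGIN {
    if (rate <= 0) { print "no leak"; exit }
    printf "%.0f\n", avail / rate
  }'
}

# hours_to_oom 2048 100    -> 20   (2 GB box, under a day)
# hours_to_oom 16384 100   -> 164  (16 GB box, about a week)
# hours_to_oom 16384 50    -> 328  (16 GB box, about 13 days)
```

Plug in the leak rate you measured for your own gateway rather than these round figures.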
If you're already paying for a small VPS and the gateway is the only thing on it, moving up one tier on LumaDock buys you a lot of breathing room and removes the restart-cadence question for a while. Plans include unmetered bandwidth and no setup fees, so resizing mid-month is painless. Setup details in our Hermes Agent complete guide.
Mitigation 4: reduce conversation history retention
Part of the leak appears to be conversation history accumulating in memory. Aggressive compression keeps the working set smaller.
hermes config set session_max_messages 50
hermes config set session_auto_compress true
Sessions longer than 50 messages get summarised and compressed. The compressed summary stays. The original messages get evicted from working memory.
This isn't a full fix (the leak is still there) but it slows the rate. I've seen leak rates drop from 60 MB/hour to 25 MB/hour with this setting on. Worth doing while you wait for the upstream fix.
What we know about the cause
From the upstream issue thread, the leak appears related to streaming session buffers not being released after the response completes. Specifically, the _drop_trailing_empty_response_scaffolding code path in the message flush pipeline holds references that should be GC-able but aren't. It's the same general area as the missing-assistant-messages bug we cover in our "database is locked" piece, but a different specific failure mode.
A fix is in flight in the Hermes repo. Watch the Hermes issue tracker for the specific PR landing. Until then, the restart cadence above is the operational answer.
After the upstream fix lands
Don't immediately rip out your restart timer. New releases sometimes introduce new leaks. Run the patched version with the restart timer still in place for at least a week, monitor the memory graph, and confirm it stays flat. Then if you want to remove the timer, fine. I'd leave it on for peace of mind even after the fix, because there's no real cost to an off-peak restart.
Logging context that helps if you're filing your own issue
If your leak looks different from what I described (faster than 100 MB/hour, a sawtooth pattern rather than linear, or only happening with certain providers) you might be hitting a related but separate bug. Capture:
- Hermes version (hermes --version)
- Provider, model, gateway channels in use
- Average messages per hour
- The memory graph from sign 1 above
- Last 200 lines of ~/.hermes/logs/gateway.log
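Collecting those items can be scripted. The following is a sketch under a few assumptions: the function name collect_hermes_diagnostics is hypothetical, the memory samples are assumed to live in gateway-memory.log in the current directory, and the provider and traffic details are left as placeholders to fill in by hand.

```shell
# collect_hermes_diagnostics OUT [GATEWAY_LOG] [MEMORY_LOG]
# Writes a plain-text report to OUT with the items an issue report needs.
collect_hermes_diagnostics() {
  local out=$1
  local gateway_log=${2:-$HOME/.hermes/logs/gateway.log}
  local memory_log=${3:-gateway-memory.log}
  {
    echo "== Hermes version =="
    if command -v hermes >/dev/null 2>&1; then
      hermes --version 2>&1
    else
      echo "hermes not found on PATH"
    fi
    echo "== Provider, model, gateway channels =="
    echo "(fill in by hand)"
    echo "== Average messages per hour =="
    echo "(fill in by hand)"
    echo "== Memory samples =="
    if [ -r "$memory_log" ]; then cat "$memory_log"; else echo "missing: $memory_log"; fi
    echo "== Last 200 lines of gateway log =="
    if [ -r "$gateway_log" ]; then tail -n 200 "$gateway_log"; else echo "missing: $gateway_log"; fi
  } > "$out"
}
```

For example, collect_hermes_diagnostics hermes-report.txt writes everything to one file. Read through it for tokens or private messages before attaching it to a public issue.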
File it on the GitHub repo. Upstream maintainers usually triage memory issues quickly because they hurt every VPS deployment.
Where this fits with the rest of the stack
Memory monitoring should sit alongside your existing health checks. If you've got Prometheus + Grafana already, the gateway RSS metric is a useful one to chart. If you don't, the watchdog script above is enough for a single VPS. Production hardening more broadly lives in our Hermes production hardening checklist.
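If you already run Prometheus with node_exporter, one way to chart gateway RSS is the textfile collector, which reads metric files from the directory given by --collector.textfile.directory. The sketch below assumes that setup. The metric name hermes_gateway_rss_bytes and both function names are made up for illustration, since Hermes itself is not known to export this metric.

```shell
# format_rss_metric RSS_KIB
# Prints the gauge in Prometheus text exposition format (value in bytes).
format_rss_metric() {
  local rss_kib=$1
  echo "# HELP hermes_gateway_rss_bytes Resident set size of the Hermes gateway process."
  echo "# TYPE hermes_gateway_rss_bytes gauge"
  echo "hermes_gateway_rss_bytes $((rss_kib * 1024))"
}

# write_rss_metric OUTPUT_FILE
# Samples the gateway's RSS and writes it to OUTPUT_FILE via a temp file and
# rename, so the collector never reads a half-written file.
write_rss_metric() {
  local out=$1 pid rss_kib
  pid=$(pgrep -f "hermes gateway" | head -1)
  [ -z "$pid" ] && return 0
  rss_kib=$(ps -o rss= -p "$pid" | tr -d ' ')
  [ -z "$rss_kib" ] && return 0
  format_rss_metric "$rss_kib" > "$out.tmp" && mv "$out.tmp" "$out"
}
```

Call write_rss_metric with a .prom file inside your textfile directory, on the same 5-minute schedule as the watchdog, and graph the result in Grafana.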

