Running OpenClaw on a single VPS with systemd covers a lot of ground. It works fine for personal use, a handful of channels, and light automation. But production workloads hit the ceiling faster than you'd expect: one gateway handling concurrent sessions, heavy cron loads, large memory indexing, and multiple channel adapters at the same time eventually runs into CPU, RAM, or reliability limits. When that happens, you have two choices: throw more hardware at it, or rethink the architecture.
This article covers the second option: containerizing OpenClaw with Docker, deploying it to Kubernetes for true high availability, and managing the stateful storage that makes multi-instance setups actually work. It builds on what's covered in hosting OpenClaw securely on a VPS and the systemd service setup, so the assumption is you already have a working single-instance deployment and you're looking to go further.
When does scaling make sense?
Not every OpenClaw setup needs this. Most personal deployments and small team setups run fine on a single server, and adding orchestration complexity without a real reason just adds maintenance burden. Here are the signals that it's time:
- CPU spikes above 80% during concurrent sessions. If your gateway is pegging the CPU whenever two or three sessions are active simultaneously, a single instance won't grow into the load gracefully.
- OOM kills in system logs. Large tool chains, memory indexing with QMD or Cognee, and multi-agent routing can push RAM consumption high. If the OS is killing your gateway process, you're either undersized or need to distribute the load.
- More than 10 simultaneous sessions, or heavy cron/heartbeat concurrency. One gateway handles session isolation via its internal lane system, but there are limits to what a single process can manage without degrading response times.
- Zero-downtime requirements. Updating a single-instance systemd service means a brief outage. If your channels need to stay up continuously, you need replicas and rolling deployments.
- Processing high-volume webhooks or API integrations. If you're routing hundreds of inbound events per hour, a single gateway becomes a bottleneck.
If none of these apply, the VPS monitoring guide is probably a better next read than this one.
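Checking the OOM signal is quick before you commit to any of this. Below is a sketch that counts kernel OOM-killer events; in practice you'd pipe in `journalctl -k --since "7 days ago"`, but a captured sample keeps the logic self-contained here (the log lines are illustrative):

```shell
# Count OOM-killer events in kernel log output
count_oom_events() {
  grep -ciE "out of memory|oom-killer" || true
}

# Sample of what `journalctl -k` might emit (illustrative lines):
sample_log='Jan 10 03:12:44 vps kernel: Out of memory: Killed process 4242 (node)
Jan 10 03:12:44 vps kernel: node invoked oom-killer
Jan 11 09:01:02 vps kernel: eth0: link up'

printf '%s\n' "$sample_log" | count_oom_events   # -> 2
```

A nonzero count over a week is exactly the "OOM kills in system logs" signal described above.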
Containerizing OpenClaw with Docker
Writing a production Dockerfile
OpenClaw has official images (openclaw/openclaw:latest), but if you want control over what's in the image — specific apt packages, a non-root user, custom build steps — you'll want your own Dockerfile. A production-ready starting point based on Node 22 slim:
```dockerfile
FROM node:22-bookworm-slim

# Slim images ship without curl; install what the bun installer needs
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl ca-certificates unzip && \
    rm -rf /var/lib/apt/lists/*

RUN corepack enable && \
    curl -fsSL https://bun.sh/install | bash && \
    mv /root/.bun/bin/bun /usr/local/bin/

WORKDIR /app

# Cache dependencies before copying source (faster rebuilds)
COPY package.json pnpm-lock.yaml pnpm-workspace.yaml .npmrc ./
COPY ui/package.json ./ui/
RUN pnpm install --frozen-lockfile

COPY . .
RUN pnpm build && pnpm ui:build

ENV NODE_ENV=production
EXPOSE 18789 18793

HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD node dist/index.js health || exit 1

CMD ["node", "dist/index.js"]
```
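Pair the Dockerfile with a .dockerignore so local state and dependency trees never enter the build context. A minimal sketch to adapt to your repo layout:

```
node_modules
ui/node_modules
dist
.git
.env
*.log
```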
A few things worth adjusting for your specific setup:
- If you need ffmpeg or native build tools for certain skills, add `RUN apt-get update && apt-get install -y ffmpeg build-essential` before the `COPY` steps so it gets cached separately.
- Switch to a non-root user with `USER node` before the `CMD` line and make sure your volume mount points are chowned accordingly. Running as root in a container isn't the end of the world on a private VPS, but it's not good practice when you have channels connected to external services.
- Alpine-based images are smaller but sometimes cause issues with native Node modules. Slim bookworm is the safer choice for a production gateway.
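The non-root switch can be sketched as a tail-end fragment for the Dockerfile above. The `node` user exists in the official base image; the state-dir path here is an assumption you'd align with your volume mounts, which target `/root/.openclaw` elsewhere in this article:

```dockerfile
# Hypothetical non-root tail: give the built-in unprivileged node user
# the app and a writable state dir, then drop root before the entrypoint.
RUN mkdir -p /home/node/.openclaw && \
    chown -R node:node /app /home/node/.openclaw
USER node
ENV OPENCLAW_STATE_DIR=/home/node/.openclaw
CMD ["node", "dist/index.js"]
```

If you adopt this, point your volume mounts at /home/node/.openclaw instead of /root/.openclaw.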
Docker Compose for a multi-service setup
In most deployments you'll want at least three services: the gateway itself, a management container for running admin commands, and a reverse proxy. Here's a compose file that covers that:
```yaml
version: '3.8'

services:
  openclaw-gateway:
    image: openclaw/openclaw:latest
    container_name: openclaw-gateway
    restart: unless-stopped
    ports:
      - "18789:18789"
      - "18793:18793"
    volumes:
      - openclaw-config:/root/.openclaw
      - openclaw-workspace:/root/workspace
    environment:
      - NODE_ENV=production
      - OPENCLAW_STATE_DIR=/root/.openclaw
    command: openclaw gateway
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
    healthcheck:
      test: ["CMD", "node", "dist/index.js", "health"]
      interval: 30s
      timeout: 10s
      retries: 3

  openclaw-cli:
    image: openclaw/openclaw:latest
    # volumes_from was dropped in the v3 file format; name the volumes explicitly
    volumes:
      - openclaw-config:/root/.openclaw
      - openclaw-workspace:/root/workspace
    entrypoint: openclaw

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - certs:/etc/nginx/certs
    depends_on:
      - openclaw-gateway

volumes:
  openclaw-config:
  openclaw-workspace:
  certs:
```
The two named volumes are the critical piece here. openclaw-config holds everything in ~/.openclaw: your config, credentials, sessions, and cron/jobs.json. openclaw-workspace holds MEMORY.md, the memory/ directory, tools, and skills. Both need to persist across container restarts — if you use ephemeral storage, you lose sessions and memory on every redeploy.
The openclaw-cli service shares volumes with the gateway and gives you a clean way to run management commands without exec-ing into the running gateway:
```shell
docker compose run --rm openclaw-cli channels login
docker compose run --rm openclaw-cli status --all
docker compose exec openclaw-gateway openclaw status
```
For channel setup tasks that require a TTY (WhatsApp QR scanning, Telegram login), use docker compose run -it --rm openclaw-cli channels login.
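If you type those docker compose invocations often, a tiny wrapper helps. `oc` is a hypothetical name, and the DRY_RUN switch exists purely so the wrapper can be exercised without a running Docker daemon:

```shell
# Hypothetical convenience wrapper around the openclaw-cli service.
# DRY_RUN=1 prints the command instead of invoking docker.
oc() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "docker compose run --rm openclaw-cli $*"
  else
    docker compose run --rm openclaw-cli "$@"
  fi
}

DRY_RUN=1 oc status --all
# -> docker compose run --rm openclaw-cli status --all
```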
Agent sandboxing
OpenClaw's sandbox mode runs tools and sessions in isolated sub-containers coordinated by the host gateway. You can configure this with agents.defaults.sandbox.mode: "non-main" (sandboxes everything except the main agent) or "all". The sandbox containers mount a /workspace directory and by default run with network: none, with opt-in egress for tools that need external access. Idle sandboxes are pruned automatically after 24 hours, and aged-out ones after 7 days. This is worth enabling in production because it limits blast radius if a tool chain behaves unexpectedly.
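As a config fragment, that could look like the following. The key path agents.defaults.sandbox.mode is taken straight from above; the YAML file layout is an assumption to adapt to however your deployment stores OpenClaw config:

```yaml
agents:
  defaults:
    sandbox:
      mode: "non-main"   # sandbox every agent except main; "all" sandboxes everything
```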
Deploying to Kubernetes
Docker Compose is fine for a single host. Kubernetes is what you want when you need multiple replicas, automatic failover, rolling deployments, and horizontal autoscaling. The trade-off is real: Kubernetes has genuine operational complexity and isn't worth it for setups that don't need what it provides. If you're running OpenClaw on a LumaDock VPS and a single well-resourced instance is enough, stay there. If you're running a production deployment with uptime SLAs, read on.
Secrets and namespace
Start with a namespace to keep OpenClaw resources isolated, and put all sensitive values in a Secret rather than ConfigMaps or environment variables in manifests:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openclaw
---
apiVersion: v1
kind: Secret
metadata:
  name: openclaw-secrets
  namespace: openclaw
type: Opaque
data:
  OPENCLAW_TELEGRAM_TOKEN: <base64-encoded>
  ANTHROPIC_API_KEY: <base64-encoded>
  DISCORD_BOT_TOKEN: <base64-encoded>
```
Generate base64 values with echo -n 'your-value' | base64. In a real production cluster, use something like External Secrets Operator to pull from a proper secret store rather than baking values directly into manifests.
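The encoding step, sketched in shell (printf avoids the trailing newline that a bare echo would sneak into the encoded value; kubectl can also do the encoding for you):

```shell
# Base64-encode a secret value without a trailing newline
encoded=$(printf '%s' 'your-value' | base64)
echo "$encoded"   # -> eW91ci12YWx1ZQ==

# Or skip manual encoding entirely:
# kubectl create secret generic openclaw-secrets -n openclaw \
#   --from-literal=ANTHROPIC_API_KEY='sk-...'
```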
Deployment manifest
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-gateway
  namespace: openclaw
spec:
  replicas: 3
  selector:
    matchLabels:
      app: openclaw-gateway
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: openclaw-gateway
    spec:
      containers:
        - name: gateway
          image: openclaw/openclaw:latest
          ports:
            - containerPort: 18789
          envFrom:
            - secretRef:
                name: openclaw-secrets
          volumeMounts:
            - name: config
              mountPath: /root/.openclaw
            - name: workspace
              mountPath: /root/workspace
          resources:
            limits:
              cpu: "2"
              memory: "2Gi"
            requests:
              cpu: "1"
              memory: "1Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 18789
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health
              port: 18789
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: config
          persistentVolumeClaim:
            claimName: openclaw-config-pvc
        - name: workspace
          persistentVolumeClaim:
            claimName: openclaw-workspace-pvc
```
Three replicas is a reasonable starting point for HA: it tolerates one pod going down (for maintenance or a crash) while still keeping two active. The liveness probe restarts the container if the gateway stops responding; the readiness probe removes a pod from the load balancer rotation until it's actually ready to serve traffic.
Persistent volumes
This is where multi-replica OpenClaw deployments get interesting. The config and workspace volumes need ReadWriteMany access mode so multiple pods can mount them simultaneously. Not all storage providers support RWX — local-path provisioner doesn't, AWS EBS doesn't, but NFS, CephFS, and Longhorn do. If your cluster doesn't have an RWX-capable storage class, NFS is the easiest option to add:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openclaw-config-pvc
  namespace: openclaw
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openclaw-workspace-pvc
  namespace: openclaw
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client
  resources:
    requests:
      storage: 20Gi
```
If you're running advanced memory backends (QMD, Cognee, Mem0) as discussed in the advanced memory management guide, those services should each get their own PVCs and ideally run as separate deployments rather than being bundled with the gateway. The gateways then connect to them via internal cluster DNS, which keeps storage concerns separated and makes individual scaling cleaner.
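A sketch of what that in-cluster wiring could look like; the service name, port, and label here are illustrative placeholders rather than OpenClaw or QMD defaults:

```yaml
# Illustrative Service for a dedicated memory backend (its Deployment not shown)
apiVersion: v1
kind: Service
metadata:
  name: qmd
  namespace: openclaw
spec:
  selector:
    app: qmd          # matches the memory backend's own Deployment labels
  ports:
    - port: 8000      # illustrative port
      targetPort: 8000
```

Gateway pods would then reach it at qmd.openclaw.svc.cluster.local:8000, so every replica queries the same index.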
Service and Ingress
```yaml
apiVersion: v1
kind: Service
metadata:
  name: openclaw-service
  namespace: openclaw
spec:
  selector:
    app: openclaw-gateway
  ports:
    - name: gateway
      port: 18789
      targetPort: 18789
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openclaw-ingress
  namespace: openclaw
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  tls:
    - hosts:
        - openclaw.example.com
      secretName: openclaw-tls   # cert-manager needs a tls block to issue the cert
  rules:
    - host: openclaw.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: openclaw-service
                port:
                  number: 18789
```
The extended proxy timeouts matter because the gateway uses WebSockets for some channel communication, and default nginx timeouts of 60 seconds will drop those connections. Set them to at least 3600 seconds (one hour).
Horizontal Pod Autoscaler
Once you have resource requests and limits defined in your deployment, HPA can scale replicas up and down based on CPU utilization:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: openclaw-hpa
  namespace: openclaw
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: openclaw-gateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
70% CPU utilization as the scale-up trigger is conservative, which is intentional — you want to add capacity before you're already saturated, not after. The minimum of 2 replicas ensures you always have a fallback if one pod dies.
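For intuition, the scale-up decision follows Kubernetes' documented formula: desiredReplicas = ceil(currentReplicas × currentUtilization ÷ target). A quick integer-arithmetic check:

```shell
# 3 replicas averaging 105% of their CPU request, target 70%:
# ceil(3 * 105 / 70) = ceil(4.5) = 5 replicas
current=3; util=105; target=70
echo $(( (current * util + target - 1) / target ))   # -> 5
```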
Pod Disruption Budget
A PDB prevents Kubernetes from taking down too many replicas at once during node maintenance or cluster upgrades:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: openclaw-pdb
  namespace: openclaw
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: openclaw-gateway
```
With minAvailable: 2 and three replicas, Kubernetes will never voluntarily take down more than one pod at a time, which means your deployment stays available throughout cluster maintenance windows.
Managing state across replicas
State is the hardest part of running multiple OpenClaw gateway instances. The gateway is not stateless — it maintains sessions, cron job state in jobs.json, channel credentials, and memory. When you run three replicas all writing to the same shared NFS volume, you need to think about what happens when two of them try to write the same file simultaneously.
The honest answer is that OpenClaw's current architecture is designed around a single-gateway model. Shared storage via RWX PVs works reasonably well for config and credentials (which are mostly read, rarely written) and for workspace memory files (which are written sequentially by individual sessions). The area that needs more care is cron/jobs.json: with multiple gateways running, you can get duplicate cron job executions if all replicas are watching the same jobs file and each independently schedules runs.
The practical mitigation for now is one of these two approaches. Designate one replica as the cron leader — configure only one pod to have cron enabled (OPENCLAW_SKIP_CRON=0) and set the others to skip it (OPENCLAW_SKIP_CRON=1). This is simpler than it sounds: use a separate Kubernetes Deployment for the cron-leader pod with that env var set differently. The other replicas handle channel traffic and sessions while one handles scheduling. Alternatively, use a Redis sidecar or external lock service for distributed cron coordination, which is more complex but cleaner at higher scales.
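The cron-leader split can be sketched as a second Deployment. The OPENCLAW_SKIP_CRON semantics are as described above, and the trimmed fields (volumes, probes, envFrom) would mirror the main gateway Deployment:

```yaml
# Single-replica cron leader; the main gateway Deployment sets OPENCLAW_SKIP_CRON=1
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-cron-leader
  namespace: openclaw
spec:
  replicas: 1                    # exactly one scheduler
  selector:
    matchLabels:
      app: openclaw-cron-leader
  template:
    metadata:
      labels:
        app: openclaw-cron-leader
    spec:
      containers:
        - name: gateway
          image: openclaw/openclaw:latest
          env:
            - name: OPENCLAW_SKIP_CRON
              value: "0"         # this pod runs cron
          # volumeMounts, probes, and envFrom as in the gateway Deployment
```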
For memory backends like QMD, Cognee, and Mem0 — run them as dedicated services inside the cluster. Each gateway pod connects to the same memory service endpoint via internal DNS, so retrieval is consistent across replicas regardless of which pod handles a given session. The API proxy setup covers related patterns for managing shared backend connections.
High availability: Health checks, rolling updates and failover
Kubernetes handles most of this automatically once your deployment is configured correctly. The liveness and readiness probes restart unhealthy pods and remove them from rotation before they start receiving traffic. Rolling updates with maxUnavailable: 1 mean you always have at least two pods serving traffic during a deployment. The PDB ensures maintenance operations don't take down too many pods at once.
What Kubernetes doesn't handle automatically is channel-level failover. If you have a Telegram bot token configured, all three gateway replicas will have the same token, but Telegram only delivers messages to one active webhook endpoint at a time. You need to make sure your Ingress or load balancer is configured correctly so that Telegram's webhook hits a stable endpoint that routes to your cluster, not a specific pod IP. Same for Discord webhooks and WhatsApp callbacks.
The other thing to think about is credential refresh. WhatsApp sessions via QR code are tied to phone state and can expire. With multiple replicas, you need a way to refresh credentials that's not tied to a specific running pod. The management pattern here is to use a separate openclaw-cli job or pod for credential operations, writing results to the shared config volume where all gateway pods can read them.
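One way to express that pattern is a one-off Job that writes refreshed credentials to the shared config PVC. This is a sketch; note that Jobs aren't interactive, so TTY flows like WhatsApp QR scanning still call for kubectl exec into a long-lived CLI pod instead:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: openclaw-channels-login
  namespace: openclaw
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: cli
          image: openclaw/openclaw:latest
          command: ["openclaw", "channels", "login"]
          volumeMounts:
            - name: config
              mountPath: /root/.openclaw   # shared with all gateway pods
      volumes:
        - name: config
          persistentVolumeClaim:
            claimName: openclaw-config-pvc
```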
For monitoring and alerting on your cluster setup, the monitoring guide covers Prometheus exporters and uptime checks that translate well to a Kubernetes context.
Backups in a containerized setup
Named volumes and PVCs are not a backup strategy. They protect against container restarts but not against storage corruption, accidental deletion, or cluster-level disasters. For Docker Compose setups, add a cron-based backup container that snapshots volumes to an S3-compatible bucket or remote storage. For Kubernetes, Velero is the standard tool for PV snapshots and cluster-state backups. The backup guide covers what needs to be backed up and at what frequency — the same principles apply whether you're running on a single VPS or across a K8s cluster.
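For the Compose side, the usual pattern is tarring the named volume through a throwaway container. The docker invocation below is commented out because it needs a live daemon; the tar step underneath is demonstrated on a scratch directory so the mechanics are verifiable:

```shell
# Snapshot a named volume via a throwaway container (run on the Compose host):
#   docker run --rm -v openclaw-config:/data:ro -v "$PWD":/backup alpine \
#     tar czf /backup/openclaw-config-$(date +%F).tar.gz -C /data .

# The tar step itself, shown on a scratch directory:
src=$(mktemp -d)
echo '{}' > "$src/openclaw.json"
tar czf /tmp/openclaw-config-demo.tar.gz -C "$src" .
tar tzf /tmp/openclaw-config-demo.tar.gz | grep -c openclaw.json   # -> 1
```

Ship the resulting tarball to S3-compatible storage with whatever sync tool you already run; the point is that the snapshot exists outside the volume it protects.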
Is this worth it for your setup?
To be direct: for most OpenClaw users, this architecture is overkill. A well-tuned single VPS with systemd, good monitoring, and regular backups handles the vast majority of real-world workloads without the operational complexity of Kubernetes. The right time to reach for containers and clustering is when you have concrete evidence of hitting limits — OOM kills, sustained CPU saturation, documented downtime that costs something — not as a preemptive measure.
If you're not there yet, the multi-agent setup guide is probably a more practical next step. Multiple agents coordinating on a single well-resourced instance covers a lot of the throughput use cases without the infrastructure overhead.

