Running OpenClaw on a single VPS with systemd covers a lot of ground. It works fine for personal use, a handful of channels, and light automation. But production workloads hit the ceiling faster than you'd expect: one gateway handling concurrent sessions, heavy cron loads, large memory indexing, and multiple channel adapters at the same time eventually runs into CPU, RAM, or reliability limits. When that happens, you have two choices: throw more hardware at it, or rethink the architecture.
This article covers the second option: containerizing OpenClaw with Docker, deploying it to Kubernetes for true high availability, and managing the stateful storage that makes multi-instance setups actually work. It builds on what's covered in hosting OpenClaw securely on a VPS and the systemd service setup, so the assumption is you already have a working single-instance deployment and you're looking to go further.
When does scaling make sense?
Not every OpenClaw setup needs this. Most personal deployments and small team setups run fine on a single server, and adding orchestration complexity without a real reason just adds maintenance burden. Here are the signals that it's time:
- CPU spikes above 80% during concurrent sessions. If your gateway is pegging the CPU whenever two or three sessions are active simultaneously, a single instance won't grow into the load gracefully.
- OOM kills in system logs. Large tool chains, memory indexing with QMD or Cognee, and multi-agent routing can push RAM consumption high. If the OS is killing your gateway process, you're either undersized or need to distribute the load.
- More than 10 simultaneous sessions, or heavy cron/heartbeat concurrency. One gateway handles session isolation via its internal lane system, but there are limits to what a single process can manage without degrading response times.
- Zero-downtime requirements. Updating a single-instance systemd service means a brief outage. If your channels need to stay up continuously, you need replicas and rolling deployments.
- Processing high-volume webhooks or API integrations. If you're routing hundreds of inbound events per hour, a single gateway becomes a bottleneck.
If none of these apply, the VPS monitoring guide is probably a better next read than this one.
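Checking the OOM signal is quick before you commit to any of this. Below is a sketch that counts kernel OOM-killer events; in practice you'd pipe in `journalctl -k --since "7 days ago"`, but a captured sample keeps the logic self-contained here (the log lines are illustrative):

```shell
# Count OOM-killer events in kernel log output
count_oom_events() {
  grep -ciE "out of memory|oom-killer" || true
}

# Sample of what `journalctl -k` might emit (illustrative lines):
sample_log='Jan 10 03:12:44 vps kernel: Out of memory: Killed process 4242 (node)
Jan 10 03:12:44 vps kernel: node invoked oom-killer
Jan 11 09:01:02 vps kernel: eth0: link up'

printf '%s\n' "$sample_log" | count_oom_events   # -> 2
```

A nonzero count over a week is exactly the "OOM kills in system logs" signal described above.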
Containerizing OpenClaw with Docker
Writing a production Dockerfile
OpenClaw has official images (openclaw/openclaw:latest), but if you want control over what's in the image — specific apt packages, a non-root user, custom build steps — you'll want your own Dockerfile. A production-ready starting point based on Node 22 slim:
```dockerfile
FROM node:22-bookworm-slim

# Slim images ship without curl; install what the bun installer needs
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl ca-certificates unzip && \
    rm -rf /var/lib/apt/lists/*

RUN corepack enable && \
    curl -fsSL https://bun.sh/install | bash && \
    mv /root/.bun/bin/bun /usr/local/bin/

WORKDIR /app

# Cache dependencies before copying source (faster rebuilds)
COPY package.json pnpm-lock.yaml pnpm-workspace.yaml .npmrc ./
COPY ui/package.json ./ui/
RUN pnpm install --frozen-lockfile

COPY . .
RUN pnpm build && pnpm ui:build

ENV NODE_ENV=production
EXPOSE 18789 18793

HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD node dist/index.js health || exit 1

CMD ["node", "dist/index.js"]
```
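Pair the Dockerfile with a .dockerignore so local state and dependency trees never enter the build context. A minimal sketch to adapt to your repo layout:

```
node_modules
ui/node_modules
dist
.git
.env
*.log
```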
A few things worth adjusting for your specific setup:
- If you need ffmpeg or native build tools for certain skills, add `RUN apt-get update && apt-get install -y ffmpeg build-essential` before the `COPY` steps so it gets cached separately.
- Switch to a non-root user with `USER node` before the `CMD` line and make sure your volume mount points are chowned accordingly. Running as root in a container isn't the end of the world on a private VPS, but it's not good practice when you have channels connected to external services.
- Alpine-based images are smaller but sometimes cause issues with native Node modules. Slim bookworm is the safer choice for a production gateway.
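The non-root switch can be sketched as a tail-end fragment for the Dockerfile above. The `node` user exists in the official base image; the state-dir path here is an assumption you'd align with your volume mounts, which target `/root/.openclaw` elsewhere in this article:

```dockerfile
# Hypothetical non-root tail: give the built-in unprivileged node user
# the app and a writable state dir, then drop root before the entrypoint.
RUN mkdir -p /home/node/.openclaw && \
    chown -R node:node /app /home/node/.openclaw
USER node
ENV OPENCLAW_STATE_DIR=/home/node/.openclaw
CMD ["node", "dist/index.js"]
```

If you adopt this, point your volume mounts at /home/node/.openclaw instead of /root/.openclaw.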
Docker Compose for a multi-service setup
In most deployments you'll want at least three services: the gateway itself, a management container for running admin commands, and a reverse proxy. Here's a compose file that covers that:
```yaml
version: '3.8'

services:
  openclaw-gateway:
    image: openclaw/openclaw:latest
    container_name: openclaw-gateway
    restart: unless-stopped
    ports:
      - "18789:18789"
      - "18793:18793"
    volumes:
      - openclaw-config:/root/.openclaw
      - openclaw-workspace:/root/workspace
    environment:
      - NODE_ENV=production
      - OPENCLAW_STATE_DIR=/root/.openclaw
    command: openclaw gateway
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
    healthcheck:
      test: ["CMD", "node", "dist/index.js", "health"]
      interval: 30s
      timeout: 10s
      retries: 3

  openclaw-cli:
    image: openclaw/openclaw:latest
    # volumes_from was dropped in the v3 file format; name the volumes explicitly
    volumes:
      - openclaw-config:/root/.openclaw
      - openclaw-workspace:/root/workspace
    entrypoint: openclaw

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - certs:/etc/nginx/certs
    depends_on:
      - openclaw-gateway

volumes:
  openclaw-config:
  openclaw-workspace:
  certs:
```
The two named volumes are the critical piece here. openclaw-config holds everything in ~/.openclaw: your config, credentials, sessions, and cron/jobs.json. openclaw-workspace holds MEMORY.md, the memory/ directory, tools, and skills. Both need to persist across container restarts — if you use ephemeral storage, you lose sessions and memory on every redeploy.
The openclaw-cli service shares volumes with the gateway and gives you a clean way to run management commands without exec-ing into the running gateway:
```shell
docker compose run --rm openclaw-cli channels login
docker compose run --rm openclaw-cli status --all
docker compose exec openclaw-gateway openclaw status
```
For channel setup tasks that require a TTY (WhatsApp QR scanning, Telegram login), use docker compose run -it --rm openclaw-cli channels login.
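If you type those docker compose invocations often, a tiny wrapper helps. `oc` is a hypothetical name, and the DRY_RUN switch exists purely so the wrapper can be exercised without a running Docker daemon:

```shell
# Hypothetical convenience wrapper around the openclaw-cli service.
# DRY_RUN=1 prints the command instead of invoking docker.
oc() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "docker compose run --rm openclaw-cli $*"
  else
    docker compose run --rm openclaw-cli "$@"
  fi
}

DRY_RUN=1 oc status --all
# -> docker compose run --rm openclaw-cli status --all
```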
Agent sandboxing
OpenClaw's sandbox mode runs tools and sessions in isolated sub-containers coordinated by the host gateway. You can configure this with agents.defaults.sandbox.mode: "non-main" (sandboxes everything except the main agent) or "all". The sandbox containers mount a /workspace directory and by default run with network: none, with opt-in egress for tools that need external access. Idle sandboxes are pruned automatically after 24 hours, and aged-out ones after 7 days. This is worth enabling in production because it limits blast radius if a tool chain behaves unexpectedly.
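As a config fragment, that could look like the following. The key path agents.defaults.sandbox.mode is taken straight from above; the YAML file layout is an assumption to adapt to however your deployment stores OpenClaw config:

```yaml
agents:
  defaults:
    sandbox:
      mode: "non-main"   # sandbox every agent except main; "all" sandboxes everything
```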
Deploying to Kubernetes
Docker Compose is fine for a single host. Kubernetes is what you want when you need multiple replicas, automatic failover, rolling deployments, and horizontal autoscaling. The trade-off is real: Kubernetes has genuine operational complexity and isn't worth it for setups that don't need what it provides. If you're running OpenClaw on a LumaDock VPS and a single well-resourced instance is enough, stay there. If you're running a production deployment with uptime SLAs, read on.
Secrets and namespace
Start with a namespace to keep OpenClaw resources isolated, and put all sensitive values in a Secret rather than ConfigMaps or environment variables in manifests:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openclaw
---
apiVersion: v1
kind: Secret
metadata:
  name: openclaw-secrets
  namespace: openclaw
type: Opaque
data:
  OPENCLAW_TELEGRAM_TOKEN: <base64-encoded>
  ANTHROPIC_API_KEY: <base64-encoded>
  DISCORD_BOT_TOKEN: <base64-encoded>
```
Generate base64 values with echo -n 'your-value' | base64. In a real production cluster, use something like External Secrets Operator to pull from a proper secret store rather than baking values directly into manifests.
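The encoding step, sketched in shell (printf avoids the trailing newline that a bare echo would sneak into the encoded value; kubectl can also do the encoding for you):

```shell
# Base64-encode a secret value without a trailing newline
encoded=$(printf '%s' 'your-value' | base64)
echo "$encoded"   # -> eW91ci12YWx1ZQ==

# Or skip manual encoding entirely:
# kubectl create secret generic openclaw-secrets -n openclaw \
#   --from-literal=ANTHROPIC_API_KEY='sk-...'
```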
Deployment manifest
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-gateway
  namespace: openclaw
spec:
  replicas: 3
  selector:
    matchLabels:
      app: openclaw-gateway
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: openclaw-gateway
    spec:
      containers:
        - name: gateway
          image: openclaw/openclaw:latest
          ports:
            - containerPort: 18789
          envFrom:
            - secretRef:
                name: openclaw-secrets
          volumeMounts:
            - name: config
              mountPath: /root/.openclaw
            - name: workspace
              mountPath: /root/workspace
          resources:
            limits:
              cpu: "2"
              memory: "2Gi"
            requests:
              cpu: "1"
              memory: "1Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 18789
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health
              port: 18789
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: config
          persistentVolumeClaim:
            claimName: openclaw-config-pvc
        - name: workspace
          persistentVolumeClaim:
            claimName: openclaw-workspace-pvc
```
Three replicas is a reasonable starting point for HA: it tolerates one pod going down (for maintenance or a crash) while still keeping two active. The liveness probe restarts the container if the gateway stops responding; the readiness probe removes a pod from the load balancer rotation until it's actually ready to serve traffic.
Persistent volumes
This is where multi-replica OpenClaw deployments get interesting. The config and workspace volumes need ReadWriteMany access mode so multiple pods can mount them simultaneously. Not all storage providers support RWX — local-path provisioner doesn't, AWS EBS doesn't, but NFS, CephFS, and Longhorn do. If your cluster doesn't have an RWX-capable storage class, NFS is the easiest option to add:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openclaw-config-pvc
  namespace: openclaw
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openclaw-workspace-pvc
  namespace: openclaw
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client
  resources:
    requests:
      storage: 20Gi
```
If you're running advanced memory backends (QMD, Cognee, Mem0) as discussed in the advanced memory management guide, those services should each get their own PVCs and ideally run as separate deployments rather than being bundled with the gateway. The gateways then connect to them via internal cluster DNS, which keeps storage concerns separated and makes individual scaling cleaner.
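A sketch of what that in-cluster wiring could look like; the service name, port, and label here are illustrative placeholders rather than OpenClaw or QMD defaults:

```yaml
# Illustrative Service for a dedicated memory backend (its Deployment not shown)
apiVersion: v1
kind: Service
metadata:
  name: qmd
  namespace: openclaw
spec:
  selector:
    app: qmd          # matches the memory backend's own Deployment labels
  ports:
    - port: 8000      # illustrative port
      targetPort: 8000
```

Gateway pods would then reach it at qmd.openclaw.svc.cluster.local:8000, so every replica queries the same index.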
Service and Ingress
```yaml
apiVersion: v1
kind: Service
metadata:
  name: openclaw-service
  namespace: openclaw
spec:
  selector:
    app: openclaw-gateway
  ports:
    - name: gateway
      port: 18789
      targetPort: 18789
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openclaw-ingress
  namespace: openclaw
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  tls:
    - hosts:
        - openclaw.example.com
      secretName: openclaw-tls   # cert-manager needs a tls block to issue the cert
  rules:
    - host: openclaw.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: openclaw-service
                port:
                  number: 18789
```
The extended proxy timeouts matter because the gateway uses WebSockets for some channel communication, and default nginx timeouts of 60 seconds will drop those connections. Set them to at least 3600 seconds (one hour).
Horizontal Pod Autoscaler
Once you have resource requests and limits defined in your deployment, HPA can scale replicas up and down based on CPU utilization:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: openclaw-hpa
  namespace: openclaw
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: openclaw-gateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
70% CPU utilization as the scale-up trigger is conservative, which is intentional — you want to add capacity before you're already saturated, not after. The minimum of 2 replicas ensures you always have a fallback if one pod dies.
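For intuition, the scale-up decision follows Kubernetes' documented formula: desiredReplicas = ceil(currentReplicas × currentUtilization ÷ target). A quick integer-arithmetic check:

```shell
# 3 replicas averaging 105% of their CPU request, target 70%:
# ceil(3 * 105 / 70) = ceil(4.5) = 5 replicas
current=3; util=105; target=70
echo $(( (current * util + target - 1) / target ))   # -> 5
```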
Pod Disruption Budget
A PDB prevents Kubernetes from taking down too many replicas at once during node maintenance or cluster upgrades:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: openclaw-pdb
  namespace: openclaw
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: openclaw-gateway
```
With minAvailable: 2 and three replicas, Kubernetes will never voluntarily take down more than one pod at a time, which means your deployment stays available throughout cluster maintenance windows.
Managing state across replicas
State is the hardest part of running multiple OpenClaw gateway instances. The gateway is not stateless — it maintains sessions, cron job state in jobs.json, channel credentials, and memory. When you run three replicas all writing to the same shared NFS volume, you need to think about what happens when two of them try to write the same file simultaneously.
The honest answer is that OpenClaw's current architecture is designed around a single-gateway model. Shared storage via RWX PVs works reasonably well for config and credentials (which are mostly read, rarely written) and for workspace memory files (which are written sequentially by individual sessions). The area that needs more care is cron/jobs.json: with multiple gateways running, you can get duplicate cron job executions if all replicas are watching the same jobs file and each independently schedules runs.
The practical mitigation for now is one of these two approaches. Designate one replica as the cron leader — configure only one pod to have cron enabled (OPENCLAW_SKIP_CRON=0) and set the others to skip it (OPENCLAW_SKIP_CRON=1). This is simpler than it sounds: use a separate Kubernetes Deployment for the cron-leader pod with that env var set differently. The other replicas handle channel traffic and sessions while one handles scheduling. Alternatively, use a Redis sidecar or external lock service for distributed cron coordination, which is more complex but cleaner at higher scales.
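The cron-leader split can be sketched as a second Deployment. The OPENCLAW_SKIP_CRON semantics are as described above, and the trimmed fields (volumes, probes, envFrom) would mirror the main gateway Deployment:

```yaml
# Single-replica cron leader; the main gateway Deployment sets OPENCLAW_SKIP_CRON=1
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-cron-leader
  namespace: openclaw
spec:
  replicas: 1                    # exactly one scheduler
  selector:
    matchLabels:
      app: openclaw-cron-leader
  template:
    metadata:
      labels:
        app: openclaw-cron-leader
    spec:
      containers:
        - name: gateway
          image: openclaw/openclaw:latest
          env:
            - name: OPENCLAW_SKIP_CRON
              value: "0"         # this pod runs cron
          # volumeMounts, probes, and envFrom as in the gateway Deployment
```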
For memory backends like QMD, Cognee, and Mem0 — run them as dedicated services inside the cluster. Each gateway pod connects to the same memory service endpoint via internal DNS, so retrieval is consistent across replicas regardless of which pod handles a given session. The API proxy setup covers related patterns for managing shared backend connections.
High availability: Health checks, rolling updates and failover
Kubernetes handles most of this automatically once your deployment is configured correctly. The liveness and readiness probes restart unhealthy pods and remove them from rotation before they start receiving traffic. Rolling updates with maxUnavailable: 1 mean you always have at least two pods serving traffic during a deployment. The PDB ensures maintenance operations don't take down too many pods at once.
What Kubernetes doesn't handle automatically is channel-level failover. If you have a Telegram bot token configured, all three gateway replicas will have the same token, but Telegram only delivers messages to one active webhook endpoint at a time. You need to make sure your Ingress or load balancer is configured correctly so that Telegram's webhook hits a stable endpoint that routes to your cluster, not a specific pod IP. Same for Discord webhooks and WhatsApp callbacks.
The other thing to think about is credential refresh. WhatsApp sessions via QR code are tied to phone state and can expire. With multiple replicas, you need a way to refresh credentials that's not tied to a specific running pod. The management pattern here is to use a separate openclaw-cli job or pod for credential operations, writing results to the shared config volume where all gateway pods can read them.
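One way to express that pattern is a one-off Job that writes refreshed credentials to the shared config PVC. This is a sketch; note that Jobs aren't interactive, so TTY flows like WhatsApp QR scanning still call for kubectl exec into a long-lived CLI pod instead:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: openclaw-channels-login
  namespace: openclaw
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: cli
          image: openclaw/openclaw:latest
          command: ["openclaw", "channels", "login"]
          volumeMounts:
            - name: config
              mountPath: /root/.openclaw   # shared with all gateway pods
      volumes:
        - name: config
          persistentVolumeClaim:
            claimName: openclaw-config-pvc
```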
For monitoring and alerting on your cluster setup, the monitoring guide covers Prometheus exporters and uptime checks that translate well to a Kubernetes context.
Backups in a containerized setup
Named volumes and PVCs are not a backup strategy. They protect against container restarts but not against storage corruption, accidental deletion, or cluster-level disasters. For Docker Compose setups, add a cron-based backup container that snapshots volumes to an S3-compatible bucket or remote storage. For Kubernetes, Velero is the standard tool for PV snapshots and cluster-state backups. The backup guide covers what needs to be backed up and at what frequency — the same principles apply whether you're running on a single VPS or across a K8s cluster.
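For the Compose side, the usual pattern is tarring the named volume through a throwaway container. The docker invocation below is commented out because it needs a live daemon; the tar step underneath is demonstrated on a scratch directory so the mechanics are verifiable:

```shell
# Snapshot a named volume via a throwaway container (run on the Compose host):
#   docker run --rm -v openclaw-config:/data:ro -v "$PWD":/backup alpine \
#     tar czf /backup/openclaw-config-$(date +%F).tar.gz -C /data .

# The tar step itself, shown on a scratch directory:
src=$(mktemp -d)
echo '{}' > "$src/openclaw.json"
tar czf /tmp/openclaw-config-demo.tar.gz -C "$src" .
tar tzf /tmp/openclaw-config-demo.tar.gz | grep -c openclaw.json   # -> 1
```

Ship the resulting tarball to S3-compatible storage with whatever sync tool you already run; the point is that the snapshot exists outside the volume it protects.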
Is this worth it for your setup?
To be direct: for most OpenClaw users, this architecture is overkill. A well-tuned single VPS with systemd, good monitoring, and regular backups handles the vast majority of real-world workloads without the operational complexity of Kubernetes. The right time to reach for containers and clustering is when you have concrete evidence of hitting limits — OOM kills, sustained CPU saturation, documented downtime that costs something — not as a preemptive measure.
If you're not there yet, the multi-agent setup guide is probably a more practical next step. Multiple agents coordinating on a single well-resourced instance covers a lot of the throughput use cases without the infrastructure overhead.

