Running n8n on a single VPS is fine when you’re experimenting or automating personal tasks. But what happens when your business depends on workflows staying online 24/7? Downtime, failed webhooks, or a crashed database can mean lost revenue or broken customer experiences. That’s where HA (high availability) comes in.
In this article, I’ll walk you through what high availability means for n8n, how to architect it across the editor, workers, and datastore, and what trade-offs to expect. I’ll also sprinkle in some lessons learned, because yes, I’ve broken production setups before and learned the hard way.
What does high availability mean in practice?
At its core, HA is about making sure there is no single point of failure in your n8n deployment. If one piece fails, another takes over. It’s not just about uptime percentages (though five nines look great on marketing slides). It’s about resilience, graceful degradation, and recovery.
In the context of n8n, we usually break HA down into three components:
- The editor (the UI where you build and manage workflows)
- The workers (the processes that execute workflows in queue mode)
- The datastore (Postgres and Redis, which hold your workflows, history, and job queues)
Each of these needs its own HA strategy.
Making the n8n editor redundant
The editor talks to the database and Redis, but the editor nodes themselves hold no local state. That makes them relatively easy to scale.
- Multiple editor instances: Run two or more editor containers behind a load balancer. This way, if one crashes, traffic is routed to the others.
- Stateless scaling: Because editors don’t store local state, you can kill and restart them without data loss, as long as the datastore is healthy.
- Session persistence: For a smoother user experience, configure sticky sessions on the load balancer so you don’t bounce between editors mid-workflow edit.
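As a concrete starting point, here is a minimal Nginx sketch for load-balancing two editor instances with simple IP-based session affinity. The hostnames, port, and domain are placeholders for your own setup:

```nginx
# Distribute traffic across editor instances; ip_hash gives
# simple session affinity (same client -> same editor).
upstream n8n_editors {
    ip_hash;
    server n8n-editor-1:5678;   # placeholder hostnames
    server n8n-editor-2:5678;
}

server {
    listen 80;
    server_name n8n.example.com;   # placeholder domain

    location / {
        proxy_pass http://n8n_editors;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        # The editor UI uses long-lived connections for live updates
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```

`ip_hash` is the simplest affinity mechanism; cookie-based stickiness (or your cloud LB’s equivalent) is a reasonable alternative if clients sit behind shared NAT.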
Real-world tip
I once ran an editor cluster with no sticky sessions and kept losing unsaved changes when switching nodes. Lesson learned: set up session affinity if you’re actively editing workflows in production.
Worker resilience with Redis queue mode
If you’re running anything serious, you should already be in queue mode with Redis. This separates webhook handling from workflow execution and allows multiple workers to process jobs.
- Horizontal scaling: Run multiple worker containers. Redis acts as the job broker, and workers pick tasks from the queue.
- Graceful restarts: Workers can be restarted one by one (draining current jobs) without losing executions.
- Canary workers: Add new workers with a different image tag for testing upgrades before rolling them out to all nodes.
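A worker service in Docker Compose can look roughly like the sketch below. The environment variable names come from n8n’s queue-mode documentation; the hostnames and replica count are placeholders, and the encryption key must match the one used by the editor instances:

```yaml
# docker-compose sketch for horizontally scaled workers (assumed service
# names "redis" and "postgres" exist elsewhere in the same compose file).
services:
  n8n-worker:
    image: n8nio/n8n
    command: worker
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis          # job broker; must be reachable by all workers
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - N8N_ENCRYPTION_KEY=${N8N_ENCRYPTION_KEY}  # must match the editor's key
    deploy:
      replicas: 3                            # scale this number as load grows
```

With `docker compose up --scale n8n-worker=3` (or Swarm/Kubernetes replicas), Redis distributes jobs across workers automatically; no worker-side coordination is needed.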
Failure scenarios to plan for
- A worker crashes mid-execution → Redis requeues the job for another worker.
- A whole node goes offline → Other workers take over, assuming Redis is reachable.
- Redis fails → This is the real weak spot, so Redis must also be made highly available.
Keeping the datastore alive: Postgres and Redis
Your datastore is the backbone of n8n. Lose it, and your whole automation setup halts.
Postgres
- Primary-replica setup: Run Postgres in streaming replication with automatic failover. Tools like Patroni or cloud-managed Postgres make this easier.
- Backups and PITR: HA isn’t just about staying up; it’s also about recovery. Use point-in-time recovery (PITR) backups so you can roll back if corruption happens.
- Connection pooling: Add PgBouncer or Pgpool-II to manage connections across multiple editors and workers.
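A PgBouncer configuration for this setup might look like the following sketch. The host, database name, and pool sizes are placeholder values to tune for your own workload:

```ini
; pgbouncer.ini sketch -- connection details are placeholders
[databases]
n8n = host=postgres-primary port=5432 dbname=n8n

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction      ; multiplex many client connections over few server ones
max_client_conn = 200        ; editors + workers combined
default_pool_size = 20       ; actual connections to Postgres
```

Point n8n’s `DB_POSTGRESDB_HOST`/port at PgBouncer instead of Postgres directly, so a primary failover only requires updating one place.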
Redis
- Redis Sentinel or Cluster: Sentinel provides automated failover; Cluster adds sharding for scale. For n8n, Sentinel is often enough.
- Persistence: Enable AOF (append-only file) so Redis can rebuild state after a crash.
- Private networking: Keep Redis internal to your VPS provider’s network. Never expose it directly to the internet.
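The two Redis-side settings above boil down to a few lines of configuration. This is a sketch; `mymaster`, the primary’s address, and the timeouts are placeholders:

```conf
# redis.conf -- enable AOF persistence so state survives a crash
appendonly yes
appendfsync everysec          # fsync once per second: good durability/latency balance

# sentinel.conf -- monitor the primary and fail over automatically
sentinel monitor mymaster redis-primary 6379 2   # quorum: 2 sentinels must agree
sentinel down-after-milliseconds mymaster 5000   # mark down after 5s of silence
sentinel failover-timeout mymaster 60000
```

Run at least three Sentinel processes on separate nodes so a quorum can still form when one node is lost.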
See also: n8n Redis scaling guide for more details.
Networking and load balancing
An HA setup is only as good as the glue holding it together.
- Load balancer: Use HAProxy, Nginx, or your provider’s LB service to distribute traffic to editor nodes.
- Private networking: Keep Postgres and Redis off the public internet (see our private networking guide).
- Failover DNS: Services like Cloudflare or Route 53 can reroute traffic if one region or VPS fails.
Monitoring and alerting for HA
You can’t call a system “highly available” if you don’t know when it fails.
- Monitor webhooks with uptime checks (Uptime Kuma works well).
- Track worker queue depth and execution latency with Prometheus and Grafana (guide here).
- Alert on Postgres replication lag and Redis Sentinel failovers.
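n8n can expose a Prometheus scrape endpoint at `/metrics` when `N8N_METRICS=true` is set. The alert rule below is a sketch; the metric name and threshold are illustrative, so check what your n8n version and exporters actually expose:

```yaml
# prometheus-rules.yaml sketch -- metric name is a hypothetical example
groups:
  - name: n8n-ha
    rules:
      - alert: WorkerQueueBacklog
        expr: n8n_queue_waiting_jobs > 100   # placeholder metric/threshold
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "n8n job queue is backing up"
          description: "Workers may be down or overloaded; check Redis and worker logs."
```

Pair a rule like this with alerts on `pg_replication_lag` (from postgres_exporter) and Sentinel failover events so each HA layer has its own signal.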
When something breaks at 3 a.m., alerts are the only reason you’ll know before your customers do.
FAQ
Can I achieve HA on a single VPS?
Not really. You can harden a single server, but HA requires multiple nodes to avoid single points of failure.
Do I need Kubernetes for this?
No. You can achieve HA with Docker Compose on multiple VPS instances and a load balancer. Kubernetes helps at larger scale, but it’s not mandatory.
What’s the minimum setup for HA?
At least two editor nodes, two workers, a Postgres replica, and Redis with Sentinel. Plus a load balancer in front.
How expensive is HA for n8n?
It’s more costly than a single VPS, but you can start with small nodes and scale up. A few Starter VPS plans are enough for a minimal HA cluster.
Building resilient automation with n8n
High availability isn’t just for banks and mega-corporations. If you’re running customer-facing workflows or internal systems that can’t afford downtime, designing HA into your n8n deployment is worth the effort. It takes planning, more infrastructure, and discipline around monitoring, but the payoff is confidence: knowing that a single crash won’t take down your automation.
The bottom line? Treat your n8n setup like any other production system. Add redundancy, test failovers, and build a recovery plan. Then you’ll have automation you can truly rely on.