Fix Hermes 402 quota errors with fallback providers

Ellie Grace Hayes

25/05/2026


HTTP 402 from your LLM provider means you hit a billing or quota limit: you ran out of credits, exhausted a daily token allowance, or had a payment method declined. The painful part isn't the 402 itself. It's that older Hermes versions treated a single 402 as fatal and killed the gateway. Suddenly your Telegram bot is silent, your scheduled briefing didn't fire, and the agent looks dead.

This guide covers three things: upgrading to a version that handles 402 cleanly, configuring fallback providers so a single quota issue doesn't take down the whole agent, and monitoring so you find out before users do.

What 402 looks like across providers

Provider-specific text varies:

Anthropic: HTTP 402: Your account is out of credits
OpenRouter: 402: insufficient_credits
MiniMax: daily_limit_busy with 402 status
OpenAI: usually returns 429, not 402, when you are out of credits (see below)
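
As a rough illustration of how these responses could be recognized in a wrapper script, the sketch below checks a status code and error text against the strings listed above. The function name is hypothetical and not part of Hermes, and real provider messages may be worded differently.

```shell
# Hypothetical helper, not part of Hermes: succeeds (exit 0) when a response
# looks like a billing or quota failure, using the strings listed above.
# Usage: is_quota_error <http_status> <error_text>
is_quota_error() {
  # Only 402 responses are considered here; OpenAI's 429 needs separate handling.
  [ "$1" = "402" ] || return 1
  case $2 in
    *"out of credits"*|*insufficient_credits*|*daily_limit_busy*) return 0 ;;
    *) return 1 ;;
  esac
}

if is_quota_error 402 "402: insufficient_credits"; then
  echo "quota or billing problem"
fi
```

Matching on message text is brittle, so treat the status code as the primary signal and the text as a hint.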

Step 1: Upgrade Hermes if your gateway dies on 402

Older Hermes versions propagated 402 up as a fatal gateway error. Newer versions retry and surface a clean error. If your gateway log shows a 402 followed by the gateway shutting down, you're on a version that needs upgrading:

hermes upgrade
hermes --version

Check the Hermes releases page for the current version. The 402 handling improvements landed in v0.13 or later.

Step 2: Why fallback providers matter

Even with a clean upgrade, a single provider's 402 means the agent stops answering until the quota resets. A monthly cap can mean a week of downtime. A daily cap means waiting until midnight UTC. Either way the agent looks broken to users.

Fallback providers give Hermes a second route. When the primary returns 402, the gateway transparently retries through the secondary. Users see no failure. You get a log entry telling you the primary is exhausted, but the bot keeps working.
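
The routing idea can be sketched in a few lines of shell. This is only an illustration of the control flow, not how Hermes implements it: call_primary and call_secondary are hypothetical stand-ins that print a fixed HTTP status instead of calling a real provider.

```shell
# Hypothetical stand-ins for real provider calls: each prints an HTTP status.
# Here the primary is out of credits and the secondary is healthy.
call_primary()   { echo 402; }
call_secondary() { echo 200; }

# Try each provider in priority order; move on only when the status is one
# that should trigger fallback (402, 429 or any 5xx).
# Prints "<provider> <status>" for the provider that answered.
route_request() {
  for provider in "$@"; do
    status=$("call_$provider")
    case $status in
      402|429|5??)
        # Retriable: log to stderr and fall through to the next provider.
        echo "warn: $provider returned $status, trying next provider" >&2
        ;;
      *)
        echo "$provider $status"
        return 0
        ;;
    esac
  done
  # Every provider failed with a retriable status.
  echo "none $status"
  return 1
}

route_request primary secondary   # prints "secondary 200" (warning goes to stderr)
```

The warning on stderr plays the role of the log entry: the caller still gets an answer, and you still learn that the primary is exhausted.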

Step 3: Configure the fallback

Add a second provider

hermes provider list
hermes provider add openrouter --priority 2
hermes provider show

Priority 1 is the default, tried first. Priority 2 is fallback, only used when primary fails with a retriable error. You can chain priority 3 and 4 if you want belts and braces.

Pick a fallback model that mirrors the primary

If your primary is Anthropic Sonnet, set the OpenRouter fallback to a Sonnet-equivalent model (Sonnet 4.6 is available on OpenRouter too). Same model class, different billing relationship, so an Anthropic credit exhaustion doesn't take you down.

Step 4: Tune which errors trigger fallback

Default fallback statuses are 402, 429, 5xx. You can change this:

hermes config set fallback_on_status "402,429,500,502,503,504"
hermes config set fallback_max_retries 2

Don't add 401 to the fallback list

If you do, real auth misconfigurations get covered up. You only find out the primary key is broken when both providers exhaust together. See our Hermes 401 auth errors piece for why 401 should always stay visible.
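
To make the effect of the status list concrete, here is a small sketch of the matching logic. It assumes the list is a plain comma-separated string like the value above; the variable and function names are illustrative and not Hermes internals.

```shell
# Assumed to mirror the fallback_on_status value shown above.
FALLBACK_ON_STATUS="402,429,500,502,503,504"

# Succeeds (exit 0) if the given status appears in the comma-separated list.
should_fallback() {
  case ",$FALLBACK_ON_STATUS," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

for code in 402 401 503; do
  if should_fallback "$code"; then
    echo "$code: fall back to the next provider"
  else
    echo "$code: surface the error"
  fi
done
```

With 401 left out of the list, a broken key is reported straight away instead of being hidden behind the secondary provider.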

Credential pools for the same provider

If your usage justifies a second account at the same provider, Hermes can rotate between two keys associated with the same provider config:

hermes auth add anthropic --pool primary
hermes auth add anthropic --pool secondary
hermes provider set anthropic --credential-pool primary,secondary

This is rate-limit smoothing, not quota smoothing. Two accounts hitting the same monthly cap together still hit the cap. The real win is for short bursts where one account would 429 but a pool of two stays under the per-account RPM.
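
A quick way to see the smoothing effect is to simulate it. The sketch below assumes simple round-robin rotation, which may not be how Hermes actually picks a key, and credential_for is a hypothetical helper.

```shell
# Print the credential used for request number N under round-robin rotation.
# Usage: credential_for <N> <credential>...
credential_for() {
  n=$1
  shift
  # $# is now the pool size; pick position (n mod size) + 1.
  pos=$(( $n % $# + 1 ))
  shift $(( pos - 1 ))
  echo "$1"
}

# Spread 100 requests across a pool of two and count the per-key load.
primary_count=0
secondary_count=0
i=0
while [ "$i" -lt 100 ]; do
  case $(credential_for "$i" primary secondary) in
    primary)   primary_count=$(( primary_count + 1 )) ;;
    secondary) secondary_count=$(( secondary_count + 1 )) ;;
  esac
  i=$(( i + 1 ))
done
echo "primary=$primary_count secondary=$secondary_count"   # prints "primary=50 secondary=50"
```

Each key sees half the request rate, which helps with per-account RPM limits. The total number of tokens billed is unchanged, which is why pooling does nothing for a shared spending cap.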

Monitoring to catch 402 early

The simplest pattern is to alert on the first 402 of a billing period. Every subsequent one is noise.

If you run the gateway under systemd (covered in our systemd setup), tail the gateway log into your monitoring stack:

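# mail reads its body until end of input, so send one message per matching line; you@example.com is a placeholder address
journalctl -u hermes-gateway -f | grep --line-buffered "402" | while IFS= read -r line; do printf '%s\n' "$line" | mail -s "Hermes 402" you@example.com; done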

Primitive but it works. If you have a real monitoring stack (Prometheus, Loki, anything with log alerting), wire the same rule.
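
To alert only on the first 402 of a billing period, a small filter can sit between grep and the mailer. This is a sketch under assumptions: the state directory, the function names and the use of a UTC calendar month as the billing period are all illustrative, so adjust them to your provider's cycle.

```shell
# Where "already alerted" markers are kept (assumed location).
STATE_DIR=${STATE_DIR:-/tmp/hermes-402-alerts}

# Key for the current billing period; a UTC calendar month is an assumption.
current_period() { date -u +%Y-%m; }

# Reads log lines on stdin and passes through only the first line containing
# "402" in each period. Later matches in the same period are dropped.
first_402_per_period() {
  mkdir -p "$STATE_DIR"
  while IFS= read -r line; do
    case $line in
      *402*)
        period=$(current_period)
        if [ ! -e "$STATE_DIR/$period" ]; then
          : > "$STATE_DIR/$period"   # record that this period has alerted
          printf '%s\n' "$line"
        fi
        ;;
    esac
  done
}

# Possible wiring (not executed here; the address is a placeholder):
# journalctl -u hermes-gateway -f | first_402_per_period |
#   while IFS= read -r l; do printf '%s\n' "$l" | mail -s "Hermes 402" you@example.com; done
```

Matching on the bare string 402 can also catch unrelated numbers in log lines, so tighten the pattern once you know your gateway's log format.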

Provider-specific 402 quirks

Anthropic

Returns 402 cleanly with a clear message. Fallback works as expected.

OpenAI

Returns 429 for both rate-limit and out-of-credits. The error code in the response body tells you which: insufficient_quota means the credits are exhausted. If you only configured 402 fallback, OpenAI exhaustion won't fall through. Add 429 to fallback statuses.
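
One way to tell the two 429 cases apart in a script is to look at the error body. The sketch below assumes the body carries an error code of insufficient_quota when credits are exhausted and treats everything else as ordinary rate limiting. The function name is hypothetical, and a real implementation should parse the JSON instead of matching substrings.

```shell
# Classify the body of an OpenAI 429 response.
# Assumption: exhausted credits are reported with the code "insufficient_quota".
classify_openai_429() {
  case $1 in
    *insufficient_quota*) echo "out-of-credits" ;;
    *) echo "rate-limited" ;;
  esac
}

classify_openai_429 '{"error":{"code":"insufficient_quota"}}'    # prints "out-of-credits"
classify_openai_429 '{"error":{"code":"rate_limit_exceeded"}}'   # prints "rate-limited"
```

The distinction matters for alerting: a rate limit clears on its own within minutes, while exhausted credits need a top-up.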

OpenRouter

Returns 402 with a clear message telling you to top up. Fallback works, but the direct fix is usually quicker than relying on a fallback: just top up the OpenRouter account.

MiniMax and GLM

These providers sometimes return 401 for what is really a quota error. Hermes treats it as an auth failure, so fallback doesn't trigger. Workaround: mark these providers as fallback-only, never primary.

My production setup

For the production gateway running my Telegram and Discord channels:

Primary: Anthropic Sonnet 4.6 with credit auto-recharge at the account level
Fallback: OpenRouter routed to Sonnet 4.6 (same model, different billing)
Alert: 402 line piped to email

Two real 402 events in the last six months, both caught by fallback, both fixed before anyone noticed.

For the local dev agent: no fallback. If Anthropic 402s during dev work, I want to see the error immediately so I can decide to top up or move to something else.

Cost tuning is the cheaper option

Fallback providers buy reliability. They don't reduce cost. The cheaper way to handle 402 is to not hit it in the first place. Set provider account budgets, tune token usage with the patterns in our token costs guide, and use /compress aggressively in long sessions.

A well-tuned agent rarely 402s on a Sonnet account even with daily use.

Pre-upgraded gateway on LumaDock

The Hermes Agent template on LumaDock includes the upgraded gateway version that handles 402 cleanly out of the box, plus the same systemd unit that auto-restarts after transient errors. Unmetered bandwidth and no setup fees. Setup walkthrough in our Hermes Agent complete guide.
