HTTP 402 from your LLM provider means you hit a billing or quota limit: you are out of credits, you exhausted a daily token allowance, or your payment method was declined. The painful part isn't the 402 itself. It's that older Hermes versions treated a single 402 as fatal and killed the gateway. Suddenly your Telegram bot is silent, your scheduled briefing didn't fire, and the agent looks dead.
This guide covers three things: upgrading to a version that handles 402 cleanly, configuring fallback providers so a single quota issue doesn't take down the whole agent, and monitoring so you find out before users do.
What 402 looks like across providers
Provider-specific text varies:
- Anthropic: HTTP 402: Your account is out of credits
- OpenRouter: 402: insufficient_credits
- MiniMax: daily_limit_busy with 402 status
- OpenAI: usually returns 429, not 402 (it uses a different status for out-of-credits, see below)
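When searching a gateway log for these messages, it can help to match the provider strings directly instead of the bare number 402. The sketch below assumes the three message fragments from the list above appear verbatim in your logs; the function name is illustrative and not part of Hermes.

```shell
# Message fragments taken from the list above; real provider text may differ.
QUOTA_PATTERN='out of credits|insufficient_credits|daily_limit_busy'

# Succeeds when the given log line contains one of the known quota messages.
is_quota_message() {
  printf '%s\n' "$1" | grep -Eiq "$QUOTA_PATTERN"
}

if is_quota_message "HTTP 402: Your account is out of credits"; then
  echo "quota problem"
fi
```

OpenAI is deliberately left out of the pattern because it reports exhaustion under 429 instead.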
Step 1: Upgrade Hermes if your gateway dies on 402
Older Hermes versions propagated 402 up as a fatal gateway error. Newer versions retry and surface a clean error. If your gateway log shows a 402 followed by the gateway shutting down, you're on a version that needs upgrading:
hermes upgrade
hermes --version
Check the Hermes releases page for the current version. The 402 handling improvements are in v0.13 and later.
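If you want to script the check, a small version comparison is enough. This is a sketch under two assumptions: that you have already reduced the output of hermes --version to a plain dotted version (the exact output format is not covered here), and that v0.13 is the threshold you care about. The function name is hypothetical.

```shell
# Succeeds when dotted version $1 is >= dotted version $2 (numeric fields only).
version_at_least() {
  have=${1#v}
  want=${2#v}
  while [ -n "$have" ] || [ -n "$want" ]; do
    h=${have%%.*}
    w=${want%%.*}
    # Missing fields count as 0, so "0.13" equals "0.13.0".
    h=${h:-0}
    w=${w:-0}
    [ "$h" -gt "$w" ] && return 0
    [ "$h" -lt "$w" ] && return 1
    # Drop the field just compared; clear the string when no dot remains.
    case "$have" in *.*) have=${have#*.} ;; *) have= ;; esac
    case "$want" in *.*) want=${want#*.} ;; *) want= ;; esac
  done
  return 0
}

# Example with a made-up installed version.
if version_at_least "0.12.4" "0.13"; then
  echo "402 handling present"
else
  echo "upgrade needed"
fi
```

Pre-release suffixes such as -beta are not handled; strip them before calling the function.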
Step 2: Why fallback providers matter
Even after the upgrade, a single provider's 402 means the agent stops answering until the quota resets. Hit a monthly cap a week before it resets and that is a week of downtime. Hit a daily cap and you wait until midnight UTC. Either way the agent looks broken to users.
Fallback providers give Hermes a second route. When the primary returns 402, the gateway transparently retries through the secondary. Users see no failure. You get a log entry telling you the primary is exhausted, but the bot keeps working.
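The routing idea can be sketched in a few lines. This is not how Hermes implements it; call_provider is a stand-in that fakes an out-of-credits primary, and both function names are invented for illustration.

```shell
# Stand-in for a real provider call: prints an HTTP status for the named provider.
# The primary is hard-coded as out of credits for the demonstration.
call_provider() {
  case "$1" in
    anthropic) echo 402 ;;
    openrouter) echo 200 ;;
    *) echo 500 ;;
  esac
}

# Try providers in priority order; move on only when the status is retriable.
# Prints the name of the provider that answered.
route_request() {
  for provider in "$@"; do
    status=$(call_provider "$provider")
    case "$status" in
      200) echo "$provider"; return 0 ;;
      402|429|5??) echo "fallback: $provider returned $status" >&2 ;;
      *) echo "fatal: $provider returned $status" >&2; return 1 ;;
    esac
  done
  return 1
}

route_request anthropic openrouter   # prints "openrouter" on stdout
```

The log line goes to stderr so the caller still sees that the primary is exhausted while the request itself succeeds.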
Step 3: Configure the fallback
Add a second provider
hermes provider list
hermes provider add openrouter --priority 2
hermes provider show
Priority 1 is the default, tried first. Priority 2 is fallback, only used when primary fails with a retriable error. You can chain priority 3 and 4 if you want belts and braces.
Pick a fallback model that mirrors the primary
If the primary is Anthropic Sonnet, set the OpenRouter fallback to a Sonnet-equivalent model (Sonnet 4.6 is available on OpenRouter too). Same model class, different billing relationship, so an Anthropic credit exhaustion doesn't take you down.
Step 4: Tune which errors trigger fallback
Default fallback statuses are 402, 429, 5xx. You can change this:
hermes config set fallback_on_status "402,429,500,502,503,504"
hermes config set fallback_max_retries 2
Don't add 401 to the fallback list
If you do, real auth misconfigurations get covered up. You only find out the primary key is broken when both providers exhaust together. See our Hermes 401 auth errors piece for why 401 should always stay visible.
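A guard like the following can run before you apply a new status list. It is a sketch that only inspects the comma-separated string; the function names are hypothetical, and it assumes the list has no spaces, as in the command above.

```shell
# Succeeds when status $1 appears in the comma-separated list $2 (exact match, no spaces).
status_in_list() {
  case ",$2," in
    *",$1,"*) return 0 ;;
  esac
  return 1
}

# Rejects a fallback list that would hide auth failures behind the fallback.
check_fallback_list() {
  if status_in_list 401 "$1"; then
    echo "refusing: 401 must stay visible, remove it from the list" >&2
    return 1
  fi
  return 0
}

check_fallback_list "402,429,500,502,503,504" && echo "list ok"
```

Wrapping the list in leading and trailing commas keeps 40 or 4020 from matching 402 by accident.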
Credential pools for the same provider
If your usage justifies a second account at the same provider, Hermes can rotate between two keys associated with the same provider config:
hermes auth add anthropic --pool primary
hermes auth add anthropic --pool secondary
hermes provider set anthropic --credential-pool primary,secondary
This is rate-limit smoothing, not quota smoothing. Two accounts hitting the same monthly cap together still hit the cap. The real win is for short bursts where one account would 429 but a pool of two stays under the per-account RPM.
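To see why a pool helps with bursts, consider simple round-robin rotation. The sketch below is an illustration of the arithmetic only; Hermes may rotate keys differently, and both function names are made up. Each function expects at least one pool member.

```shell
# Pick a pool member by request number: round-robin over the names after $1.
pick_credential() {
  n=$1
  shift
  skip=$(( n % $# ))
  while [ "$skip" -gt 0 ]; do
    shift
    skip=$(( skip - 1 ))
  done
  printf '%s\n' "$1"
}

# Requests per minute each key sees when $1 total RPM is spread over $2 keys (rounded up).
per_key_rpm() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

pick_credential 3 primary secondary   # request 3 goes to "secondary"
per_key_rpm 90 2                      # 45 requests per minute per key
```

A burst of 90 requests per minute lands as 45 on each key, which is the smoothing effect. The monthly total is unchanged, so a shared spending cap is reached just as fast.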
Monitoring to catch 402 early
The simplest pattern is to alert on the first 402 of a billing period. Every subsequent one is noise.
If you run the gateway under systemd (covered in our systemd setup), tail the gateway log into your monitoring stack:
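journalctl -u hermes-gateway -f | grep --line-buffered "402" | while read -r line; do printf '%s\n' "$line" | mail -s "Hermes 402" "$ALERT_EMAIL"; done
Primitive but it works. The loop sends one mail per matching line, because mail waits for its input to end and a followed log never ends. ALERT_EMAIL stands in for your own address. If you have a real monitoring stack (Prometheus, Loki, anything with log alerting), wire the same rule.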
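To alert only on the first 402 of a billing period, a small state file is enough. This sketch assumes the billing period is a calendar month and that a writable path exists for the marker; the function name and the file location are hypothetical.

```shell
# Succeeds (and records the period) only for the first call in a given period.
# $1 = state file path, $2 = period label such as 2025-06.
first_alert_in_period() {
  state_file=$1
  period=$2
  if [ -f "$state_file" ] && [ "$(cat "$state_file")" = "$period" ]; then
    return 1   # already alerted during this period
  fi
  printf '%s\n' "$period" > "$state_file"
  return 0
}

# Demonstration against a temporary file.
demo_state=$(mktemp)
first_alert_in_period "$demo_state" "2025-06" && echo "send alert"
first_alert_in_period "$demo_state" "2025-06" || echo "suppressed"
rm -f "$demo_state"
```

In a log pipeline you would call it with the current month, for example "$(date -u +%Y-%m)", and only send the mail when it succeeds.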
Provider-specific 402 quirks
Anthropic
Returns 402 cleanly with a clear message. Fallback works as expected.
OpenAI
Returns 429 for both rate limiting and out-of-credits. The error code in the response body tells you which: insufficient_quota means the account is out of credits. If you narrowed fallback to 402 only, OpenAI exhaustion won't fall through, so keep 429 in the fallback statuses.
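If you want your logs to separate the two cases, one option is to look at the error body of the 429. The sketch assumes the out-of-credits response carries the code insufficient_quota in its JSON body; the function name is invented, and the string match is a shortcut where a real JSON parser would be more robust.

```shell
# Classify a 429 response body as an exhausted quota or an ordinary rate limit.
classify_429() {
  case "$1" in
    *'"insufficient_quota"'*) echo quota ;;
    *) echo rate_limit ;;
  esac
}

classify_429 '{"error":{"code":"insufficient_quota"}}'   # prints "quota"
```

A quota result is worth an alert, while a plain rate limit usually clears on its own after a short wait.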
OpenRouter
Returns 402 with a clear topping-up message. Fallback works. The direct fix is usually quicker than relying on a fallback: just top up the OpenRouter account.
MiniMax and GLM
Sometimes return 401 for what is really a quota error. Hermes treats it as an auth failure, so fallback doesn't trigger. Workaround: use these providers as fallback only, never as primary.
My production setup
For the production gateway running my Telegram and Discord channels:
- Primary: Anthropic Sonnet 4.6 with credit auto-recharge at the account level
- Fallback: OpenRouter routed to Sonnet 4.6 (same model, different billing)
- Alert: 402 line piped to email
Two real 402 events in the last six months, both caught by fallback, both fixed before anyone noticed.
For the local dev agent: no fallback. If Anthropic 402s during dev work, I want to see the error immediately so I can decide to top up or move to something else.
Cost tuning is the cheaper option
Fallback providers buy reliability. They don't reduce cost. The cheaper way to handle 402 is to not hit it in the first place. Set provider account budgets, tune token usage with the patterns in our token costs guide, and use /compress aggressively in long sessions.
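A rough runway estimate tells you whether a budget is about to run out. The sketch below uses integer cents and an average daily spend you supply yourself; the numbers are made up and the function name is hypothetical.

```shell
# Whole days of runway: remaining balance divided by average daily spend (both in cents).
days_of_runway() {
  [ "$2" -gt 0 ] || { echo "daily spend must be positive" >&2; return 1; }
  echo $(( $1 / $2 ))
}

days_of_runway 5000 350   # 50.00 left at 3.50 per day gives 14 whole days
```

If the result is shorter than the time left in the billing period, top up or cut token usage before the 402 arrives.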
A well-tuned agent rarely 402s on a Sonnet account even with daily use.
Pre-upgraded gateway on LumaDock
The Hermes Agent template on LumaDock includes the upgraded gateway version that handles 402 cleanly out of the box, plus the same systemd unit that auto-restarts after transient errors. Unmetered bandwidth and no setup fees. Setup walkthrough in our Hermes Agent complete guide.

