Fix Hermes Agent 429 rate limit errors and tracebacks

Alex

06/05/2026

HTTP 429 from your LLM provider means you hit a rate limit: requests per minute, requests per day, tokens per minute or some combination. The error text varies by provider but the meaning is the same: you're going faster than your tier allows.

This is different from 402 (out of credits) and 401 (auth failure). With most providers, 429 doesn't mean you owe money; it just means slow down (OpenAI and Gemini are partial exceptions, covered below). The fix is rarely "pay more"; usually it's "configure Hermes to back off properly" or "spread the load across more keys".

Let's begin!

What 429 looks like across providers

The Hermes log shows variants like these:

Error: HTTP 429: rate_limit_error (Anthropic)
RateLimitError [HTTP 429] (OpenAI client)
429 Too Many Requests with a Retry-After header (OpenRouter)
⚠️ API call failed (attempt 1/3): RateLimitError in gateway logs

Most providers include a header telling you how long to wait. Some don't. Hermes handles the header-aware case automatically. The header-less case is where things get manual.

Hermes's built-in retry behaviour

Out of the box, Hermes retries 429s with exponential backoff. Default config:

3 retry attempts before failing the request
Backoff doubles each time (1s, 2s, 4s)
Respects the Retry-After header when present

This handles transient spikes fine. The provider rate-limits one request, a second or two later you're back under the limit, and the request succeeds on attempt 2. The user barely notices.
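The retry behaviour described above can be pictured roughly as follows. This is a minimal Python sketch, not Hermes's actual code: `RateLimited`, `backoff_delay`, `call_with_backoff` and the `send` callable are hypothetical names, and it assumes attempts are counted the way the "attempt 1/3" log line suggests, with the wait doubling from a 1-second start unless the provider supplied a Retry-After value in seconds.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised by `send` when the provider answers 429."""

    def __init__(self, retry_after=None):
        super().__init__("HTTP 429")
        self.retry_after = retry_after  # seconds from Retry-After, or None


def backoff_delay(attempt, base=1.0, retry_after=None):
    """Wait before the next try: Retry-After if given, else base * 2**(attempt - 1)."""
    if retry_after is not None:
        return float(retry_after)
    return base * 2 ** (attempt - 1)


def call_with_backoff(send, max_attempts=3, base=1.0, sleep=time.sleep):
    """Call send() until it succeeds or max_attempts calls have failed."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except RateLimited as err:
            if attempt == max_attempts:
                raise  # out of attempts: surface the 429 to the caller
            sleep(backoff_delay(attempt, base, err.retry_after))
```

Passing `sleep` as a parameter is only there so the waiting can be swapped out when experimenting; a real client would leave it as `time.sleep`.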

The settings that control this:

hermes config get retry_max_attempts
hermes config get retry_backoff_base
hermes config get retry_respect_retry_after

For a busy gateway, I bump retries to 5 with a slightly longer backoff:

hermes config set retry_max_attempts 5
hermes config set retry_backoff_base 2.0

When the retry isn't enough

If you're getting 429s on every request rather than occasionally, this isn't a brief spike: you're persistently over the rate limit. Retries won't save you; they just delay the failure. There are three real fixes.

Fix 1: Throttle Hermes at the agent level

Hermes can rate-limit itself before requests even hit the provider. Cleaner than letting the provider 429 you.

hermes config set rate_limit_rpm 60
hermes config set rate_limit_tpm 100000

Tune these to just below your provider tier's limit. For Anthropic's Build tier 1, that means no more than 50 RPM; for Build tier 2, no more than 1,000 RPM. Limits change, so check your provider's docs.

When Hermes hits its self-throttle, requests queue locally instead of going to the provider. User sees latency, not errors. Better failure mode than spamming the provider and getting 429s back.
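To make the queue-instead-of-error idea concrete, here is a minimal Python sketch of a rolling-window requests-per-minute throttle. It is an illustration under assumptions, not how Hermes implements `rate_limit_rpm`: the `RpmThrottle` class and its methods are hypothetical, and it covers requests only, not tokens.

```python
import time
from collections import deque


class RpmThrottle:
    """Allow at most `rpm` requests in any rolling 60-second window."""

    def __init__(self, rpm, clock=time.monotonic):
        self.rpm = rpm
        self.clock = clock
        self.sent = deque()  # timestamps of recently sent requests

    def wait_time(self):
        """Seconds to wait before the next request may be sent (0.0 if none)."""
        now = self.clock()
        # Forget requests that have left the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            return 0.0
        # Window is full: wait until the oldest request ages out.
        return 60 - (now - self.sent[0])

    def acquire(self, sleep=time.sleep):
        """Block until a slot is free, then record the request."""
        delay = self.wait_time()
        if delay > 0:
            sleep(delay)
        self.sent.append(self.clock())
```

Calling `acquire()` before each provider call turns an over-limit burst into local waiting rather than a 429 from the provider.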

Fix 2: Credential pools

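Add a second API key for the provider. Hermes rotates between them, effectively doubling your limit. This only helps if the keys have separate rate-limit budgets: many providers enforce limits per account or organization rather than per key, so the second key may need to come from a separate account. Full setup pattern in our credential pools tutorial.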

This is what I run in production. Two Anthropic keys, round-robin. Rate-limit incidents dropped to near zero.
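Round-robin rotation itself is simple. The following Python sketch shows the idea only; `KeyPool` is a hypothetical name, the key strings are placeholders, and Hermes's real credential pool may choose keys differently.

```python
from itertools import cycle


class KeyPool:
    """Hand out API keys in round-robin order."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("need at least one key")
        self._keys = cycle(keys)

    def next_key(self):
        """Return the key to use for the next request."""
        return next(self._keys)


pool = KeyPool(["key-A", "key-B"])  # placeholder strings, not real keys
order = [pool.next_key() for _ in range(4)]
```

With two keys, each one carries every other request, so each sees roughly half the traffic.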

Fix 3: Fallback to a different provider

Once Hermes exhausts retries on the primary, fall through to a secondary provider with a different rate limit budget. Pattern in our Hermes 402 quota fallback tutorial. Same mechanism works for 429s if you add 429 to the fallback statuses:

hermes config set fallback_on_status "402,429,500,502,503,504"
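As a rough illustration of what a status list like this implies, the Python sketch below tries providers in order and only moves on when the failing status is in the list. Everything here is assumed for the example: `parse_statuses`, `complete` and the `(name, call)` provider pairs are hypothetical stand-ins, not Hermes internals.

```python
def parse_statuses(value):
    """Turn a string like "402,429,500" into the set {402, 429, 500}."""
    return {int(part) for part in value.split(",") if part.strip()}


def complete(prompt, providers, fallback_on):
    """Try providers in order; move on only when the status is in fallback_on.

    `providers` is a list of (name, call) pairs where call(prompt) returns
    (status, text). Both are stand-ins for real provider clients.
    """
    status = None
    for name, call in providers:
        status, text = call(prompt)
        if status == 200:
            return name, text
        if status not in fallback_on:
            break  # this status is not configured for fallback
    raise RuntimeError(f"all providers failed, last status {status}")
```

If 429 is missing from the set, a rate-limited primary ends the request instead of falling through to the secondary.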

Provider-specific 429 quirks

Anthropic

Anthropic has both per-minute and per-day limits, with separate token and request limits. The 429 response tells you which one you hit. Token-per-minute limits are the most common bottleneck because Hermes pushes a lot of context per request.

If you're TPM-limited, the fix is reducing input tokens (skill pruning, /compress more often) more than throttling. Our cut Hermes token costs guide covers this.

OpenAI

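OpenAI returns 429 for both rate limits and exhausted credits. The two are distinguished by the error code in the response body (insufficient_quota for exhausted credits, rate_limit_exceeded for rate limits). Hermes handles both, but if you've only configured 402 fallback, OpenAI out-of-credits errors won't fall through. Add 429 to your fallback list.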

OpenAI also has weirdly aggressive limits on new accounts. For the first two weeks you get a lower RPM; after that, limits relax. New accounts hitting 429 immediately is normal; the fix is usually "wait two weeks" or "upgrade tier".

OpenRouter

The Retry-After header on OpenRouter 429s is usually short (seconds, not minutes). Hermes handles this well by default. If you're seeing persistent OpenRouter 429s, check if you're on the free tier; free models have aggressive caps that paid models don't.

Groq

Groq has the cleanest 429 behaviour (clear headers, predictable retry windows) and the most aggressive default limits. Their free tier 429s constantly. If you want Groq for cost reasons, expect to pay for a higher tier or run it as a fallback only.

Gemini

Per-day quotas are tight on the free tier. A 429 from Gemini often means "you've burned today's allocation" rather than a per-minute rate limit. It falls through cleanly to a fallback provider in Hermes if one is configured.

Throttling per channel

If you have multiple messaging gateways and one of them generates most of the traffic, you might want to throttle that channel specifically. The pattern is to set a per-channel queue depth limit:

hermes gateway set telegram --max-pending-messages 10

Messages beyond the queue depth are dropped with a "system busy" reply to the user. This is less elegant than provider-level smoothing but useful when one chatty Telegram channel is causing your 429s single-handedly.
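The drop-when-full behaviour can be sketched in a few lines of Python. This is an illustration only: `ChannelQueue` and `offer` are hypothetical names, and the reply text is taken from the description above rather than from Hermes's source.

```python
from collections import deque


class ChannelQueue:
    """Per-channel pending queue that rejects messages once it is full."""

    def __init__(self, max_pending):
        self.max_pending = max_pending
        self.pending = deque()

    def offer(self, message):
        """Queue the message, or return a busy reply if the queue is full."""
        if len(self.pending) >= self.max_pending:
            return "system busy"
        self.pending.append(message)
        return None
```

A return value of `None` means the message was queued; any string is the reply to send straight back to the user.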

Logging which requests get 429'd

When debugging persistent rate-limit issues, you need to know which requests are hitting limits. Enable detailed provider logging:

hermes config set provider_log_level debug
hermes config set provider_log_path ~/.hermes/logs/provider.log

The log shows each provider call with its model, token count, response status and any 429 detail headers. Grep for "429" to find rate-limit events:

grep "429" ~/.hermes/logs/provider.log | tail -20

The output usually shows which model and which time of day is the culprit.
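If grep output gets too long to eyeball, a short script can group the 429s by hour and model. The Python sketch below assumes a log line shape such as `2026-05-06T14:03:11 model=model-a status=429 tokens=18234`; that format, the field names and the `summarize_429s` helper are all hypothetical, so adjust the patterns to whatever your provider.log actually contains.

```python
import re
from collections import Counter

# Assumed field shapes; change these to match your real log lines.
TIMESTAMP = re.compile(r"T(\d{2}):")       # hour from an ISO-style timestamp
MODEL = re.compile(r"\bmodel=(\S+)")       # model name field
STATUS_429 = re.compile(r"\bstatus=429\b")  # whole-field match, not "4290"


def summarize_429s(lines):
    """Count 429 log lines per (hour, model) pair."""
    counts = Counter()
    for line in lines:
        if not STATUS_429.search(line):
            continue
        hour = TIMESTAMP.search(line)
        model = MODEL.search(line)
        key = (
            hour.group(1) if hour else "??",
            model.group(1) if model else "unknown",
        )
        counts[key] += 1
    return counts
```

Feeding it an open file and calling `.most_common(5)` on the result lists the worst hour and model combinations first.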

Why upgrading the tier isn't always the answer

From what I've seen, people reach for "upgrade tier" first. Sometimes that's right. Often it's not, because your real bottleneck is bursty traffic, not sustained load.

If you spend most of the hour at 5 RPM and once a day spike to 80 RPM for 30 seconds, paying for a tier that supports 80 sustained RPM is wasteful. The throttle-and-queue fix (Fix 1 above) absorbs bursts at no extra cost. The credential pool fix (Fix 2) doubles your burst capacity with no extra usage cost, just the admin work of managing two keys.
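A quick back-of-the-envelope calculation shows how small that burst really is. The Python sketch below treats traffic as a smooth flow, which is a simplification, and assumes a local throttle of 50 RPM (an assumed setting for the example); `burst_backlog` is a hypothetical helper.

```python
def burst_backlog(burst_rpm, burst_seconds, limit_rpm, baseline_rpm):
    """Estimate the queue a burst leaves behind and how long it takes to drain.

    Returns (queued_requests, drain_seconds). Assumes limit_rpm is higher
    than baseline_rpm, otherwise the queue would never drain.
    """
    if limit_rpm <= baseline_rpm:
        raise ValueError("limit_rpm must exceed baseline_rpm")
    arrived = burst_rpm * burst_seconds / 60
    sent = min(arrived, limit_rpm * burst_seconds / 60)
    backlog = arrived - sent
    # After the burst, spare capacity is the limit minus normal traffic.
    drain_seconds = backlog * 60 / (limit_rpm - baseline_rpm)
    return backlog, drain_seconds


backlog, drain = burst_backlog(burst_rpm=80, burst_seconds=30,
                               limit_rpm=50, baseline_rpm=5)
```

Under those assumptions the burst leaves about 15 requests queued, and they clear in roughly 20 seconds once traffic drops back to 5 RPM. That is a short delay for a few users, not a reason to change tier.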

Upgrade tier when your sustained load really requires it. Not before.

Monitoring rate-limit events

Set up a simple alert that fires when 429s spike. Cron job over the gateway log:

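cat > /usr/local/bin/hermes-429-alert.sh << 'EOF'
#!/bin/bash
# Count gateway log lines from the last hour that mention a 429.
# -w matches 429 as a whole word, so values like 14290 are not counted.
COUNT=$(journalctl -u hermes-gateway --since "1 hour ago" | grep -cw "429")
THRESHOLD=5
# Placeholder address: replace with your own.
ALERT_EMAIL="you@example.com"
if [ "$COUNT" -gt "$THRESHOLD" ]; then
  echo "Hermes hit $COUNT 429s in the last hour" | mail -s "Hermes 429 alert" "$ALERT_EMAIL"
fi
EOF
chmod +x /usr/local/bin/hermes-429-alert.sh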

Run it hourly from cron and tune the threshold to your normal traffic.

What 429 doesn't mean

If the error looks like a 429 but you're sure you're under the limits, double-check. Some providers return 429 when the IP itself is rate-limited (separate from your account), usually because of automated abuse from another tenant on shared infrastructure. There's nothing you can do from the Hermes side; just retry later or use a different network.

Also check that the 429 isn't really a 401 in disguise (some providers do this; covered in our 401 auth errors piece).

Pre-tuned defaults on LumaDock

The Hermes Agent template on LumaDock ships with retry and throttle defaults that work for a small bot doing a few thousand requests per day. Plans include unmetered bandwidth and no setup fees, so you're not paying twice when retries push request volume up. If your traffic outgrows the defaults, the credential pool and throttle config above are the standard upgrade path. Full setup walkthrough in our Hermes Agent complete guide.
