HTTP 429 from your LLM provider means you hit a rate limit: requests per minute, requests per day, tokens per minute or some combination. The error text varies by provider but the meaning is the same: you're going faster than your tier allows.
This is different from 402 (out of credits) and 401 (auth failure). 429 doesn't mean you owe money. It just means slow down. The fix is rarely "pay more"; usually it's "configure Hermes to back off properly" or "spread the load across more keys".
Let's begin!
What 429 looks like across providers
The Hermes log shows variants like these:
- Error: HTTP 429: rate_limit_error (Anthropic)
- RateLimitError [HTTP 429] (OpenAI client)
- 429 Too Many Requests, with a Retry-After header (OpenRouter)
- ⚠️ API call failed (attempt 1/3): RateLimitError (in gateway logs)
Most providers include a header telling you how long to wait. Some don't. Hermes handles the header-aware case automatically. The header-less case is where things get manual.
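The header-aware case comes down to reading Retry-After, which HTTP allows as either a number of seconds or an HTTP date. Here is a minimal Python sketch of that parsing. It is not Hermes's actual code, and the function name parse_retry_after is made up for illustration:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return how many seconds to wait, or None if the header is absent or unreadable."""
    if value is None:
        return None  # the header-less case: caller falls back to its own backoff
    value = value.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form, e.g. "5"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative wait.
    return max(0.0, (when - now).total_seconds())
```

Returning None for a missing or unparseable header keeps the two cases separate: the caller knows it has no hint from the provider and has to pick its own delay.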
Hermes's built-in retry behaviour
Out of the box, Hermes retries 429s with exponential backoff. Default config:
- 3 retry attempts before failing the request
- Backoff doubles each time (1s, 2s, 4s)
- Respects the Retry-After header when present
This handles transient spikes fine. The provider rate-limited you for one request, two seconds later you're under the limit again, request succeeds on attempt 2. User barely notices.
The settings that control this:
hermes config get retry_max_attempts
hermes config get retry_backoff_base
hermes config get retry_respect_retry_after
For a busy gateway, I bump retries to 5 with a slightly longer backoff:
hermes config set retry_max_attempts 5
hermes config set retry_backoff_base 2.0
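To make the retry behaviour concrete, here is a rough Python sketch of exponential backoff that honours a Retry-After hint. It illustrates the pattern described above under my own assumptions, not Hermes's internals: RateLimited and call_with_backoff are hypothetical names, and I'm not claiming how retry_backoff_base maps onto the delays.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for a provider 429; retry_after is the header value in seconds, if any."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("HTTP 429")
        self.retry_after = retry_after


def call_with_backoff(
    send: Callable[[], T],
    max_retries: int = 3,
    first_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(); on RateLimited wait 1s, 2s, 4s... and retry up to max_retries times."""
    for retry in range(max_retries + 1):
        try:
            return send()
        except RateLimited as err:
            if retry == max_retries:
                raise  # retries exhausted: surface the 429 to the caller
            delay = first_delay * (2 ** retry)  # doubles on every retry
            if err.retry_after is not None:
                delay = err.retry_after  # the provider's hint wins over our guess
            sleep(delay)
    raise AssertionError("unreachable")
```

The sleep function is injectable so the schedule can be checked in a test without actually waiting.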
When the retry isn't enough
If you're 429ing on every request rather than the occasional one, this isn't a brief spike. You're persistently over the rate limit. Retries won't save you; they just delay the failure. Three real fixes.
Fix 1: Throttle Hermes at the agent level
Hermes can rate-limit itself before requests even hit the provider. Cleaner than letting the provider 429 you.
hermes config set rate_limit_rpm 60
hermes config set rate_limit_tpm 100000
Tune to just below your provider tier's limit. For Anthropic's Build tier 1, 50 RPM is safe; for Build tier 2, 1000 RPM. Limits change, so check your provider's docs.
When Hermes hits its self-throttle, requests queue locally instead of going to the provider. User sees latency, not errors. Better failure mode than spamming the provider and getting 429s back.
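If you want a feel for what a self-throttle does, here is a small sliding-window limiter in Python. It's a sketch of the general technique, assuming a plain requests-per-minute cap. I don't know which algorithm Hermes uses internally, and the class name is made up.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_per_minute sends in any rolling 60-second window."""

    def __init__(self, max_per_minute: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.max_per_minute = max_per_minute
        self.clock = clock
        self.sent: Deque[float] = deque()  # timestamps of recent sends, oldest first

    def wait_time(self) -> float:
        """Seconds the next request must wait locally (0.0 means send now)."""
        now = self.clock()
        # Forget sends that have aged out of the window.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        if len(self.sent) < self.max_per_minute:
            return 0.0
        # Window is full: wait until the oldest send leaves it.
        return 60.0 - (now - self.sent[0])

    def record(self) -> None:
        """Call right after a request is actually sent."""
        self.sent.append(self.clock())
```

A caller would sleep for wait_time() before each send and call record() afterwards, which is where the "latency, not errors" trade-off comes from.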
Fix 2: Credential pools
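Add a second API key for the provider. Hermes rotates between them, which roughly doubles your headroom as long as the provider counts each key's limits separately. Many providers enforce limits per account or organization rather than per key, so the second key usually needs to come from a separate account. Full setup pattern in our credential pools tutorial.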
This is what I run in production. Two Anthropic keys, round-robin. Rate-limit incidents dropped to near zero.
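The round-robin idea is simple enough to sketch in a few lines of Python. This is an illustration only: KeyPool and its methods are hypothetical names, and the cooldown for a key that just got a 429 is my own addition, not something the Hermes docs promise.

```python
import time
from typing import Callable, Dict, List, Optional, Sequence


class KeyPool:
    """Hand out API keys in rotation, skipping keys that are cooling down."""

    def __init__(self, keys: Sequence[str], clock: Callable[[], float] = time.monotonic) -> None:
        if not keys:
            raise ValueError("need at least one key")
        self._keys: List[str] = list(keys)
        self._clock = clock
        self._cooling_until: Dict[str, float] = {k: 0.0 for k in self._keys}
        self._next = 0  # index of the key to try first on the next call

    def mark_rate_limited(self, key: str, cooldown_s: float) -> None:
        """Take a key out of rotation for cooldown_s seconds after a 429."""
        self._cooling_until[key] = self._clock() + cooldown_s

    def next_key(self) -> Optional[str]:
        """Return the next usable key, or None if every key is cooling down."""
        now = self._clock()
        for offset in range(len(self._keys)):
            idx = (self._next + offset) % len(self._keys)
            key = self._keys[idx]
            if self._cooling_until[key] <= now:
                self._next = (idx + 1) % len(self._keys)
                return key
        return None
```

When next_key() returns None, every key is rate-limited at once, which is the signal to queue locally or fall through to another provider.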
Fix 3: Fallback to a different provider
Once Hermes exhausts retries on the primary, fall through to a secondary provider with a different rate limit budget. Pattern in our Hermes 402 quota fallback tutorial. Same mechanism works for 429s if you add 429 to the fallback statuses:
hermes config set fallback_on_status "402,429,500,502,503,504"
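Conceptually, a status-based fallback is just a loop over providers. The Python below is a sketch of that idea, assuming each provider is a callable returning a (status, body) pair. The names are hypothetical and it says nothing about how Hermes implements the setting.

```python
from typing import Callable, FrozenSet, Sequence, Tuple

# Same statuses as in the config line above.
FALLBACK_STATUSES: FrozenSet[int] = frozenset({402, 429, 500, 502, 503, 504})

Provider = Tuple[str, Callable[[], Tuple[int, str]]]


def call_with_fallback(
    providers: Sequence[Provider],
    fallback_statuses: FrozenSet[int] = FALLBACK_STATUSES,
) -> Tuple[str, int, str]:
    """Try providers in order; return (name, status, body) of the first usable answer."""
    if not providers:
        raise ValueError("need at least one provider")
    result = ("", 0, "")
    for name, send in providers:
        status, body = send()
        result = (name, status, body)
        if status not in fallback_statuses:
            return result  # success, or an error that fallback would not fix (e.g. 400)
    return result  # everything fell through: report the last provider's failure
```

Note that a 401 or 400 is returned immediately rather than passed down the chain, since a different provider won't fix a bad request.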
Provider-specific 429 quirks
Anthropic
Anthropic has both per-minute and per-day limits, with separate limits for tokens and for requests. The 429 response tells you which one you hit. Token-per-minute limits are the most common bottleneck because Hermes pushes a lot of context per request.
If you're TPM-limited, the fix is reducing input tokens (skill pruning, /compress more often) more than throttling. Our cut Hermes token costs guide covers this.
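A quick back-of-envelope check shows why the token limit usually bites first. The Python sketch below is plain arithmetic. The function name is made up and the 8,000 tokens per request in the usage note is an assumed figure, not a measurement.

```python
from typing import Tuple


def effective_rpm(rpm_limit: int, tpm_limit: int, avg_tokens_per_request: int) -> Tuple[int, str]:
    """Return (requests per minute you can actually sustain, which limit binds)."""
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    # How many average-sized requests fit inside the token budget each minute.
    rpm_from_tokens = tpm_limit // avg_tokens_per_request
    if rpm_from_tokens < rpm_limit:
        return rpm_from_tokens, "tokens"
    return rpm_limit, "requests"
```

With a 50 RPM and 100,000 TPM budget and requests averaging 8,000 tokens, effective_rpm(50, 100000, 8000) gives (12, "tokens"): you run out of tokens at 12 requests a minute, long before the request limit matters.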
OpenAI
OpenAI returns 429 for both rate-limit and out-of-credits; the two are distinguished by the error code in the response body (insufficient_quota for the credits case). Hermes handles both, but if you've only configured 402 fallback, OpenAI out-of-credits won't fall through. Add 429 to your fallback list.
OpenAI also has weirdly aggressive limits on new accounts. First two weeks: lower RPM. After that, limits relax. New accounts hitting 429 immediately is normal; the fix is usually "wait two weeks" or "upgrade tier".
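If you handle OpenAI 429s in your own code, it helps to separate the two meanings before deciding whether to retry. Here is a Python sketch, assuming the JSON error body carries an error object whose code (or type) is insufficient_quota when credits have run out. Treat that shape as an assumption and check it against a real response. The function name is hypothetical.

```python
import json


def classify_429(body: str) -> str:
    """Return 'out_of_credits', 'rate_limited' or 'unknown' for a 429 response body."""
    try:
        error = json.loads(body).get("error")
    except (ValueError, AttributeError):
        return "unknown"  # not JSON, or not a JSON object
    if not isinstance(error, dict):
        return "unknown"
    code = error.get("code") or error.get("type")
    if code is None:
        return "unknown"
    if code == "insufficient_quota":
        return "out_of_credits"  # retrying will not help; fall back or top up
    return "rate_limited"  # backing off and retrying is reasonable
```

The distinction matters because backing off fixes a rate limit but does nothing for an empty balance.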
OpenRouter
The Retry-After header on OpenRouter 429s is usually short (seconds, not minutes). Hermes handles this well by default. If you're seeing persistent OpenRouter 429s, check if you're on the free tier; free models have aggressive caps that paid models don't.
Groq
Groq has the cleanest 429 behaviour (clear headers, predictable retry windows) and the most aggressive default limits. Their free tier 429s constantly. If you want Groq for cost reasons, expect to pay for a higher tier or run it as a fallback only.
Gemini
Per-day quotas are tight on the free tier. 429 from Gemini often means "you've burned today's allocation" rather than rate-limit. Falls through cleanly to a fallback provider in Hermes if configured.
Throttling per channel
If you have multiple messaging gateways and one of them generates most of the traffic, you might want to throttle that channel specifically. The pattern is to set a per-channel queue depth limit:
hermes gateway set telegram --max-pending-messages 10
Messages beyond the queue depth are dropped with a "system busy" reply to the user. Less elegant than provider-level smoothing but useful when a single chatty Telegram channel is single-handedly causing your 429s.
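The queue-depth behaviour is a bounded queue that rejects new work when full. Here is a minimal Python sketch of that pattern. BoundedInbox is a made-up name and this isn't the gateway's real implementation.

```python
from collections import deque
from typing import Deque, Optional


class BoundedInbox:
    """Per-channel queue that refuses new messages once max_pending are waiting."""

    def __init__(self, max_pending: int) -> None:
        self.max_pending = max_pending
        self._pending: Deque[str] = deque()

    def offer(self, message: str) -> bool:
        """Queue the message; return False if it was dropped because the queue is full."""
        if len(self._pending) >= self.max_pending:
            return False  # caller should send the "system busy" reply
        self._pending.append(message)
        return True

    def take(self) -> Optional[str]:
        """Pop the oldest pending message, or None if nothing is waiting."""
        return self._pending.popleft() if self._pending else None
```

Dropping at the door keeps one noisy channel from building an unbounded backlog that would keep hammering the provider long after the burst ends.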
Logging which requests get 429'd
When debugging persistent rate-limit issues, you need to know which requests are hitting limits. Enable detailed provider logging:
hermes config set provider_log_level debug
hermes config set provider_log_path ~/.hermes/logs/provider.log
The log shows each provider call with its model, token count, response status and any 429 detail headers. Grep for "429" to find rate-limit events:
grep "429" ~/.hermes/logs/provider.log | tail -20
The output usually shows which model and which time of day is the culprit.
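When grep output gets long, a small script can bucket the 429s by model and hour. The Python below is a sketch only: I'm assuming each log line starts with an ISO timestamp and contains model=... and status=... fields, which is a hypothetical format. Adjust the regex to whatever your provider.log actually looks like.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed (hypothetical) line shape:
#   2025-03-01T14:03:22Z model=model-a status=429 tokens=9120
LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2}T(?P<hour>\d{2}):\S*\s.*\bmodel=(?P<model>\S+).*\bstatus=429\b"
)


def count_429s(lines: Iterable[str]) -> "Counter[Tuple[str, int]]":
    """Count 429 lines per (model, hour of day)."""
    counts: "Counter[Tuple[str, int]]" = Counter()
    for line in lines:
        match = LINE.search(line)
        if match:
            counts[(match.group("model"), int(match.group("hour")))] += 1
    return counts
```

Feeding it the open log file and calling most_common(5) on the result gives the worst model and hour combinations first.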
Why upgrading the tier isn't always the answer
From what I've seen, people reach for "upgrade tier" first. Sometimes that's right. Often it's not, because your real bottleneck is bursty traffic, not sustained load.
If you spend most of the hour at 5 RPM and once a day spike to 80 RPM for 30 seconds, paying for a tier that supports 80 sustained RPM is wasteful. The throttle-and-queue fix (Fix 1 above) handles bursts at no extra cost. The credential pool fix doubles your burst capacity at zero extra hourly cost (just two key admin tasks).
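To put numbers on the burst case, here is the arithmetic as a short Python sketch. The 5 RPM baseline and the 80 RPM, 30-second spike come from the example above. The 50 RPM throttle is my assumption, and the function name is made up.

```python
from typing import Tuple


def burst_backlog(
    burst_rpm: float, burst_seconds: float, limit_rpm: float, baseline_rpm: float
) -> Tuple[float, float]:
    """Return (requests queued when the burst ends, seconds needed to drain them)."""
    if limit_rpm <= baseline_rpm:
        raise ValueError("throttle must exceed baseline load or the queue never drains")
    # During the burst, requests arrive faster than the throttle lets them out.
    backlog = max(0.0, (burst_rpm - limit_rpm) * burst_seconds / 60.0)
    # Afterwards the spare capacity (limit minus baseline) works through the queue.
    drain_seconds = backlog * 60.0 / (limit_rpm - baseline_rpm)
    return backlog, drain_seconds
```

burst_backlog(80, 30, 50, 5) works out to a backlog of 15 requests that clears about 20 seconds after the spike ends. That is a short stretch of extra latency once a day, against paying for a higher tier all month.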
Upgrade tier when your sustained load really requires it. Not before.
Monitoring rate-limit events
Set up a simple alert that fires when 429s spike. Cron job over the gateway log:
cat > /usr/local/bin/hermes-429-alert.sh << 'EOF'
#!/bin/bash
COUNT=$(journalctl -u hermes-gateway --since "1 hour ago" | grep -c "429")
THRESHOLD=5
ALERT_EMAIL="you@example.com"  # placeholder: set to your own address
if [ "$COUNT" -gt "$THRESHOLD" ]; then
  echo "Hermes hit $COUNT 429s in the last hour" | mail -s "Hermes 429 alert" "$ALERT_EMAIL"
fi
EOF
chmod +x /usr/local/bin/hermes-429-alert.sh
Run it from an hourly cron entry (schedule 0 * * * *). Tune the threshold to your normal traffic.
What 429 doesn't mean
If the error looks like a 429 but you're sure you're under the limits, double-check. Some providers return 429 when the IP itself is rate-limited (separate from your account), usually because of automated abuse from another tenant on shared infrastructure. There's nothing you can do from the Hermes side; just retry later or use a different network.
Also check that the 429 isn't really a 401 in disguise (some providers do this; covered in our 401 auth errors piece).
Pre-tuned defaults on LumaDock
The Hermes Agent template on LumaDock ships with retry and throttle defaults that work for a small bot doing a few thousand requests per day. Plans include unmetered bandwidth and no setup fees so you're not paying twice when retries push request volume up. If your traffic outgrows the defaults, the credential pool and throttle config above is the standard upgrade path. Full setup walkthrough in our Hermes Agent complete guide.

