Rate Limiting
Agent Gateway supports request rate limiting via slowapi, an optional dependency.
Installation
Configuration
Via gateway.yaml
Via Python API
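```python
from agent_gateway import Gateway

# Create a gateway for the given workspace directory
gw = Gateway(workspace="workspace/")
# Default limit in slowapi's string format: 50 requests per minute
gw.use_rate_limit(default_limit="50/minute")
```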
Both approaches configure the same settings. If both are set, use_rate_limit() takes precedence over gateway.yaml.
Rate limit format
Rate limits use slowapi's string format:
| Example | Meaning |
|---|---|
| "10/second" | 10 requests per second |
| "100/minute" | 100 requests per minute |
| "1000/hour" | 1000 requests per hour |
| "10000/day" | 10000 requests per day |
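To make the "<count>/<period>" shape in the table concrete, here is a minimal parser sketch. It is a hypothetical helper written for illustration, not slowapi's own parser: it accepts only the single-limit form shown in the table, and the names parse_limit, Limit and PERIOD_SECONDS are invented here.

```python
import re
from typing import NamedTuple

# Window length in seconds for each period name used in the table above.
PERIOD_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


class Limit(NamedTuple):
    amount: int  # requests allowed per window
    window: int  # window length in seconds


def parse_limit(spec: str) -> Limit:
    """Parse a single "<count>/<period>" string such as "100/minute".

    Hypothetical helper: it handles only the simple form from the table,
    not every variant slowapi's underlying parser understands.
    """
    match = re.fullmatch(r"\s*(\d+)\s*/\s*(second|minute|hour|day)\s*", spec)
    if match is None:
        raise ValueError(f"unsupported rate limit string: {spec!r}")
    count, period = match.groups()
    return Limit(amount=int(count), window=PERIOD_SECONDS[period])


for spec in ("10/second", "100/minute", "1000/hour", "10000/day"):
    limit = parse_limit(spec)
    print(f"{spec}: {limit.amount} requests per {limit.window} s")
```

Note that the window length matters, not just the average rate: assuming a fixed-window strategy, "100/minute" lets a client send all 100 requests in the first second of a window, which "10/second" would not.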
Multi-worker deployments
By default, rate limit counters are stored in memory. With multiple workers (server.workers > 1), each worker tracks limits independently, so a client whose requests are spread across N workers could get up to N × the configured limit through.
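The N × limit effect can be reproduced with a small simulation. This is an illustrative sketch only: InMemoryCounter and accepted are hypothetical names, all requests are assumed to fall in one window, and the load balancer is assumed to distribute requests round-robin (real distribution varies, so N × limit is an upper bound).

```python
from collections import Counter


class InMemoryCounter:
    """Per-process request counter for a single rate-limit window (illustrative only)."""

    def __init__(self, limit: int):
        self.limit = limit
        self.hits = Counter()

    def allow(self, client: str) -> bool:
        # Reject once this counter has already seen `limit` requests from the client.
        if self.hits[client] >= self.limit:
            return False
        self.hits[client] += 1
        return True


def accepted(workers, client: str, requests: int) -> int:
    """Spread requests round-robin across workers and count how many get through."""
    ok = 0
    for i in range(requests):
        if workers[i % len(workers)].allow(client):
            ok += 1
    return ok


LIMIT, WORKERS, CLIENT = 50, 4, "203.0.113.7"

# Each worker keeps its own counter, so limits are tracked independently.
independent = [InMemoryCounter(LIMIT) for _ in range(WORKERS)]
# Every worker consults the same counter: a stand-in for a shared store.
shared_counter = InMemoryCounter(LIMIT)
shared = [shared_counter] * WORKERS

print(accepted(independent, CLIENT, 1000))  # 200 (4 × 50)
print(accepted(shared, CLIENT, 1000))       # 50
```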
To enforce limits across workers, point storage_uri at a Redis instance:
Gateway logs a warning at startup if multiple workers are configured without a storage_uri.
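A shared store works because every worker increments the same key. The sketch below illustrates this with a fixed-window scheme, which is an assumption about the strategy in use. A plain dict stands in for Redis. With real Redis each worker would issue the atomic INCR command on the key, and an EXPIRE on each key would clean up old windows, which this dict version never does. SharedStore and allow are hypothetical names.

```python
class SharedStore:
    """Dict-based stand-in for a shared store such as Redis (illustrative only).

    With real Redis, every worker would send INCR for the same key; INCR is
    atomic, so concurrent workers never lose an update.
    """

    def __init__(self):
        self.counts = {}

    def incr(self, key: str) -> int:
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


def allow(store: SharedStore, client: str, limit: int, window: int, now: float) -> bool:
    """Fixed-window check: one counter per client per window of `window` seconds."""
    window_index = int(now // window)  # 0 for t in [0, 60), 1 for [60, 120), ...
    key = f"ratelimit:{client}:{window_index}"
    return store.incr(key) <= limit


store = SharedStore()
# Calls from any worker go through the same store, so they share one allowance of 3/minute.
results = [allow(store, "203.0.113.7", 3, 60, t) for t in (1, 2, 3, 4)]
print(results)  # [True, True, True, False]
# A request in the next window uses a fresh key, so it is allowed again.
print(allow(store, "203.0.113.7", 3, 60, 61))  # True
```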
Reverse proxy deployments
When running behind a reverse proxy (nginx, AWS ALB, etc.), every request appears to come from the proxy's IP address, so all clients would share a single rate limit. Enable trust_forwarded_for to read the real client IP from the X-Forwarded-For header:
Warning
Only enable trust_forwarded_for when you trust the proxy setting the header. Untrusted clients can spoof this header to bypass rate limits.
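The sketch below shows one way to choose the IP that the rate limit is keyed on. It is a hypothetical helper, not the gateway's actual logic. It assumes exactly one trusted proxy that appends the address it saw to X-Forwarded-For, as nginx's $proxy_add_x_forwarded_for does. In that setup the rightmost entry comes from the proxy, while anything to its left was sent by the client and may be forged. The headers mapping here is a plain, case-sensitive dict.

```python
from typing import Mapping


def client_ip(remote_addr: str, headers: Mapping[str, str], trust_forwarded_for: bool) -> str:
    """Pick the IP address used as the rate-limit key (hypothetical helper).

    Assumes exactly one trusted proxy that appends the address it saw to
    X-Forwarded-For.
    """
    if not trust_forwarded_for:
        # Without trust, only the TCP peer address is used (the proxy, if any).
        return remote_addr
    forwarded = headers.get("X-Forwarded-For", "")
    hops = [hop.strip() for hop in forwarded.split(",") if hop.strip()]
    # The rightmost entry was written by the trusted proxy; entries to its
    # left came from the client and may be spoofed.
    return hops[-1] if hops else remote_addr


print(client_ip("10.0.0.5", {"X-Forwarded-For": "192.0.2.1, 198.51.100.20"}, True))
# 198.51.100.20
print(client_ip("10.0.0.5", {"X-Forwarded-For": "198.51.100.20"}, False))
# 10.0.0.5
```

With more than one proxy in the chain, the number of trusted hops to skip from the right would have to be configured accordingly.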
Response headers
When rate limiting is enabled, responses include standard rate limit headers:
- X-RateLimit-Limit: the configured limit
- X-RateLimit-Remaining: requests remaining in the current window
- X-RateLimit-Reset: seconds until the window resets
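The three values can be derived from the window state, as this sketch shows. It assumes a fixed window and follows the description above for X-RateLimit-Reset (seconds until the window resets). The gateway's real headers come from slowapi and their exact format may differ. rate_limit_headers is a hypothetical name.

```python
import math


def rate_limit_headers(limit: int, used: int, window: int, now: float) -> dict:
    """Build X-RateLimit-* values for a fixed window (hypothetical helper).

    limit:  requests allowed per window
    used:   requests already counted in the current window
    window: window length in seconds
    now:    current time in seconds (e.g. time.time())
    """
    window_end = (now // window + 1) * window  # start of the next window
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(limit - used, 0)),
        "X-RateLimit-Reset": str(math.ceil(window_end - now)),
    }


print(rate_limit_headers(limit=50, used=12, window=60, now=125.0))
# {'X-RateLimit-Limit': '50', 'X-RateLimit-Remaining': '38', 'X-RateLimit-Reset': '55'}
```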
When a client exceeds the limit, they receive a 429 Too Many Requests response:
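On the client side, a caller can back off when it receives a 429. The sketch below is a hypothetical client helper, not part of Agent Gateway. It assumes X-RateLimit-Reset holds the number of seconds until the window resets, as described in the header list, and falls back to a fixed delay when the header is missing or not numeric. The send and sleep callables are injected so the logic can run without a network. Headers are a plain, case-sensitive mapping here, whereas real HTTP libraries usually match header names case-insensitively.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def retry_delay(status: int, headers: Mapping[str, str], fallback: float = 1.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the request was not rate limited."""
    if status != 429:
        return None
    reset = headers.get("X-RateLimit-Reset")
    if reset is None:
        return fallback
    try:
        return max(float(reset), 0.0)
    except ValueError:
        return fallback  # header present but not a number


def call_with_retry(send: Callable[[], Response], max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send`, sleeping and retrying while it returns 429 (hypothetical helper)."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        delay = retry_delay(status, headers)
        if delay is None or attempt == max_attempts:
            return status, headers, body
        sleep(delay)
    raise ValueError("max_attempts must be at least 1")


# Usage with a fake transport: first a 429, then a success.
responses = iter([(429, {"X-RateLimit-Reset": "2"}, b""), (200, {}, b"ok")])
waits = []
result = call_with_retry(lambda: next(responses), sleep=waits.append)
print(result[0], waits)  # 200 [2.0]
```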