Rate Limits
INFER applies rate limits to ensure fair usage and protect the network.
Default Limits
| Endpoint Category | Rate Limit | Window |
|---|---|---|
| Chat Completions | 60 requests | per minute |
| Models List | 30 requests | per minute |
| API Key Management | 20 requests | per minute |
| Authentication | 5 requests | per 15 minutes |
| Role Upgrade | 3 requests | per hour |
| Password Reset | 3 requests | per hour |
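A client can throttle itself so that it stays under these limits instead of waiting for a 429. The Python sketch below is a minimal, hypothetical client-side limiter. It assumes your account uses the default limits. The `DEFAULT_LIMITS` keys and the `SlidingWindowLimiter` class are illustrative names, not part of the INFER API, and the budgets are copied from the table above. This page does not say how the server aligns its windows, so the sketch uses a sliding window. Under these assumptions, a sliding window is at least as strict as a fixed window of the same length.

```python
import time
from collections import deque

# Hypothetical client-side budgets copied from the Default Limits table.
# Keys are illustrative names, not INFER API identifiers.
# Values are (max_requests, window_seconds).
DEFAULT_LIMITS = {
    "chat_completions": (60, 60),      # 60 requests per minute
    "models_list": (30, 60),           # 30 requests per minute
    "api_key_management": (20, 60),    # 20 requests per minute
    "authentication": (5, 15 * 60),    # 5 requests per 15 minutes
    "role_upgrade": (3, 60 * 60),      # 3 requests per hour
    "password_reset": (3, 60 * 60),    # 3 requests per hour
}


class SlidingWindowLimiter:
    """Tracks recent request times and reports how long to wait before the next one."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.sent = deque()  # timestamps of requests still inside the window

    def wait_time(self):
        """Seconds until another request fits in the window (0.0 if it fits now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        # Wait until the oldest request in the window expires.
        return self.window_seconds - (now - self.sent[0])

    def record(self):
        """Call this right after sending a request."""
        self.sent.append(self.clock())
```

For example, you could create `limiter = SlidingWindowLimiter(*DEFAULT_LIMITS["chat_completions"])`, call `time.sleep(limiter.wait_time())` before each chat completion, and call `limiter.record()` after sending it.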
Rate Limit Headers
Every API response includes rate limit headers:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests in the current window |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
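A client can read these headers into a small structure to see how much of its budget is left. This Python sketch uses the hypothetical names `RateLimitInfo` and `parse_rate_limit_headers`. It assumes `X-RateLimit-Reset` is given in seconds, which is the usual Unix-timestamp convention, although the table above does not state the unit. Header names are matched case-insensitively, as HTTP requires.

```python
import time
from dataclasses import dataclass


@dataclass
class RateLimitInfo:
    limit: int       # X-RateLimit-Limit
    remaining: int   # X-RateLimit-Remaining
    reset_at: float  # X-RateLimit-Reset, assumed to be a Unix timestamp in seconds

    def seconds_until_reset(self, now=None):
        """Seconds until the window resets, never negative."""
        if now is None:
            now = time.time()
        return max(0.0, self.reset_at - now)


def parse_rate_limit_headers(headers):
    """Build a RateLimitInfo from response headers, or return None if any are missing or malformed."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        return RateLimitInfo(
            limit=int(lowered["x-ratelimit-limit"]),
            remaining=int(lowered["x-ratelimit-remaining"]),
            reset_at=float(lowered["x-ratelimit-reset"]),
        )
    except (KeyError, ValueError):
        return None
```

When `remaining` reaches 0, the client can pause for `seconds_until_reset()` before it sends the next request.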
When Rate Limited
If you exceed the rate limit, you’ll receive a 429 Too Many Requests response:
```json
{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded. Try again in 30 seconds."
  }
}
```

The Retry-After header indicates how many seconds to wait.
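A client can turn a 429 response into a concrete wait time. The following Python sketch, which uses the hypothetical function name `retry_delay_for_429`, reads the error body shown above and prefers the `Retry-After` header. If that header is absent, it falls back to `X-RateLimit-Reset` and then to a default chosen by the caller. The fallback order and the 60-second default are assumptions made for illustration, not documented INFER behavior.

```python
import json
import time


def retry_delay_for_429(headers, body, now=None, default=60.0):
    """Return (error_code, seconds_to_wait) for a 429 Too Many Requests response.

    Prefers Retry-After (documented as a number of seconds), then falls back to
    X-RateLimit-Reset (a Unix timestamp, assumed to be in seconds), then to `default`.
    """
    if now is None:
        now = time.time()
    lowered = {k.lower(): v for k, v in headers.items()}

    # The error body is documented as {"error": {"code": ..., "message": ...}}.
    try:
        code = json.loads(body)["error"]["code"]
    except (ValueError, KeyError, TypeError):
        code = None

    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            return code, max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP also allows an HTTP-date here; this sketch does not parse it.

    reset = lowered.get("x-ratelimit-reset")
    if reset is not None:
        try:
            return code, max(0.0, float(reset) - now)
        except ValueError:
            pass

    return code, default
```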
Best Practices
- Retry with exponential backoff, and honor the Retry-After header when it is present
- Cache responses when possible
- Use streaming for long completions to avoid timeouts
- Monitor your usage in the Dashboard
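Exponential backoff can be sketched in Python as shown below. `RateLimitedError` is a hypothetical exception that your client would raise on a 429. It carries the `Retry-After` value when the server sends one. If no value is provided, the delay is drawn uniformly from `[0, min(cap, base * 2**attempt)]`. This variant is often called "full jitter", and it keeps many clients from retrying in lockstep. The default values of `max_retries`, `base`, and `cap` are illustrative only.

```python
import random
import time


class RateLimitedError(Exception):
    """Hypothetical exception a client raises on HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds from Retry-After, if the server sent it


def call_with_backoff(request, max_retries=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call request(); on RateLimitedError, wait and retry with exponential backoff.

    Waits for the server's Retry-After when given, otherwise for a random delay
    in [0, min(cap, base * 2**attempt)] ("full jitter").
    """
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitedError as err:
            if attempt == max_retries:
                raise  # out of retries; surface the 429 to the caller
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                delay = random.uniform(0, min(cap, base * 2 ** attempt))
            sleep(delay)
```

Because `sleep` is passed in as a parameter, you can test the retry logic without actually waiting.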