
Rate Limits

INFER applies rate limits to ensure fair usage and protect the network.

Default Limits

Endpoint Category      Rate Limit     Window
Chat Completions       60 requests    per minute
Models List            30 requests    per minute
API Key Management     20 requests    per minute
Authentication         5 requests     per 15 minutes
Role Upgrade           3 requests     per hour
Password Reset         3 requests     per hour

Rate Limit Headers

Every API response includes rate limit headers:

Header                   Description
X-RateLimit-Limit        Maximum requests in the current window
X-RateLimit-Remaining    Requests remaining in the current window
X-RateLimit-Reset        Unix timestamp when the window resets
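
As an illustration, a client could read these three headers after each response to decide whether to slow down. The sketch below is not part of the INFER API: the names RateLimitStatus and parse_rate_limit_headers are hypothetical, and it assumes the header values are plain integers and that X-RateLimit-Reset is expressed in seconds.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    """Hypothetical container for the three documented headers."""
    limit: int       # X-RateLimit-Limit
    remaining: int   # X-RateLimit-Remaining
    reset_at: int    # X-RateLimit-Reset (assumed to be a Unix timestamp in seconds)

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds left in the current window, never negative."""
        current = time.time() if now is None else now
        return max(0.0, float(self.reset_at) - current)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Optional[RateLimitStatus]:
    """Return the parsed headers, or None if any is missing or not an integer."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    try:
        return RateLimitStatus(
            limit=int(lowered["x-ratelimit-limit"]),
            remaining=int(lowered["x-ratelimit-remaining"]),
            reset_at=int(lowered["x-ratelimit-reset"]),
        )
    except (KeyError, ValueError):
        return None
```

A caller might pause until the reset time when remaining reaches 0, rather than sending a request that is likely to be rejected.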

When Rate Limited

If you exceed the rate limit, you’ll receive a 429 Too Many Requests response:

{ "error": { "code": "RATE_LIMITED", "message": "Rate limit exceeded. Try again in 30 seconds." } }

The Retry-After header on the 429 response indicates how many seconds to wait before retrying.
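
A minimal sketch of how a client might turn a 429 response into a wait time. The function name retry_delay_seconds and the 1-second fallback are assumptions made for illustration. The sketch also assumes Retry-After carries a number of seconds, as described above, and falls back to the default if the value is missing or cannot be parsed.

```python
from typing import Mapping, Optional


def retry_delay_seconds(
    status_code: int,
    headers: Mapping[str, str],
    default: float = 1.0,
) -> Optional[float]:
    """Return the wait time for a 429 response, or None for any other status."""
    if status_code != 429:
        return None
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    try:
        return max(0.0, float(lowered["retry-after"]))
    except (KeyError, ValueError):
        # Header missing or not numeric: use the caller-supplied fallback.
        return default
```

For the example response above, a Retry-After value of 30 would give a delay of 30 seconds.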

Best Practices

  • Implement exponential backoff for retries
  • Cache responses when possible
  • Use streaming for long completions to avoid timeouts
  • Monitor your usage in the Dashboard
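
The first practice, exponential backoff, could be sketched as follows. This is an illustrative pattern, not an official INFER client. The names backoff_delay and call_with_retries are hypothetical, send stands in for whatever function performs the HTTP request and returns a (status code, headers) pair, and the base delay, cap, and attempt count are arbitrary example values. The sketch honours Retry-After when it is present and otherwise waits a random time up to an exponentially growing bound ("full jitter").

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Random delay in [0, min(cap, base * 2**attempt)] for a 0-based attempt."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retries(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status, headers
        if attempt == max_attempts - 1:
            break  # Out of attempts: return the last 429 to the caller.
        lowered = {key.lower(): value for key, value in headers.items()}
        try:
            # Prefer the server's hint when it is present and numeric.
            delay = max(0.0, float(lowered["retry-after"]))
        except (KeyError, ValueError):
            delay = backoff_delay(attempt)
        sleep(delay)
    return status, headers
```

Passing sleep as a parameter keeps the function easy to test without real waiting. The random jitter spreads out retries from clients that were rate limited at the same moment.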