
Rate Limiting

Understand how rate limits work, detect when you hit them, and implement retry logic that keeps your application running smoothly.

How Rate Limits Work

Conduit.im enforces rate limits to ensure fair usage and protect upstream providers. Limits are applied per API key and are measured in requests per minute. When you exceed a limit, the API returns a 429 Too Many Requests response.

Note: Rate limits are separate from spending limits. A spending limit caps how much money a key can spend; a rate limit caps how many requests it can make in a time window.

Detecting a Rate Limit

When you are rate-limited, the API returns HTTP 429 with a JSON error body and a Retry-After header indicating how many seconds to wait:

HTTP/1.1 429 Too Many Requests
Retry-After: 5
Content-Type: application/json

{
  "error": {
    "message": "Rate limit exceeded. Please try again later.",
    "code": "RATE_LIMIT_EXCEEDED",
    "timestamp": "2026-03-09T14:32:00.000Z",
    "requestId": "req_abc123"
  }
}

Rate Limit Error Codes

The code field tells you exactly which limit was hit so you can respond appropriately:

| Code | Retryable | Description |
| --- | --- | --- |
| RATE_LIMIT_EXCEEDED | Yes | Too many requests per minute — wait and retry |
| CHAT_RATE_LIMIT_EXCEEDED | Yes | Chat-specific rate limit hit — slow down chat requests |
| RATE_LIMIT_QUOTA_EXCEEDED | No | Daily quota exhausted — resets at midnight UTC |
| API_KEY_LIMIT_EXCEEDED | No | Per-key spending limit reached — increase it in the dashboard |
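Based on the codes above, a client can branch on the error body's `code` field to decide whether a retry makes sense. A minimal sketch (the set of retryable codes is taken from this table; adjust it if new codes are introduced):

```javascript
// Codes that are safe to retry, per the table above.
const RETRYABLE_CODES = new Set([
  "RATE_LIMIT_EXCEEDED",
  "CHAT_RATE_LIMIT_EXCEEDED",
]);

// Returns true if the parsed error body indicates a retryable rate limit.
function isRetryable(errorBody) {
  return RETRYABLE_CODES.has(errorBody?.error?.code);
}
```

Checking the code, rather than the HTTP status alone, avoids pointlessly retrying a quota or spending-limit error that will keep failing until you act.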

Exponential Back-off

The recommended retry strategy is exponential back-off with jitter. Each retry waits longer than the last, and a random jitter prevents all clients from retrying at the same moment:

| Attempt | Base delay | With jitter (typical) |
| --- | --- | --- |
| 1 | 1 s | 0.5 – 1.5 s |
| 2 | 2 s | 1 – 3 s |
| 3 | 4 s | 2 – 6 s |
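The schedule above can be computed directly. A sketch of the delay formula, with the base doubling each attempt and a ±50 % multiplicative jitter (attempts numbered from 1, as in the table):

```javascript
// Delay in milliseconds for a given retry attempt (1-based):
// base 1 s, doubling each attempt, scaled by a random factor in [0.5, 1.5).
function backoffDelayMs(attempt) {
  const base = 1000 * 2 ** (attempt - 1); // 1000, 2000, 4000, ...
  const jitterFactor = 0.5 + Math.random();
  return base * jitterFactor;
}
```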

JavaScript / TypeScript

async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const res = await fetch(url, options);

    if (res.ok) return await res.json();

    // Only retry rate limits and server errors
    if (res.status !== 429 && res.status < 500) {
      const { error } = await res.json();
      throw new Error(error.message);
    }

    // Prefer the server's Retry-After header if present
    const retryAfter = res.headers.get("Retry-After");
    let delay;
    if (retryAfter) {
      // Respect the server's value exactly — never wait less than it asks
      delay = Number(retryAfter) * 1000;
    } else {
      // Exponential back-off with ±50 % random jitter
      delay = 1000 * 2 ** attempt * (0.5 + Math.random());
    }
    await new Promise((r) => setTimeout(r, delay));
  }

  throw new Error("Max retries exceeded");
}

Python

import time, random, requests

def fetch_with_retry(url, headers, json_body, max_retries=3):
    for attempt in range(max_retries):
        res = requests.post(url, headers=headers, json=json_body)

        if res.ok:
            return res.json()

        # Only retry rate limits and server errors
        if res.status_code != 429 and res.status_code < 500:
            raise Exception(res.json()["error"]["message"])

        # Prefer the server's Retry-After header if present
        retry_after = res.headers.get("Retry-After")
        if retry_after:
            # Respect the server's value exactly — never sleep less than it asks
            delay = float(retry_after)
        else:
            # Exponential back-off with +/-50 % random jitter
            delay = 2 ** attempt * (0.5 + random.random())

        time.sleep(delay)

    raise Exception("Max retries exceeded")

The Retry-After Header

When the API returns a 429, it includes a Retry-After header with the number of seconds to wait. Always respect this value — it tells you the soonest a retry can safely succeed:

const retryAfter = response.headers.get("Retry-After");
if (retryAfter) {
  await new Promise((r) => setTimeout(r, Number(retryAfter) * 1000));
  // Now safe to retry
}

Important: Retrying before the Retry-After window elapses will result in another 429 and may extend the cool-down period.
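Conduit.im's examples show Retry-After as a number of seconds, but the HTTP spec (RFC 9110) also allows an HTTP-date form. Whether this API ever sends a date is not stated here, but handling both is cheap. A hedged sketch that normalizes either form to milliseconds:

```javascript
// Parse a Retry-After header value into milliseconds to wait.
// Per RFC 9110 the value is either delta-seconds ("5") or an
// HTTP-date ("Wed, 21 Oct 2026 07:28:00 GMT"). Returns null if
// the header is missing or unparseable.
function retryAfterMs(headerValue) {
  if (headerValue == null) return null;
  const seconds = Number(headerValue);
  if (!Number.isNaN(seconds)) return seconds * 1000;
  const date = Date.parse(headerValue);
  if (!Number.isNaN(date)) return Math.max(0, date - Date.now());
  return null;
}
```

When parsing fails, falling back to your exponential back-off schedule is a reasonable default.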

Best Practices

Use a request queue

Instead of sending requests as fast as possible, enqueue them and process at a controlled rate (e.g., one request per 100 ms). This avoids hitting limits in the first place.

Set per-key spending limits

Configure spending limits on each API key to prevent runaway costs. A spending limit is a hard cap, not a rate limit, but it provides an extra safety net.

Don't retry non-retryable errors

Only retry RATE_LIMIT_EXCEEDED and server errors (5xx). Errors like RATE_LIMIT_QUOTA_EXCEEDED or API_KEY_LIMIT_EXCEEDED require user action, not retries.

Add jitter to back-off

Without jitter, multiple clients that hit a limit at the same time will all retry together, causing a "thundering herd." Random jitter spreads retries out and improves success rates.

Cap the number of retries

Set a maximum (e.g., 3–5 retries). After exhausting retries, surface a clear error to the user rather than blocking indefinitely.

Example: Simple Request Queue

A basic queue that spaces out requests to stay under the rate limit:

class RequestQueue {
  constructor(minIntervalMs = 100) {
    this.queue = [];
    this.minInterval = minIntervalMs;
    this.processing = false;
  }

  enqueue(fn) {
    return new Promise((resolve, reject) => {
      this.queue.push({ fn, resolve, reject });
      if (!this.processing) this.#process();
    });
  }

  async #process() {
    this.processing = true;
    while (this.queue.length > 0) {
      const { fn, resolve, reject } = this.queue.shift();
      try {
        resolve(await fn());
      } catch (err) {
        reject(err);
      }
      await new Promise((r) => setTimeout(r, this.minInterval));
    }
    this.processing = false;
  }
}

// Usage
const queue = new RequestQueue(200); // 5 requests per second max

const result = await queue.enqueue(() =>
  fetchWithRetry(url, options)
);

Next Steps