Rate limiting & budgets

Control request throughput and spending with per-key rate limits, org budgets, and managed key credits.

About

Rate limiting caps how many requests a key or org can send per minute (per-org and per-key limits can also cap tokens per minute). Budgets cap how much money can be spent per period. Credits give individual keys a prepaid USD balance. Together, the three prevent runaway costs and protect provider quotas.


When to use

  • Prevent abuse: Cap RPM per key so one user can’t monopolize gateway capacity
  • Control spending: Set monthly budgets per org so teams can’t exceed their allocation
  • Reseller billing: Give each customer key a credit balance that is debited automatically per request
  • Protect provider quotas: Global RPM limits prevent hitting provider rate limits

Rate limiting

Agent Command Center supports rate limits at three levels: global, per-org, and per-key.

| Level | Scope | How to set |
| --- | --- | --- |
| Global | All requests to the gateway | config.yaml |
| Per-org | All requests from one organization | Org config via admin API |
| Per-key | Requests using a specific API key | Key config (RPM and TPM) |

The most restrictive limit applies. If the global limit is 1000 RPM and a key’s limit is 100 RPM, that key is capped at 100 RPM.
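The precedence rule amounts to taking the minimum of whichever limits are configured. Below is a minimal sketch of that arithmetic. It assumes None stands for "no limit set at this level". effective_rpm is a hypothetical helper, not part of the SDK.

```python
from typing import Optional


def effective_rpm(
    global_rpm: Optional[int], org_rpm: Optional[int], key_rpm: Optional[int]
) -> Optional[int]:
    """Return the RPM cap that actually applies: the smallest configured limit.

    Hypothetical helper; None means "no limit configured at this level".
    """
    configured = [limit for limit in (global_rpm, org_rpm, key_rpm) if limit is not None]
    return min(configured) if configured else None


# The example from the text: global 1000 RPM, key 100 RPM, so the key is capped at 100.
print(effective_rpm(global_rpm=1000, org_rpm=None, key_rpm=100))  # 100
```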

Configuration

Go to Agent Command Center > Rate Limits in the Future AGI dashboard to set global and per-org limits.

Per-key limits are set when creating or editing a key in Settings > API Keys.

Python SDK:

from agentcc import AgentCC

client = AgentCC(
    api_key="sk-agentcc-your-key",
    base_url="https://gateway.futureagi.com",
    control_plane_url="https://api.futureagi.com",
)

# Set per-org rate limits
client.org_configs.create(
    org_id="your-org-id",
    config={
        "rate_limiting": {
            "enabled": True,
            "rpm": 500,     # requests per minute for this org
            "tpm": 100000,  # tokens per minute for this org
        }
    }
)

TypeScript SDK:

import { AgentCC } from "@futureagi/agentcc";

const client = new AgentCC({
    apiKey: "sk-agentcc-your-key",
    baseUrl: "https://gateway.futureagi.com",
    controlPlaneUrl: "https://api.futureagi.com",
});

await client.orgConfigs.create({
    orgId: "your-org-id",
    config: {
        rate_limiting: {
            enabled: true,
            rpm: 500,
            tpm: 100000,
        },
    },
});

Self-hosted config.yaml:

# Global rate limit (all requests)
rate_limiting:
  enabled: true
  global_rpm: 1000

# Per-key limits are set on the key itself
auth:
  keys:
    - name: "limited-key"
      key: "sk-agentcc-..."
      rate_limit_rpm: 100
      rate_limit_tpm: 50000
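You can throttle on the client side before sending to stay under a key's rate_limit_rpm and rate_limit_tpm. The sketch below is a hypothetical client-side helper that uses fixed one-minute windows aligned to the clock. The gateway's own limiter may count differently (for example, with a sliding window), so treat this as an approximation rather than a mirror of server behavior.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """Hypothetical client-side throttle tracking requests and tokens per minute.

    It mirrors the rate_limit_rpm / rate_limit_tpm values configured on a key
    so a client can stay under them; it is not the gateway's implementation.
    """

    def __init__(self, rpm: int, tpm: int, clock: Callable[[], float] = time.time):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock
        self.window_start = self._current_window()
        self.requests = 0
        self.tokens = 0

    def _current_window(self) -> int:
        # Windows are aligned to whole minutes (an assumption of this sketch).
        return int(self.clock() // 60) * 60

    def try_acquire(self, tokens: int) -> bool:
        """Record one request using `tokens` tokens if both limits allow it."""
        window = self._current_window()
        if window != self.window_start:
            # A new minute has started: reset both counters.
            self.window_start = window
            self.requests = 0
            self.tokens = 0
        if self.requests + 1 > self.rpm or self.tokens + tokens > self.tpm:
            return False  # sending now would exceed RPM or TPM
        self.requests += 1
        self.tokens += tokens
        return True
```

Passing a fake clock (for example, clock=lambda: 0.0) makes the behavior deterministic in tests. When try_acquire returns False, wait for the next minute boundary before sending.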

Response headers

Every response includes rate limit headers:

| Header | Description |
| --- | --- |
| X-Ratelimit-Limit-Requests | Maximum requests allowed per minute |
| X-Ratelimit-Remaining-Requests | Requests remaining in the current window |
| X-Ratelimit-Reset-Requests | Unix timestamp when the window resets |
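A client can use these headers to pause until the window resets instead of waiting for a 429. Below is a minimal sketch. It assumes the reset value is seconds since the Unix epoch, as the table states. The helper names are hypothetical, and header lookup is case-insensitive because HTTP header names are.

```python
import time
from typing import Mapping, Optional


def _get_header(headers: Mapping[str, str], name: str) -> Optional[str]:
    # HTTP header names are case-insensitive, so compare lowercased names.
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds until the current window resets; 0.0 if unknown or already past."""
    raw = _get_header(headers, "X-Ratelimit-Reset-Requests")
    if raw is None:
        return 0.0
    try:
        reset_at = float(raw)  # assumed: seconds since the Unix epoch
    except ValueError:
        return 0.0  # unexpected format: don't guess a wait time
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)


def should_pause(headers: Mapping[str, str]) -> bool:
    """True when the current window has no requests left."""
    raw = _get_header(headers, "X-Ratelimit-Remaining-Requests")
    return raw is not None and raw.strip().isdigit() and int(raw) <= 0


# Made-up header values for illustration:
example = {"X-Ratelimit-Remaining-Requests": "0", "X-Ratelimit-Reset-Requests": "1700000030"}
print(should_pause(example), seconds_until_reset(example, now=1700000000.0))  # True 30.0
```

With the OpenAI Python SDK, client.chat.completions.with_raw_response.create(...) returns a response object whose .headers can be passed to these helpers.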

Error response (429)

When a rate limit is exceeded, the gateway returns HTTP 429 with this body:

{
  "error": {
    "type": "rate_limit_exceeded",
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry after the window resets."
  }
}

Retry logic

Catch RateLimitError and retry with exponential backoff.

Python SDK:

import time
from agentcc import AgentCC, RateLimitError

client = AgentCC(
    api_key="sk-agentcc-your-key",
    base_url="https://gateway.futureagi.com",
)

def call_with_retry(max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": "Hello"}],
            )
        except RateLimitError:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # back off 1s, then 2s; the third failure is re-raised
                continue
            raise

result = call_with_retry()

OpenAI Python SDK (pointed at the gateway's /v1 endpoint):

import time
from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://gateway.futureagi.com/v1",
    api_key="sk-agentcc-your-key",
)

def call_with_retry(max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": "Hello"}],
            )
        except RateLimitError:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)
                continue
            raise

result = call_with_retry()

cURL:

# Check rate limit headers with -i flag
curl -i -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-agentcc-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
# Look for X-Ratelimit-Remaining-Requests in the response headers
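The retry loops above sleep for fixed exponential delays, so clients that hit the limit together also retry together. A common refinement is capped exponential backoff with full jitter. The sketch below follows that approach. retry_with_backoff is a hypothetical helper that works with any exception type, such as RateLimitError from either SDK. It takes an injectable sleep function so it can be tested without waiting.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call fn, retrying on the given exceptions with capped, jittered backoff."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    rng = rng or random.Random()
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time up to the capped exponential delay
            # (1s, 2s, 4s, ... with the defaults, never more than max_delay).
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng.uniform(0.0, ceiling))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

For example, retry_with_backoff(lambda: client.chat.completions.create(model="gpt-4o", messages=[...]), (RateLimitError,)) wraps the same call as call_with_retry.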

Budgets

Set spending limits per org, per key, per user, or per model. Budgets can be daily, weekly, monthly, or total.

| Setting | Description |
| --- | --- |
| period | daily, weekly, monthly, or total |
| limit | USD amount |
| action | block (hard limit: reject requests) or warn (soft limit: log a warning) |
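The three settings combine into a simple decision per request. The sketch below makes two assumptions that this page does not specify. First, windows are calendar-aligned (midnight, Monday, the first of the month). Second, a request is refused once cumulative spend for the window reaches the limit. Both function names are hypothetical.

```python
from datetime import datetime, timedelta
from decimal import Decimal
from typing import Optional


def period_start(period: str, now: datetime) -> Optional[datetime]:
    """Start of the budget window containing `now`; None for a "total" budget."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "daily":
        return midnight
    if period == "weekly":
        return midnight - timedelta(days=midnight.weekday())  # back to Monday
    if period == "monthly":
        return midnight.replace(day=1)
    if period == "total":
        return None  # lifetime budget: spend is never reset
    raise ValueError(f"unknown period: {period!r}")


def budget_decision(spent: Decimal, limit: Decimal, action: str) -> str:
    """Return "allow", "warn", or "block" for the next request in the window."""
    if spent < limit:
        return "allow"
    if action == "block":
        return "block"  # hard limit: reject the request
    if action == "warn":
        return "warn"  # soft limit: let it through but log a warning
    raise ValueError(f"unknown action: {action!r}")
```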

Go to Agent Command Center > Budgets in the Future AGI dashboard to set org-level budgets and alerts.

Python SDK:

client.org_configs.create(
    org_id="your-org-id",
    config={
        "budgets": {
            "enabled": True,
            "org_budget": {
                "period": "monthly",
                "limit": 500.00,
                "action": "block",
            }
        }
    }
)

TypeScript SDK:

await client.orgConfigs.create({
    orgId: "your-org-id",
    config: {
        budgets: {
            enabled: true,
            org_budget: {
                period: "monthly",
                limit: 500.00,
                action: "block",
            },
        },
    },
});

Self-hosted config.yaml:

budgets:
  enabled: true
  org_budget:
    period: monthly
    limit: 500.00
    action: block
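Before pushing a budget through the SDK or config.yaml, you can check it against the allowed values in the settings table. Below is a minimal sketch. validate_budget is hypothetical, and rejecting a zero or negative limit is an assumption of this sketch. The gateway may enforce rules this function does not cover.

```python
from typing import Any, Dict, List

VALID_PERIODS = {"daily", "weekly", "monthly", "total"}
VALID_ACTIONS = {"block", "warn"}


def validate_budget(budget: Dict[str, Any]) -> List[str]:
    """Return problems found in an org_budget mapping (empty list if it looks valid)."""
    problems: List[str] = []
    if budget.get("period") not in VALID_PERIODS:
        problems.append(f"period must be one of {sorted(VALID_PERIODS)}")
    limit = budget.get("limit")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, (int, float)) or limit <= 0:
        problems.append("limit must be a positive USD amount")
    if budget.get("action") not in VALID_ACTIONS:
        problems.append(f"action must be one of {sorted(VALID_ACTIONS)}")
    return problems


# The org_budget from the examples above passes:
print(validate_budget({"period": "monthly", "limit": 500.00, "action": "block"}))  # []
```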

When a budget is exceeded with action: block, new requests return:

{
  "error": {
    "type": "budget_exceeded",
    "code": "rate_limit_exceeded",
    "message": "Organization monthly budget of $500.00 exceeded"
  }
}
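This error and the rate-limit error both use the code rate_limit_exceeded, so only the type field tells them apart. The distinction matters for retries. A rate-limit error clears when the window resets, but a blocked budget presumably stays blocked until the budget period rolls over or the limit is raised. Below is a sketch of a classifier for the error bodies shown on this page. The function name and return labels are hypothetical.

```python
import json
from typing import Any


def classify_gateway_error(body: str) -> str:
    """Map a gateway error body to "rate_limit", "budget", or "other".

    Both errors on this page carry code "rate_limit_exceeded", so the
    decision is made on the "type" field instead.
    """
    try:
        payload: Any = json.loads(body)
    except json.JSONDecodeError:
        return "other"
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return "other"
    error_type = error.get("type")
    if error_type == "rate_limit_exceeded":
        return "rate_limit"  # transient: back off and retry after the window resets
    if error_type == "budget_exceeded":
        return "budget"  # retrying immediately won't help
    return "other"
```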

Managed key credits

Managed keys carry a prepaid USD credit balance, and the cost of each request is deducted from it automatically. When the balance runs out, requests are blocked.
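The bookkeeping can be pictured as a balance that is checked before each request and debited afterwards. The sketch below is a toy model, not the gateway's code. It assumes a request is allowed while the balance is above zero and that the actual cost is deducted afterwards, since cost is known only once token usage is counted. As a result, the last request can leave the balance slightly negative. Decimal avoids float rounding on currency amounts. The class names are hypothetical.

```python
from decimal import Decimal


class CreditsExhausted(Exception):
    """Raised when a managed key has no credit left (hypothetical)."""


class ManagedKeyCredits:
    """Toy model of a prepaid USD balance debited per request."""

    def __init__(self, balance: Decimal):
        self.balance = balance

    def charge(self, cost: Decimal) -> Decimal:
        """Block if the balance is used up; otherwise deduct the cost and return what is left."""
        if self.balance <= 0:
            raise CreditsExhausted("credit balance exhausted; request blocked")
        self.balance -= cost  # may dip below zero on the final request
        return self.balance

    def top_up(self, amount: Decimal) -> Decimal:
        """Add credits, as the /credits endpoint does, and return the new balance."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balance += amount
        return self.balance
```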

Create a managed key with credits:

curl -X POST https://gateway.futureagi.com/-/keys \
  -H "Authorization: Bearer your-admin-token" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "customer-key",
    "key_type": "managed",
    "credit_balance": 25.00
  }'

Add more credits:

curl -X POST "https://gateway.futureagi.com/-/keys/key_123/credits" \
  -H "Authorization: Bearer your-admin-token" \
  -H "Content-Type: application/json" \
  -d '{"amount": 50.00}'
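The same two admin calls can be built from Python with only the standard library. The endpoints, headers, and JSON fields come from the curl examples above. The helper names are hypothetical, and the functions only construct the requests, so you can inspect them before sending.

```python
import json
import urllib.parse
import urllib.request

GATEWAY = "https://gateway.futureagi.com"


def _admin_post(path: str, admin_token: str, payload: dict) -> urllib.request.Request:
    # Mirrors the curl examples: POST, bearer admin token, JSON body.
    return urllib.request.Request(
        GATEWAY + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_managed_key(name: str, credit_balance: float, admin_token: str) -> urllib.request.Request:
    return _admin_post(
        "/-/keys",
        admin_token,
        {"name": name, "key_type": "managed", "credit_balance": credit_balance},
    )


def add_credits(key_id: str, amount: float, admin_token: str) -> urllib.request.Request:
    # key_id is the key's identifier (key_123 in the curl example); quote it for the URL path.
    path = f"/-/keys/{urllib.parse.quote(key_id, safe='')}/credits"
    return _admin_post(path, admin_token, {"amount": amount})


# Sending one (requires a reachable gateway and a valid admin token):
# with urllib.request.urlopen(add_credits("key_123", 50.00, "your-admin-token")) as resp:
#     print(resp.status, resp.read().decode())
```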

The remaining balance is returned in the x-agentcc-credits-remaining response header on every request made with a managed key.
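A client can read that header to warn before a key runs dry. Below is a minimal sketch. It assumes the header carries a plain decimal number such as 24.90, since the page does not specify the format. The $5.00 threshold and the function names are illustrative.

```python
from decimal import Decimal, InvalidOperation
from typing import Mapping, Optional


def credits_remaining(headers: Mapping[str, str]) -> Optional[Decimal]:
    """Parse x-agentcc-credits-remaining; None if absent or unparseable."""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "x-agentcc-credits-remaining":
            try:
                return Decimal(value.strip())
            except InvalidOperation:
                return None
    return None


def is_low(headers: Mapping[str, str], threshold: Decimal = Decimal("5.00")) -> bool:
    """True when the reported balance is below the (illustrative) threshold."""
    remaining = credits_remaining(headers)
    return remaining is not None and remaining < threshold
```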

