Rate limiting & budgets
Control request throughput and spending with per-key rate limits, org budgets, and managed key credits.
About
Rate limiting caps how many requests per minute (RPM) and tokens per minute (TPM) a key or org can consume. Budgets cap how much money can be spent per period. Credits give individual keys a prepaid USD balance. All three work together to prevent runaway costs and protect provider quotas.
When to use
- Prevent abuse: Cap RPM per key so one user can’t monopolize gateway capacity
- Control spending: Set monthly budgets per org so teams can’t exceed their allocation
- Reseller billing: Give each customer key a credit balance that auto-deducts per request
- Protect provider quotas: Global RPM limits prevent hitting provider rate limits
Rate limiting
Agent Command Center supports rate limits at three levels: global, per-org, and per-key.
| Level | Scope | How to set |
|---|---|---|
| Global | All requests to the gateway | config.yaml |
| Per-org | All requests from one organization | Org config via admin API |
| Per-key | Requests using a specific API key | Key config (RPM and TPM) |
The most restrictive limit applies. If the global limit is 1000 RPM and a key’s limit is 100 RPM, that key is capped at 100 RPM.
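The "most restrictive limit applies" rule amounts to taking the minimum over whichever levels have a limit configured. The sketch below only illustrates that rule; `effective_rpm` is a hypothetical helper, not part of the AgentCC SDK, and the gateway's internal enforcement may differ.

```python
from typing import Optional


def effective_rpm(*limits: Optional[int]) -> Optional[int]:
    """Return the tightest of the configured RPM limits.

    A limit of None means "not configured at this level" and is ignored.
    Returns None when no level sets a limit.
    """
    configured = [limit for limit in limits if limit is not None]
    return min(configured) if configured else None


# Global limit of 1000 RPM, no per-org limit, per-key limit of 100 RPM.
print(effective_rpm(1000, None, 100))  # 100
```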
Configuration
Go to Agent Command Center > Rate Limits in the Future AGI dashboard to set global and per-org limits.
Per-key limits are set when creating or editing a key in Settings > API Keys.
from agentcc import AgentCC

client = AgentCC(
    api_key="sk-agentcc-your-key",
    base_url="https://gateway.futureagi.com",
    control_plane_url="https://api.futureagi.com",
)

# Set per-org rate limits
client.org_configs.create(
    org_id="your-org-id",
    config={
        "rate_limiting": {
            "enabled": True,
            "rpm": 500,     # requests per minute for this org
            "tpm": 100000,  # tokens per minute for this org
        }
    }
)

import { AgentCC } from "@futureagi/agentcc";

const client = new AgentCC({
  apiKey: "sk-agentcc-your-key",
  baseUrl: "https://gateway.futureagi.com",
  controlPlaneUrl: "https://api.futureagi.com",
});

await client.orgConfigs.create({
  orgId: "your-org-id",
  config: {
    rate_limiting: {
      enabled: true,
      rpm: 500,
      tpm: 100000,
    },
  },
});

Self-hosted config.yaml:
# Global rate limit (all requests)
rate_limiting:
  enabled: true
  global_rpm: 1000

# Per-key limits are set on the key itself
auth:
  keys:
    - name: "limited-key"
      key: "sk-agentcc-..."
      rate_limit_rpm: 100
      rate_limit_tpm: 50000
Response headers
Every response includes rate limit headers:
| Header | Description |
|---|---|
| X-Ratelimit-Limit-Requests | Maximum requests allowed per minute |
| X-Ratelimit-Remaining-Requests | Requests remaining in the current window |
| X-Ratelimit-Reset-Requests | Unix timestamp when the window resets |
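These headers can drive client-side pacing. The sketch below is one possible way to turn them into a wait time. It assumes the header values are plain integers and that the reset value is a Unix timestamp in seconds, as the table describes; `seconds_until_reset` is a hypothetical helper, not an SDK function.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the request window resets.

    Returns 0.0 if requests remain in the window or the headers are missing.
    Assumes integer header values and a reset time in Unix seconds.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining-requests", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(lowered.get("x-ratelimit-reset-requests", "0"))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)


example = {
    "X-Ratelimit-Limit-Requests": "100",
    "X-Ratelimit-Remaining-Requests": "0",
    "X-Ratelimit-Reset-Requests": "1700000060",
}
print(seconds_until_reset(example, now=1700000000.0))  # 60.0
```

Passing `now` explicitly keeps the function deterministic in tests; in production code it can be omitted so the current time is used.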
Error response (429)
When a rate limit is exceeded:
{
  "error": {
    "type": "rate_limit_exceeded",
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry after the window resets."
  }
}
Retry logic
import time
from agentcc import AgentCC, RateLimitError

client = AgentCC(
    api_key="sk-agentcc-your-key",
    base_url="https://gateway.futureagi.com",
)

def call_with_retry(max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": "Hello"}],
            )
        except RateLimitError:
            if attempt < max_retries - 1:
                # Exponential backoff: 1s, then 2s with the default max_retries=3
                time.sleep(2 ** attempt)
                continue
            raise  # out of retries, surface the error to the caller

result = call_with_retry()

import time
from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://gateway.futureagi.com/v1",
    api_key="sk-agentcc-your-key",
)

def call_with_retry(max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": "Hello"}],
            )
        except RateLimitError:
            if attempt < max_retries - 1:
                # Exponential backoff: 1s, then 2s with the default max_retries=3
                time.sleep(2 ** attempt)
                continue
            raise  # out of retries, surface the error to the caller

result = call_with_retry()

# Check rate limit headers with -i flag
curl -i -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-agentcc-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
# Look for X-Ratelimit-Remaining-Requests in the response headers

Budgets
Set spending limits per org, per key, per user, or per model. Budgets can be daily, weekly, monthly, or total.
| Setting | Description |
|---|---|
| period | daily, weekly, monthly, or total |
| limit | USD amount |
| action | block (hard limit, reject requests) or warn (soft limit, log warning) |
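To make the difference between the two actions concrete, here is a minimal sketch of how a budget check could map spend to an outcome. It is an illustration only, not the gateway's implementation: `budget_decision` is a hypothetical function, and treating spend equal to the limit as "exceeded" is an assumption.

```python
def budget_decision(spent_usd: float, limit_usd: float, action: str) -> str:
    """Return "allow", "warn", or "block" for a request given current spend."""
    if action not in ("block", "warn"):
        raise ValueError(f"unknown budget action: {action!r}")
    if spent_usd < limit_usd:
        return "allow"
    # At or over the limit (assumption: equality counts as exceeded).
    # "block" rejects the request; "warn" lets it through but flags it
    # so that a warning can be logged.
    return "block" if action == "block" else "warn"


print(budget_decision(120.0, 500.0, "block"))  # allow
print(budget_decision(500.0, 500.0, "block"))  # block
print(budget_decision(510.0, 500.0, "warn"))   # warn
```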
Go to Agent Command Center > Budgets in the Future AGI dashboard to set org-level budgets and alerts.
client.org_configs.create(
    org_id="your-org-id",
    config={
        "budgets": {
            "enabled": True,
            "org_budget": {
                "period": "monthly",
                "limit": 500.00,
                "action": "block",
            }
        }
    }
)

await client.orgConfigs.create({
  orgId: "your-org-id",
  config: {
    budgets: {
      enabled: true,
      org_budget: {
        period: "monthly",
        limit: 500.00,
        action: "block",
      },
    },
  },
});

Self-hosted config.yaml:
budgets:
  enabled: true
  org_budget:
    period: monthly
    limit: 500.00
    action: block
When a budget is exceeded with action: block, new requests return:
{
  "error": {
    "type": "budget_exceeded",
    "code": "rate_limit_exceeded",
    "message": "Organization monthly budget of $500.00 exceeded"
  }
}
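In the two example error bodies on this page, the `code` field is the same and only `type` differs, so a client that wants to treat budget errors differently from rate limit errors would need to branch on `type`. The sketch below assumes exactly those payload shapes; `classify_limit_error` is a hypothetical helper, not part of any SDK.

```python
import json


def classify_limit_error(body: str) -> str:
    """Return "budget", "rate_limit", or "other" for an error response body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "other"
    error = payload.get("error") if isinstance(payload, dict) else None
    error_type = error.get("type") if isinstance(error, dict) else None
    if error_type == "budget_exceeded":
        return "budget"      # a quick retry is unlikely to help
    if error_type == "rate_limit_exceeded":
        return "rate_limit"  # retry after the window resets
    return "other"


budget_body = '{"error": {"type": "budget_exceeded", "code": "rate_limit_exceeded", "message": "..."}}'
rate_body = '{"error": {"type": "rate_limit_exceeded", "code": "rate_limit_exceeded", "message": "..."}}'
print(classify_limit_error(budget_body))  # budget
print(classify_limit_error(rate_body))    # rate_limit
```

Separating the two cases lets retry logic back off only for rate limits and fail fast on an exhausted budget.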
Managed key credits
Managed keys have a USD credit balance that auto-deducts the cost of each request. When credits run out, requests are blocked.
Create a managed key with credits:
curl -X POST https://gateway.futureagi.com/-/keys \
  -H "Authorization: Bearer your-admin-token" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "customer-key",
    "key_type": "managed",
    "credit_balance": 25.00
  }'
Add more credits:
curl -X POST "https://gateway.futureagi.com/-/keys/key_123/credits" \
  -H "Authorization: Bearer your-admin-token" \
  -H "Content-Type: application/json" \
  -d '{"amount": 50.00}'
The remaining balance is returned in the x-agentcc-credits-remaining response header on every request made with a managed key.
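A client can read that header to warn before a key runs dry. The following is a minimal sketch, assuming the header value is a plain decimal USD amount (the exact format is not specified above); `credits_remaining` and `is_low_balance` are hypothetical helpers, and the 5.00 USD threshold is arbitrary.

```python
from typing import Mapping, Optional


def credits_remaining(headers: Mapping[str, str]) -> Optional[float]:
    """Return the USD balance from the response headers, or None if absent.

    Assumes the header carries a plain decimal number such as "3.75".
    """
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-agentcc-credits-remaining":
            try:
                return float(value)
            except ValueError:
                return None
    return None


def is_low_balance(headers: Mapping[str, str], threshold_usd: float = 5.0) -> bool:
    """True when a balance is present and below the given threshold."""
    balance = credits_remaining(headers)
    return balance is not None and balance < threshold_usd


response_headers = {"X-AgentCC-Credits-Remaining": "3.75"}
print(credits_remaining(response_headers))  # 3.75
print(is_low_balance(response_headers))     # True
```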