Cost Tracking & Budgets
Track LLM costs per request, set budget limits, and configure spend alerts.
What it is
Prism tracks the cost of every LLM request automatically, giving full visibility into AI spend. Each request returns its cost in the X-Prism-Cost header. Budget limits prevent runaway costs. Cost attribution by team, feature, or user is available via metadata headers.
Use cases
- Spend monitoring — Track exactly how much each request, model, and provider costs in real time
- Budget enforcement — Prevent runaway costs with configurable spending limits per org
- Cost attribution — Break down spend by team, feature, or user with custom metadata headers
- Threshold alerts — Receive email notifications when spend crosses defined thresholds
Per-Request Cost Tracking
Every response from Prism reports the request's cost in the X-Prism-Cost response header.
curl https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Response includes:
X-Prism-Cost: 0.00015
from prism import Prism
client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
print(f"Cost: ${response.prism.cost}")
print(f"Total spend so far: ${client.current_cost}")
Cost is calculated as:
cost = (input_tokens × input_price_per_token) + (output_tokens × output_price_per_token)
Cache hits return X-Prism-Cost: 0.
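The formula above can be sketched in a few lines of Python. The per-token prices below are hypothetical placeholders, not Prism's actual price table, and `request_cost` is an illustrative helper rather than part of the SDK.

```python
from decimal import Decimal

# Hypothetical per-token prices in USD; real prices depend on model and provider.
INPUT_PRICE_PER_TOKEN = Decimal("0.00000015")
OUTPUT_PRICE_PER_TOKEN = Decimal("0.0000006")


def request_cost(input_tokens: int, output_tokens: int, cache_hit: bool = False) -> Decimal:
    """Apply the cost formula; cache hits are reported as zero cost."""
    if cache_hit:
        return Decimal("0")
    input_cost = input_tokens * INPUT_PRICE_PER_TOKEN
    output_cost = output_tokens * OUTPUT_PRICE_PER_TOKEN
    return input_cost + output_cost


# 200 input tokens and 200 output tokens at the placeholder prices.
print(request_cost(200, 200))
# The same request served from cache costs nothing.
print(request_cost(200, 200, cache_hit=True))
```

Decimal is used instead of float so that very small per-token prices add up without rounding drift.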
Cost Analytics
View detailed cost breakdowns and trends across your organization.
Access the analytics dashboard at https://app.futureagi.com/dashboard/gateway/analytics
Available breakdowns:
- Total spend (current period)
- Cost by model
- Cost by provider
- Cost by API key
- Cost timeseries
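Conceptually, each breakdown above groups per-request costs by one dimension. The sketch below shows that grouping over records you have collected yourself. The record fields (`model`, `provider`, `cost`) are assumptions for illustration and do not reflect Prism's internal schema.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Mapping


def cost_by(records: Iterable[Mapping[str, str]], dimension: str) -> Dict[str, Decimal]:
    """Sum the cost of each record, grouped by the given dimension."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)  # Decimal() starts at 0
    for record in records:
        totals[record[dimension]] += Decimal(record["cost"])
    return dict(totals)


# Hypothetical request log; costs are kept as strings so Decimal parses them exactly.
records = [
    {"model": "gpt-4o-mini", "provider": "openai", "cost": "0.00015"},
    {"model": "gpt-4o-mini", "provider": "openai", "cost": "0.00020"},
    {"model": "gpt-4o", "provider": "openai", "cost": "0.00300"},
]

print(cost_by(records, "model"))
print(cost_by(records, "provider"))
```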
from prism import Prism
client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
    control_plane_url="https://api.futureagi.com",
)
overview = client.analytics.overview(start_date="2025-01-01", end_date="2025-01-31")
costs = client.analytics.cost_breakdown(group_by="model")
latency = client.analytics.latency_stats(percentiles=[50, 95, 99])
model_cmp = client.analytics.model_comparison(models=["gpt-4o", "claude-sonnet-4-6"])
Budgets
Set spending limits to prevent runaway costs. When a budget is exceeded, new requests are blocked until the next period begins.
Navigate to Settings → Budgets at https://app.futureagi.com/dashboard/gateway/settings
Configure:
- Budget limit (USD)
- Budget period (daily, weekly, monthly)
- Alert threshold percentage
config = client.org_configs.create(
    org_id="your-org-id",
    config={
        "budgets": {
            "limit": 100.00,
            "period": "monthly",
            "alert_threshold_percent": 80
        }
    },
)
| Setting | Values | Description |
|---|---|---|
| budget_limit | USD amount | Maximum spend allowed per period |
| budget_period | daily, weekly, monthly | Reset frequency |
| alert_threshold_percent | 0-100 | Percentage of budget before alert fires |
When the budget is exceeded, new requests receive a 429 error until the next period begins. An email alert is sent when spend crosses the alert threshold.
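The interaction between the limit and the alert threshold can be illustrated with a small local calculation. This is a sketch of the behavior described above, not Prism's implementation. `budget_status` and `BudgetStatus` are hypothetical names, and firing the alert at exactly the threshold (rather than strictly above it) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BudgetStatus:
    percent_used: float
    alert_fired: bool  # spend has reached the alert threshold
    blocked: bool      # spend has exceeded the limit; new requests would get a 429


def budget_status(spend: float, limit: float, alert_threshold_percent: float) -> BudgetStatus:
    """Evaluate current spend against a budget limit and alert threshold."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    percent_used = spend * 100 / limit
    return BudgetStatus(
        percent_used=percent_used,
        # Assumption: the alert fires once usage reaches the threshold.
        alert_fired=percent_used >= alert_threshold_percent,
        blocked=spend > limit,
    )


# With the example config (limit 100.00, threshold 80), a spend of 85 USD
# triggers the alert but does not yet block requests.
print(budget_status(spend=85, limit=100, alert_threshold_percent=80))
```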
Email Alerts
Configure alerts for budget overages, errors, latency spikes, and guardrail triggers.
Navigate to Settings → Alerts at https://app.futureagi.com/dashboard/gateway/email-alerts
Create a new alert:
- Name the alert
- Select event type
- Set recipients
- Configure severity
alert = client.alerts.create(
    name="Budget warning",
    condition="cost > 80",
    recipients=["team@example.com"],
    severity="high",
)
| Event Type | Trigger |
|---|---|
| budget_exceeded | Spend crosses the budget limit |
| error_spike | Error rate exceeds threshold |
| latency_spike | P95 latency exceeds threshold (P95 means 95% of requests are faster than this value; a spike means the slowest 5% got significantly slower) |
| guardrail_triggered | A guardrail blocks or flags a request |
Tip
Configure a cooldown period to prevent alert flooding when thresholds are repeatedly crossed.
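How Prism applies the cooldown is not described here. The sketch below only illustrates the general idea: once an alert of a given event type is sent, repeats of that type are suppressed until the window has elapsed. `AlertCooldown` is a hypothetical class, not part of the Prism SDK.

```python
import time
from typing import Callable, Dict


class AlertCooldown:
    """Suppress repeat alerts of the same event type within a cooldown window."""

    def __init__(self, cooldown_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so the behavior can be tested without sleeping
        self._last_sent: Dict[str, float] = {}

    def should_send(self, event_type: str) -> bool:
        now = self._clock()
        last = self._last_sent.get(event_type)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still cooling down for this event type
        self._last_sent[event_type] = now
        return True


cooldown = AlertCooldown(cooldown_seconds=300)
print(cooldown.should_send("budget_exceeded"))  # first alert goes out
print(cooldown.should_send("budget_exceeded"))  # immediate repeat is suppressed
```

Tracking the last send time per event type means a budget alert in cooldown does not hide an unrelated error or latency alert.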
Cost Attribution with Metadata
Tag requests with custom metadata to break down costs by team, feature, user, or any custom dimension.
curl https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "x-prism-metadata: team=data-science,feature=recommendations,user=alice" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={
        "x-prism-metadata": "team=data-science,feature=recommendations,user=alice"
    },
)
Metadata is indexed and queryable in the analytics dashboard for cost attribution.
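When tags come from application code rather than a hard-coded string, a small helper can assemble the header value in the comma-separated key=value form shown in the examples. `build_metadata_header` is a hypothetical helper, not part of the Prism SDK. Rejecting commas and equals signs inside keys and values is an assumption, since the escaping rules are not documented here.

```python
from typing import Mapping


def build_metadata_header(tags: Mapping[str, str]) -> str:
    """Join tags into the comma-separated key=value form used by x-prism-metadata."""
    parts = []
    for key, value in tags.items():
        # Assumption: separators inside a key or value would make the header ambiguous.
        if any(sep in key or sep in value for sep in (",", "=")):
            raise ValueError(f"metadata tag {key!r} contains ',' or '='")
        parts.append(f"{key}={value}")
    return ",".join(parts)


headers = {
    "x-prism-metadata": build_metadata_header(
        {"team": "data-science", "feature": "recommendations", "user": "alice"}
    )
}
print(headers)
```

The resulting dictionary can be passed as `extra_headers` in the Python example above.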