Cost Tracking & Budgets

Track LLM costs per request, set budget limits, and configure spend alerts.

What it is

Prism tracks the cost of every LLM request automatically, giving full visibility into AI spend. Each request returns its cost in the X-Prism-Cost header. Budget limits prevent runaway costs. Cost attribution by team, feature, or user is available via metadata headers.


Use cases

  • Spend monitoring — Track exactly how much each request, model, and provider costs in real time
  • Budget enforcement — Prevent runaway costs with configurable spending limits per org
  • Cost attribution — Break down spend by team, feature, or user with custom metadata headers
  • Threshold alerts — Receive email notifications when spend crosses defined thresholds

Per-Request Cost Tracking

Every request through Prism includes cost information in the response headers.

curl https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Response includes:

X-Prism-Cost: 0.00015

The same request using the Python SDK:

from prism import Prism

client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)

print(f"Cost: ${response.prism.cost}")
print(f"Total spend so far: ${client.current_cost}")
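
When calling the gateway with a plain HTTP client instead of the SDK, the cost has to be read from the response headers by hand. The following is a minimal sketch, assuming the X-Prism-Cost value is a plain decimal number as in the example above. The helper name cost_from_headers and the sample header dictionaries are hypothetical and not part of the Prism SDK.

```python
from typing import Mapping


def cost_from_headers(headers: Mapping[str, str]) -> float:
    """Return the request cost from response headers, or 0.0 if the header is absent."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("x-prism-cost")
    return float(raw) if raw is not None else 0.0


# Hypothetical headers from three responses; the last one is a cache hit.
responses = [
    {"X-Prism-Cost": "0.00015"},
    {"x-prism-cost": "0.00030"},
    {"X-Prism-Cost": "0"},
]

total = sum(cost_from_headers(h) for h in responses)
print(f"Total spend: ${total:.5f}")
```

Summing the header values client-side gives a running total that can be compared against the figures in the analytics dashboard.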

Cost is calculated as:

cost = (input_tokens × input_price_per_token) + (output_tokens × output_price_per_token)

Cache hits return X-Prism-Cost: 0.
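
The formula can be checked by hand with a short sketch. The per-token prices below are illustrative assumptions, not Prism's actual rates, and request_cost is a hypothetical helper rather than an SDK function.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_token: float,
    output_price_per_token: float,
    cache_hit: bool = False,
) -> float:
    """Apply the cost formula above; cache hits cost nothing."""
    if cache_hit:
        return 0.0
    return (
        input_tokens * input_price_per_token
        + output_tokens * output_price_per_token
    )


# Illustrative prices only (assumed, not Prism's actual rates):
# $0.15 per million input tokens, $0.60 per million output tokens.
INPUT_PRICE = 0.15 / 1_000_000
OUTPUT_PRICE = 0.60 / 1_000_000

cost = request_cost(1_000, 500, INPUT_PRICE, OUTPUT_PRICE)
cached = request_cost(1_000, 500, INPUT_PRICE, OUTPUT_PRICE, cache_hit=True)
print(f"Uncached: ${cost:.5f}, cached: ${cached:.5f}")
```

With these assumed prices, 1,000 input tokens and 500 output tokens come to 1,000 × 0.00000015 + 500 × 0.0000006 = 0.00045 USD, while the cached request costs 0.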


Cost Analytics

View detailed cost breakdowns and trends across your organization.

Access the analytics dashboard at https://app.futureagi.com/dashboard/gateway/analytics

Available breakdowns:

  • Total spend (current period)
  • Cost by model
  • Cost by provider
  • Cost by API key
  • Cost timeseries

Analytics can also be queried with the Python SDK:

from prism import Prism

client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
    control_plane_url="https://api.futureagi.com",
)

overview = client.analytics.overview(start_date="2025-01-01", end_date="2025-01-31")
costs = client.analytics.cost_breakdown(group_by="model")
latency = client.analytics.latency_stats(percentiles=[50, 95, 99])
model_cmp = client.analytics.model_comparison(models=["gpt-4o", "claude-sonnet-4-6"])

Budgets

Set spending limits to prevent runaway costs. When a budget is exceeded, new requests are blocked until the next period begins.

Navigate to Settings → Budgets at https://app.futureagi.com/dashboard/gateway/settings

Configure:

  • Budget limit (USD)
  • Budget period (daily, weekly, monthly)
  • Alert threshold percentage

Or set the budget with the Python SDK:

config = client.org_configs.create(
    org_id="your-org-id",
    config={
        "budgets": {
            "limit": 100.00,
            "period": "monthly",
            "alert_threshold_percent": 80
        }
    },
)

Setting                  Values                  Description
budget_limit             USD amount              Maximum spend allowed per period
budget_period            daily, weekly, monthly  Reset frequency
alert_threshold_percent  0-100                   Percentage of budget before alert fires

When the budget is exceeded, new requests receive a 429 error until the next period begins. An email alert is sent when spend crosses the alert threshold.
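
How the limit and the alert threshold interact can be sketched as a small function. This is an illustration of the rules described above, not Prism's implementation. The function name budget_status is hypothetical, and the boundary behavior (whether spend exactly equal to the limit or threshold counts as crossed) is an assumption.

```python
def budget_status(spend: float, limit: float, alert_threshold_percent: float) -> str:
    """Classify current-period spend as 'ok', 'alert', or 'blocked'."""
    if spend >= limit:
        # Assumed: from this point new requests would receive a 429.
        return "blocked"
    if spend >= limit * alert_threshold_percent / 100:
        # Assumed: the email alert fires once spend reaches the threshold.
        return "alert"
    return "ok"


# Same settings as the configuration above: $100 monthly limit, alert at 80%.
for spend in (50.00, 80.00, 100.00):
    print(spend, budget_status(spend, limit=100.00, alert_threshold_percent=80))
```

With a 100.00 limit and an 80 percent threshold, the alert corresponds to 80.00 of spend, which leaves 20.00 of headroom before requests are blocked.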


Email Alerts

Configure alerts for budget overages, errors, latency spikes, and guardrail triggers.

Navigate to Settings → Alerts at https://app.futureagi.com/dashboard/gateway/email-alerts

Create a new alert:

  1. Name the alert
  2. Select event type
  3. Set recipients
  4. Configure severity

Or create the alert with the Python SDK:

alert = client.alerts.create(
    name="Budget warning",
    condition="cost > 80",
    recipients=["team@example.com"],
    severity="high",
)

Event Type           Trigger
budget_exceeded      Spend crosses the budget limit
error_spike          Error rate exceeds threshold
latency_spike        P95 latency exceeds threshold
guardrail_triggered  A guardrail blocks or flags a request

P95 latency is the value that 95% of requests are faster than, so a spike means the slowest 5% of requests got significantly slower.

Tip

Configure a cooldown period to prevent alert flooding when thresholds are repeatedly crossed.
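
The effect of a cooldown can be illustrated with a minimal sketch. It assumes a cooldown simply suppresses any alert that would fire within a fixed window after the last one sent. The AlertCooldown class and the timestamps are hypothetical and are not part of the Prism SDK.

```python
from typing import Optional


class AlertCooldown:
    """Suppress repeat alerts fired within `cooldown_seconds` of the last one sent."""

    def __init__(self, cooldown_seconds: float) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._last_sent: Optional[float] = None

    def should_send(self, now: float) -> bool:
        """Return True and record `now` if the cooldown window has elapsed."""
        if self._last_sent is not None and now - self._last_sent < self.cooldown_seconds:
            return False
        self._last_sent = now
        return True


# Threshold crossings at these timestamps (in seconds), with a 10-minute cooldown.
cooldown = AlertCooldown(cooldown_seconds=600)
crossings = [0, 120, 599, 600, 1300]
sent = [t for t in crossings if cooldown.should_send(t)]
print(sent)  # [0, 600, 1300]
```

Five crossings produce three alerts here: the crossings at 120 and 599 seconds fall inside the window opened by the first alert and are dropped.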


Cost Attribution with Metadata

Tag requests with custom metadata to break down costs by team, feature, user, or any custom dimension.

curl https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "x-prism-metadata: team=data-science,feature=recommendations,user=alice" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

With the Python SDK, pass the same header through extra_headers:

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={
        "x-prism-metadata": "team=data-science,feature=recommendations,user=alice"
    },
)

Metadata is indexed and queryable in the analytics dashboard for cost attribution.
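
To show what attribution by a metadata dimension amounts to, here is a minimal client-side sketch. It assumes the key=value,key=value format shown in the examples above and that values contain no commas or equals signs, since escaping rules are not specified here. The helper names and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_metadata_header(tags: Dict[str, str]) -> str:
    """Serialize tags into the key=value,key=value form used above."""
    return ",".join(f"{key}={value}" for key, value in tags.items())


def parse_metadata_header(value: str) -> Dict[str, str]:
    """Parse a key=value,key=value string back into a dict."""
    pairs = (item.split("=", 1) for item in value.split(",") if "=" in item)
    return {key.strip(): val.strip() for key, val in pairs}


def cost_by(dimension: str, records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum cost per value of one metadata dimension, e.g. 'team'."""
    totals: Dict[str, float] = defaultdict(float)
    for header, cost in records:
        key = parse_metadata_header(header).get(dimension, "untagged")
        totals[key] += cost
    return dict(totals)


# Hypothetical (metadata header, cost) pairs collected by the caller.
records = [
    (build_metadata_header({"team": "data-science", "feature": "recommendations"}), 0.25),
    ("team=data-science,feature=search", 0.5),
    ("team=support,feature=chatbot", 0.125),
]
print(cost_by("team", records))
```

Requests that lack the chosen dimension are grouped under "untagged" in this sketch, which makes gaps in tagging visible in the totals.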

