Cost tracking

Track LLM costs per request, attribute spend by team and feature, and configure budget alerts.

About

Prism calculates the cost of every request automatically from token usage and model pricing. The cost is returned in the x-prism-cost response header and exposed through the response.prism.cost accessor in the SDK. No setup is required.

Cost is calculated as:

cost = (input_tokens * input_price_per_token) + (output_tokens * output_price_per_token)

Exact cache hits return x-prism-cost: 0 since no provider call was made.
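
As a worked example of the formula above, here is a minimal sketch. The per-token prices are hypothetical values chosen for illustration, not Prism's or any provider's actual rates, and request_cost is an illustrative helper rather than part of the SDK.

```python
from decimal import Decimal

# Hypothetical per-token prices in USD, for illustration only.
INPUT_PRICE_PER_TOKEN = Decimal("0.0000025")
OUTPUT_PRICE_PER_TOKEN = Decimal("0.00001")


def request_cost(input_tokens: int, output_tokens: int, cache_hit: bool = False) -> Decimal:
    """Apply the cost formula; an exact cache hit costs nothing."""
    if cache_hit:
        return Decimal("0")
    return (input_tokens * INPUT_PRICE_PER_TOKEN) + (output_tokens * OUTPUT_PRICE_PER_TOKEN)


# 1000 input tokens and 500 output tokens at the hypothetical prices above.
cost = request_cost(input_tokens=1000, output_tokens=500)
print(f"Cost: ${cost}")
```

Decimal is used here instead of float so that very small per-token amounts add up without rounding drift.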


Reading cost per request

from prism import Prism

client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

print(f"Cost: ${response.prism.cost}")
print(f"Provider: {response.prism.provider}")
print(f"Model: {response.prism.model_used}")

The Prism SDK also tracks cumulative cost across all requests made with a client:

# After several requests...
print(f"Total session cost: ${client.current_cost:.4f}")

# Reset the counter
client.reset_cost()

With the OpenAI SDK, read the cost from the raw response headers:

from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.futureagi.com/v1",
    api_key="sk-prism-your-key",
)

raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

print(f"Cost: ${raw.headers.get('x-prism-cost')}")
print(f"Provider: {raw.headers.get('x-prism-provider')}")

With cURL, pass -i to include the response headers in the output:

curl -i https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
# Look for: x-prism-cost: 0.00015
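
When the cost is read from the header, it arrives as a string. The sketch below shows one way to parse it and keep a running total. parse_cost is an illustrative helper, not part of any SDK, and the headers dict is simulated, so no request is made.

```python
from decimal import Decimal, InvalidOperation
from typing import Mapping, Optional


def parse_cost(headers: Mapping[str, str]) -> Optional[Decimal]:
    """Return the x-prism-cost header as a Decimal, or None if absent or malformed."""
    raw = headers.get("x-prism-cost")
    if raw is None:
        return None
    try:
        return Decimal(raw)
    except InvalidOperation:
        return None


# Simulated response headers; the provider value is illustrative.
# A plain dict is case-sensitive, unlike the header objects of most HTTP clients.
headers = {"x-prism-cost": "0.00015", "x-prism-provider": "openai"}

total = Decimal("0")
cost = parse_cost(headers)
if cost is not None:
    total += cost
print(f"Running total: ${total}")
```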

Cost attribution

Tag requests with metadata to break down costs by team, feature, user, or any custom dimension. Metadata is indexed and queryable in the analytics dashboard.

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    request_metadata={"team": "data-science", "feature": "recommendations", "user": "alice"},
)

With the OpenAI SDK, pass the metadata as a JSON-encoded x-prism-metadata header:

import json

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={
        "x-prism-metadata": json.dumps({"team": "data-science", "feature": "recommendations", "user": "alice"}),
    },
)

With cURL:

curl https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -H 'x-prism-metadata: {"team":"data-science","feature":"recommendations","user":"alice"}' \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Analytics dashboard

The Future AGI dashboard shows cost breakdowns and trends across your organization.

Available views:

  • Total spend for the current period
  • Cost by model
  • Cost by provider
  • Cost by API key
  • Cost timeseries (daily/weekly/monthly)
  • Cost by metadata dimension (team, feature, user)
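
To make the metadata breakdown concrete, the sketch below groups per-request costs by a metadata key locally. The records are made up for illustration and cost_by is a hypothetical helper; the dashboard performs this aggregation for you.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List, Tuple

# Made-up request records for illustration: (metadata, cost in USD).
records: List[Tuple[Dict[str, str], Decimal]] = [
    ({"team": "data-science", "feature": "recommendations"}, Decimal("0.0020")),
    ({"team": "data-science", "feature": "search"}, Decimal("0.0010")),
    ({"team": "platform", "feature": "search"}, Decimal("0.0005")),
]


def cost_by(dimension: str, rows: List[Tuple[Dict[str, str], Decimal]]) -> Dict[str, Decimal]:
    """Sum cost per value of one metadata key; untagged requests share one bucket."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for metadata, cost in rows:
        totals[metadata.get(dimension, "untagged")] += cost
    return dict(totals)


for team, spend in cost_by("team", records).items():
    print(f"{team}: ${spend}")
```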

SDK analytics

from prism import Prism

client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
    control_plane_url="https://api.futureagi.com",
)

# Spending overview
overview = client.analytics.overview(
    start_date="2026-01-01",
    end_date="2026-01-31",
)

# Cost breakdown by model
costs = client.analytics.cost_breakdown(group_by="model")

# Compare models
comparison = client.analytics.model_comparison(
    models=["gpt-4o", "claude-sonnet-4-6"],
)

Budget alerts

Get notified when spending crosses a threshold. Alerts are configured per organization.

Go to Prism > Settings > Alerts in the Future AGI dashboard. Create a new alert by selecting the event type, setting recipients, and configuring severity.

alert = client.alerts.create(
    name="Budget warning at 80%",
    condition="cost > 80",
    recipients=["team@example.com"],
    severity="high",
)

The same alert in TypeScript:

const alert = await client.alerts.create({
    name: "Budget warning at 80%",
    condition: "cost > 80",
    recipients: ["team@example.com"],
    severity: "high",
});

Alert types

Event                  Trigger
budget_exceeded        Spend crosses the budget limit
budget_threshold       Spend crosses a percentage threshold (e.g. 80%)
error_spike            Error rate exceeds configured threshold
latency_spike          P95 latency exceeds configured threshold
guardrail_triggered    A guardrail blocks or flags a request

Tip

Configure a cooldown period to prevent alert flooding when thresholds are repeatedly crossed.
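
The idea behind a cooldown can be sketched as follows. This is a conceptual illustration only: CooldownGate is a hypothetical class, and how Prism implements cooldowns internally is not described here.

```python
import time
from typing import Callable, Dict


class CooldownGate:
    """Suppress repeat alerts for the same key until a cooldown has elapsed."""

    def __init__(self, cooldown_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._last_sent: Dict[str, float] = {}

    def should_send(self, alert_key: str) -> bool:
        now = self._clock()
        last = self._last_sent.get(alert_key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still cooling down, drop the duplicate
        self._last_sent[alert_key] = now
        return True


# A fake clock makes the behavior easy to follow without waiting.
fake_now = [0.0]
gate = CooldownGate(cooldown_seconds=300, clock=lambda: fake_now[0])

first = gate.should_send("budget_threshold")   # first crossing
fake_now[0] = 60.0
second = gate.should_send("budget_threshold")  # one minute later
fake_now[0] = 400.0
third = gate.should_send("budget_threshold")   # after the cooldown
print(first, second, third)
```

With a 300-second cooldown, the second crossing is suppressed and the third goes through.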


Budget enforcement

Budgets are configured on the Rate limiting & budgets page. When a budget configured with action: block is exceeded, new requests return a 429 error until the next budget period begins. See that page for configuration details.
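
On the client side, one way to handle a blocked budget is to catch the 429 and degrade gracefully rather than retry in a tight loop. The sketch below assumes a hypothetical GatewayError that carries the HTTP status code; substitute whatever error your HTTP client or SDK actually raises.

```python
from typing import Callable


class GatewayError(Exception):
    """Hypothetical stand-in for the error your HTTP client or SDK raises."""

    def __init__(self, status_code: int, message: str = ""):
        super().__init__(message)
        self.status_code = status_code


def call_with_budget_fallback(send: Callable[[], str], fallback: str) -> str:
    """Return the model reply, or a fallback message when the gateway answers 429."""
    try:
        return send()
    except GatewayError as err:
        if err.status_code == 429:
            # If the 429 comes from an exceeded budget, requests stay blocked
            # until the next period, so immediate retries will not help.
            return fallback
        raise


def blocked_send() -> str:
    # Simulates a request rejected by the gateway.
    raise GatewayError(429, "budget exceeded")


print(call_with_budget_fallback(blocked_send, fallback="Over budget, try again later."))
```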

