LLM Cost Tracking, Attribution, and Budget Alerts

Automatically track LLM spend per request via the x-agentcc-cost header. Attribute costs by team, feature, or user, and configure budget threshold alerts.

About

Agent Command Center automatically calculates the cost of every request from its token usage and the model's pricing. The cost appears in the x-agentcc-cost response header and in the response.agentcc.cost SDK accessor. No setup is required.

Cost is calculated as:

cost = (input_tokens * input_price_per_token) + (output_tokens * output_price_per_token)

Exact cache hits return x-agentcc-cost: 0 since no provider call was made.
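As a worked example of the formula above, the following sketch computes the cost of a single request. The per-token prices are hypothetical placeholders chosen for the arithmetic, not actual rates for any model, and the function name request_cost is illustrative rather than part of the SDK.

```python
from decimal import Decimal

def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_token: Decimal,
    output_price_per_token: Decimal,
) -> Decimal:
    """Apply the cost formula to one request."""
    return (input_tokens * input_price_per_token
            + output_tokens * output_price_per_token)

# Hypothetical prices: $2.50 per 1M input tokens, $10.00 per 1M output tokens.
INPUT_PRICE = Decimal("2.50") / Decimal(1_000_000)
OUTPUT_PRICE = Decimal("10.00") / Decimal(1_000_000)

# 1,000 input tokens and 500 output tokens.
cost = request_cost(1_000, 500, INPUT_PRICE, OUTPUT_PRICE)
print(f"Cost: ${cost}")
```

Decimal is used instead of float so that very small per-token prices do not accumulate rounding error when summed over many requests.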

Reading cost per request

from agentcc import AgentCC

client = AgentCC(
    api_key="sk-agentcc-your-key",
    base_url="https://gateway.futureagi.com",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

print(f"Cost: ${response.agentcc.cost}")
print(f"Provider: {response.agentcc.provider}")
print(f"Model: {response.agentcc.model_used}")

The Agent Command Center SDK also tracks cumulative cost across all requests made with a client:

# After several requests...
print(f"Total session cost: ${client.current_cost:.4f}")

# Reset the counter
client.reset_cost()

With the OpenAI SDK, read the cost from the raw response headers:

from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.futureagi.com/v1",
    api_key="sk-agentcc-your-key",
)

raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

print(f"Cost: ${raw.headers.get('x-agentcc-cost')}")
print(f"Provider: {raw.headers.get('x-agentcc-provider')}")

curl -i https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-agentcc-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
# Look for: x-agentcc-cost: 0.00015
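When the header is read directly, its value arrives as a string and may be missing. The sketch below shows one way to parse it defensively and keep a running total. The names parse_cost and CostTally are hypothetical helpers, not part of any SDK, and the sketch assumes the header value is a plain decimal number as in the cURL example above.

```python
from decimal import Decimal, InvalidOperation
from typing import Mapping, Optional

def parse_cost(headers: Mapping[str, str]) -> Optional[Decimal]:
    """Return x-agentcc-cost as a Decimal, or None if missing or malformed.

    Note: a plain dict is case-sensitive; HTTP client header objects
    usually are not, so pass those through unchanged where possible.
    """
    raw = headers.get("x-agentcc-cost")
    if raw is None:
        return None
    try:
        return Decimal(raw.strip())
    except InvalidOperation:
        return None

class CostTally:
    """Accumulate per-request costs read from response headers."""

    def __init__(self) -> None:
        self.total = Decimal("0")
        self.requests = 0

    def add(self, headers: Mapping[str, str]) -> None:
        cost = parse_cost(headers)
        if cost is not None:
            self.total += cost
            self.requests += 1

tally = CostTally()
tally.add({"x-agentcc-cost": "0.00015"})
tally.add({"x-agentcc-cost": "0"})   # e.g. an exact cache hit
tally.add({})                        # header absent: ignored
print(f"{tally.requests} requests, total ${tally.total}")
```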

Cost attribution

Tag requests with metadata to break down costs by team, feature, user, or any custom dimension. Metadata is indexed and queryable in the analytics dashboard.

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    request_metadata={"team": "data-science", "feature": "recommendations", "user": "alice"},
)

With the OpenAI SDK, pass the same metadata as a JSON-encoded x-agentcc-metadata header:

import json

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={
        "x-agentcc-metadata": json.dumps({"team": "data-science", "feature": "recommendations", "user": "alice"}),
    },
)

curl https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-agentcc-your-key" \
  -H "Content-Type: application/json" \
  -H 'x-agentcc-metadata: {"team":"data-science","feature":"recommendations","user":"alice"}' \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
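If many call sites attach metadata through the header, a small helper keeps the encoding consistent. This is a minimal sketch: metadata_header is a hypothetical name, and restricting values to strings is a choice made here for predictable grouping, not a documented gateway requirement.

```python
import json
from typing import Dict

def metadata_header(**dimensions: str) -> Dict[str, str]:
    """Build an x-agentcc-metadata header from keyword arguments."""
    for key, value in dimensions.items():
        if not isinstance(value, str):
            raise TypeError(f"metadata value for {key!r} must be a string")
    # Compact separators keep the header short; sorted keys keep it stable.
    encoded = json.dumps(dimensions, separators=(",", ":"), sort_keys=True)
    return {"x-agentcc-metadata": encoded}

headers = metadata_header(
    team="data-science",
    feature="recommendations",
    user="alice",
)
print(headers["x-agentcc-metadata"])
```

The resulting dictionary can be passed as extra_headers in the OpenAI SDK call shown above.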

Analytics dashboard

The Future AGI dashboard shows cost breakdowns and trends across your organization.

Available views:

Total spend for the current period
Cost by model
Cost by provider
Cost by API key
Cost timeseries (daily/weekly/monthly)
Cost by metadata dimension (team, feature, user)
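A breakdown by metadata dimension can also be approximated locally from records you collect yourself, for example while testing. The record layout below (a cost plus a metadata dictionary) is an assumption of this sketch, not the dashboard's data model, and the sample values are made up for illustration.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Mapping

def cost_by_dimension(
    records: Iterable[Mapping], dimension: str
) -> Dict[str, Decimal]:
    """Sum record costs grouped by one metadata key."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)  # Decimal() == 0
    for record in records:
        key = record["metadata"].get(dimension, "(untagged)")
        totals[key] += record["cost"]
    return dict(totals)

# Illustrative records only.
records = [
    {"cost": Decimal("0.0020"), "metadata": {"team": "data-science", "feature": "recommendations"}},
    {"cost": Decimal("0.0010"), "metadata": {"team": "data-science", "feature": "search"}},
    {"cost": Decimal("0.0005"), "metadata": {"feature": "search"}},
]

by_team = cost_by_dimension(records, "team")
by_feature = cost_by_dimension(records, "feature")
print(by_team)
print(by_feature)
```

Requests without the chosen key land in an "(untagged)" bucket, which makes gaps in tagging visible.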

SDK analytics

from agentcc import AgentCC

client = AgentCC(
    api_key="sk-agentcc-your-key",
    base_url="https://gateway.futureagi.com",
    control_plane_url="https://api.futureagi.com",
)

# Spending overview
overview = client.analytics.overview(
    start_date="2026-01-01",
    end_date="2026-01-31",
)

# Cost breakdown by model
costs = client.analytics.cost_breakdown(group_by="model")

# Compare models
comparison = client.analytics.model_comparison(
    models=["gpt-4o", "claude-sonnet-4-6"],
)

Budget alerts

Get notified when spending crosses a threshold. Alerts are configured per organization.

Go to Agent Command Center > Settings > Alerts in the Future AGI dashboard. Create a new alert by selecting the event type, setting recipients, and configuring severity.

alert = client.alerts.create(
    name="Budget warning at 80%",
    condition="cost > 80",
    recipients=["team@example.com"],
    severity="high",
)

const alert = await client.alerts.create({
    name: "Budget warning at 80%",
    condition: "cost > 80",
    recipients: ["team@example.com"],
    severity: "high",
});

Alert types

Event	Trigger
`budget_exceeded`	Spend crosses the budget limit
`budget_threshold`	Spend crosses a percentage threshold (e.g. 80%)
`error_spike`	Error rate exceeds configured threshold
`latency_spike`	P95 latency exceeds configured threshold
`guardrail_triggered`	A guardrail blocks or flags a request
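To illustrate how the two budget events relate, the sketch below decides which of them a given spend level would correspond to. It is a local illustration only: the function name budget_events is hypothetical, and the exact comparison rules (at or above the threshold, strictly above the limit) are assumptions, not documented gateway behavior.

```python
from typing import List

def budget_events(spend: float, limit: float, threshold_pct: float = 80.0) -> List[str]:
    """Return the budget event names that a spend level would match."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    events: List[str] = []
    threshold_amount = limit * threshold_pct / 100
    if spend >= threshold_amount:   # assumed: threshold is inclusive
        events.append("budget_threshold")
    if spend > limit:               # assumed: limit must be exceeded
        events.append("budget_exceeded")
    return events

print(budget_events(50, 100))
print(budget_events(80, 100))
print(budget_events(120, 100))
```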

Tip

Configure a cooldown period to prevent alert flooding when thresholds are repeatedly crossed.
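The effect of a cooldown can be sketched as follows. This illustrates the general technique, not how Agent Command Center implements it; the class name CooldownGate and the use of seconds as the unit are assumptions.

```python
from typing import Dict

class CooldownGate:
    """Suppress repeat alerts that arrive within a cooldown window."""

    def __init__(self, cooldown_seconds: float) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._last_fired: Dict[str, float] = {}

    def should_fire(self, alert_name: str, now: float) -> bool:
        """Return True and record the time if the alert may fire at `now`."""
        last = self._last_fired.get(alert_name)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still cooling down: drop this notification
        self._last_fired[alert_name] = now
        return True

gate = CooldownGate(cooldown_seconds=300)
# Timestamps are passed in explicitly to keep the example deterministic.
for t in (0.0, 60.0, 120.0, 300.0):
    print(t, gate.should_fire("budget_threshold", now=t))
```

Each alert name has its own window, so a suppressed budget alert does not delay an unrelated one.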

Budget enforcement

Budgets are configured on the Rate limiting & budgets page. When a budget configured with action: block is exceeded, new requests return a 429 error until the next budget period begins. See that page for configuration details.
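On the client side, a blocked budget is best treated as a signal to stop sending rather than to retry in a tight loop. The sketch below shows that pattern with a hypothetical send callable that returns a status code and a body; BudgetBlocked and send_once are illustrative names. It treats every 429 the same way and does not attempt to distinguish a budget block from any other 429 response.

```python
from typing import Callable, Tuple

class BudgetBlocked(Exception):
    """Raised when the gateway answers with HTTP 429."""

def send_once(send: Callable[[], Tuple[int, str]]) -> str:
    """Call `send` once and surface a 429 as an exception instead of retrying."""
    status, body = send()
    if status == 429:
        raise BudgetBlocked("gateway returned 429; pausing further requests")
    return body

# Stand-in for a real HTTP call, used only to demonstrate the flow.
def fake_blocked_request() -> Tuple[int, str]:
    return 429, "budget exceeded"

try:
    send_once(fake_blocked_request)
except BudgetBlocked as exc:
    print(f"Stopped: {exc}")
```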

Next Steps

Rate limiting & budgets

Request & response headers

Routing

Caching

Custom Properties