Command Center Platform Integration
How Agent Command Center feeds signals into Future AGI Observe, Evaluate, Protect, and Experiment to close the loop between production traffic and model quality.
About
Agent Command Center is not a standalone gateway. It’s the data collection and enforcement layer of the Future AGI platform. Every request through Agent Command Center generates signals that flow into Observe, Evaluate, Protect, and Experiment — closing the loop between production traffic and model quality.
How the platform fits together
```
 Your application
        │
        ▼
┌───────────────────┐   traces, costs, latency  ┌────────────┐
│                   │ ─────────────────────────▶│  Observe   │
│   Agent Command   │                           └────────────┘
│  Center Gateway   │   guardrail scores        ┌────────────┐
│                   │ ─────────────────────────▶│  Evaluate  │
│                   │                           └────────────┘
│                   │   shadow results          ┌────────────┐
│                   │ ─────────────────────────▶│ Experiment │
└───────────────────┘                           └────────────┘
```
Agent Command Center → Observe
Every request through Agent Command Center generates an execution trace — request, response, latency, token counts, cost, provider used, routing decision, and guardrail outcomes. These traces feed directly into the Observe product.
From Observe you can:
- View per-request traces with full metadata
- Monitor latency percentiles (p50, p95, p99) per model and provider
- Track cost breakdown by model, provider, team, or custom metadata dimension
- See provider health trends and error rate history
- Drill into sessions (`x-agentcc-session-id`) to trace conversation-level patterns (see the sketch after this list)
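To group a multi-turn conversation under one session, send the same `x-agentcc-session-id` value on every request in that conversation. A minimal raw-HTTP sketch: the header name comes from the list above, while the `/v1/chat/completions` route and payload shape are assumptions to check against your gateway.

```python
import requests

# Minimal sketch: tag one call with a session id so Observe can group it into
# a conversation-level trace. The route and payload shape are assumptions;
# only the x-agentcc-session-id header is documented above.
response = requests.post(
    "https://gateway.futureagi.com/v1/chat/completions",  # assumed route
    headers={
        "Authorization": "Bearer sk-agentcc-...",
        "x-agentcc-session-id": "sess-2f8a41",  # reuse across one conversation
    },
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Expand the query: trail running shoes"}],
    },
    timeout=30,
)
print(response.json())
```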
How to tag requests for attribution:
```python
from agentcc import AgentCC

client = AgentCC(
    api_key="sk-agentcc-...",
    base_url="https://gateway.futureagi.com",
    # Tags set here ride along on every request this client sends
    metadata={"team": "search", "feature": "query-expansion", "env": "production"},
)
```
These metadata fields appear as filterable dimensions in Observe dashboards.
Agent Command Center → Evaluate
Agent Command Center’s guardrails are backed by the Future AGI evaluation engine. When you configure a Future AGI Evaluation guardrail, Agent Command Center sends each request/response pair to the evaluation engine in real time. The engine runs model-level checks — not just regex — to detect hallucinations, quality regressions, and policy violations.
This is the key differentiator from guardrail products that rely on pattern matching: evaluation guardrails score outputs using the same models and metrics you use in offline eval.
The `futureagi` guardrail type connects Agent Command Center to Evaluate:

```python
config = client.guardrails.configs.create(
    name="Production quality gate",
    rules=[
        {
            "name": "futureagi",  # Future AGI evaluation engine
            "stage": "post",      # run on the response, after the model replies
            "mode": "sync",       # score inline, before returning to the caller
            "action": "warn",     # flag low scores instead of blocking
            "threshold": 0.7,     # minimum acceptable evaluation score
        }
    ],
)
```
Guardrail scores and decisions are logged in both Agent Command Center (for traffic analysis) and Evaluate (for quality trend tracking).
Agent Command Center → Experiment
Shadow experiments in Agent Command Center generate comparison data that feeds directly into Experiment pipelines.
When you configure traffic mirroring, Agent Command Center collects:
- Production model responses
- Shadow model responses
- Latency and token deltas for each request pair
These paired results appear in the Experiment product where you can:
- Run automated scoring on response pairs using evaluation metrics
- Calculate win rates across hundreds or thousands of production requests (a toy sketch of this arithmetic follows the list)
- Make evidence-based migration decisions before switching providers
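Experiment computes win rates for you, but the underlying arithmetic is simple. A toy sketch; the record shape and scores below are invented for illustration:

```python
# Toy sketch of the win-rate arithmetic over paired production/shadow results.
# The record shape and scores are invented for illustration.
pairs = [
    {"request_id": "req-1", "production_score": 0.82, "shadow_score": 0.91},
    {"request_id": "req-2", "production_score": 0.77, "shadow_score": 0.74},
    {"request_id": "req-3", "production_score": 0.88, "shadow_score": 0.88},
]

shadow_wins = sum(p["shadow_score"] > p["production_score"] for p in pairs)
ties = sum(p["shadow_score"] == p["production_score"] for p in pairs)
win_rate = shadow_wins / len(pairs)

print(f"Shadow wins {shadow_wins}/{len(pairs)} ({win_rate:.0%}), {ties} tie(s)")
```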
Enabling shadow experiments:
```python
from agentcc import AgentCC, GatewayConfig, TrafficMirrorConfig

client = AgentCC(
    api_key="sk-agentcc-...",
    base_url="https://gateway.futureagi.com",
    config=GatewayConfig(
        mirror=TrafficMirrorConfig(
            target_model="claude-sonnet-4-20250514",  # shadow model under evaluation
            target_provider="anthropic",
            sample_rate=0.1,  # mirror 10% of production traffic
        )
    ),
)
```
Shadow results are automatically synced to the Experiment product for analysis.
Metadata as the connective tissue
The `x-agentcc-metadata` header (or the `metadata=` parameter in the SDK) is how you connect Agent Command Center data to your application’s dimensions. Tags set on requests flow through to all connected products (a raw-header sketch follows the table):
| Tag | Use in Observe | Use in Evaluate | Use in Experiment |
|---|---|---|---|
| `metadata.team` | Cost breakdown by team | Quality trends per team | Experiment scoping by team |
| `metadata.feature` | Latency per feature | Regression alerts per feature | A/B test segmentation |
| `metadata.user_id` | Per-user cost | User-level quality flags | User cohort experiments |
| `metadata.env` | Separate prod/staging metrics | Different quality thresholds | Shadow test isolation |
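For callers that hit the gateway without the SDK, the same tags travel on the header directly. A minimal sketch, assuming the header value is JSON-encoded (verify the encoding against your gateway version; the route is also an assumption):

```python
import json
import requests

# Sketch: send metadata tags on the raw header instead of the SDK parameter.
# JSON encoding and the route are assumptions; the header name and tag keys
# come from the documentation above.
response = requests.post(
    "https://gateway.futureagi.com/v1/chat/completions",  # assumed route
    headers={
        "Authorization": "Bearer sk-agentcc-...",
        "x-agentcc-metadata": json.dumps(
            {"team": "search", "feature": "query-expansion", "env": "production"}
        ),
    },
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "hello"}]},
    timeout=30,
)
```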