Platform Integration
How Prism AI Gateway connects to the broader Future AGI platform — observability, evaluation, protection, and experimentation.
About
Prism is not a standalone gateway. It’s the data collection and enforcement layer of the Future AGI platform. Every request through Prism generates signals that flow into Observe, Evaluate, Protect, and Experiment — closing the loop between production traffic and model quality.
How the platform fits together
```
Your application
       │
       ▼
┌─────────┐   traces, costs, latency   ┌─────────┐
│  Prism  │ ─────────────────────────▶ │ Observe │
│ Gateway │                            └─────────┘
│         │   guardrail scores         ┌──────────┐
│         │ ─────────────────────────▶ │ Evaluate │
│         │                            └──────────┘
│         │   shadow results           ┌────────────┐
│         │ ─────────────────────────▶ │ Experiment │
└─────────┘                            └────────────┘
```
Prism → Observe
Every request through Prism generates an execution trace — request, response, latency, token counts, cost, provider used, routing decision, and guardrail outcomes. These traces feed directly into the Observe product.
From Observe you can:
- View per-request traces with full metadata
- Monitor latency percentiles (p50, p95, p99) per model and provider
- Track cost breakdown by model, provider, team, or custom metadata dimension
- See provider health trends and error rate history
- Drill into sessions (`x-prism-session-id`) to trace conversation-level patterns
How to tag requests for attribution:
```python
from prism import Prism

client = Prism(
    api_key="sk-prism-...",
    base_url="https://gateway.futureagi.com",
    metadata={"team": "search", "feature": "query-expansion", "env": "production"},
)
```
These metadata fields appear as filterable dimensions in Observe dashboards.
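Observe's session drill-down groups requests that share a session ID. A minimal sketch of tagging a conversation, assuming the gateway accepts these per-request headers (the header names come from this page; JSON-encoding the `x-prism-metadata` value is an assumption):

```python
import json
import uuid

# One session ID per conversation; every turn reuses it so Observe can
# reconstruct the full multi-turn interaction.
session_id = str(uuid.uuid4())

headers = {
    "Authorization": "Bearer sk-prism-...",
    "x-prism-session-id": session_id,
    # Assumption: per-request metadata is sent as a JSON-encoded header value.
    "x-prism-metadata": json.dumps({"team": "search", "env": "production"}),
}
```

Send `headers` with any HTTP client (or your SDK's per-request header option) on every call that belongs to the conversation.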
Prism → Evaluate
Prism’s guardrails are backed by the Future AGI evaluation engine. When you configure a Future AGI Evaluation guardrail, Prism sends each request/response pair to the evaluation engine in real time. The engine runs model-level checks — not just regex — to detect hallucinations, quality regressions, and policy violations.
This is the key differentiator from guardrail products that rely on pattern matching: evaluation guardrails score outputs using the same models and metrics you use in offline eval.
The `futureagi` guardrail type connects Prism to Evaluate:
```python
config = client.guardrails.configs.create(
    name="Production quality gate",
    rules=[
        {
            "name": "futureagi",  # Future AGI evaluation engine
            "stage": "post",      # score the response after the model returns
            "mode": "sync",
            "action": "warn",
            "threshold": 0.7,
        }
    ],
)
```
Guardrail scores and decisions are logged in both Prism (for traffic analysis) and Evaluate (for quality trend tracking).
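The `threshold` in the config above acts as a gate on the evaluation score. A minimal sketch of the decision logic (the function and return values are illustrative, not the SDK's API):

```python
def guardrail_decision(score: float, threshold: float = 0.7, action: str = "warn") -> str:
    """Illustrative sketch of a post-stage threshold gate.

    A score at or above the threshold passes; a score below it triggers
    the configured action ("warn" logs but lets the response through,
    while a blocking action would reject it).
    """
    if score >= threshold:
        return "pass"
    return action

print(guardrail_decision(0.85))  # "pass"
print(guardrail_decision(0.55))  # "warn"
```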
Prism → Experiment
Shadow experiments in Prism generate comparison data that feeds directly into Experiment pipelines.
When you configure traffic mirroring, Prism collects:
- Production model responses
- Shadow model responses
- Latency and token deltas for each request pair
These paired results appear in the Experiment product where you can:
- Run automated scoring on response pairs using evaluation metrics
- Calculate win rates across hundreds or thousands of production requests
- Make evidence-based migration decisions before switching providers
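The win-rate calculation above can be sketched offline from the paired records. The record shape here is illustrative; Experiment performs this aggregation for you:

```python
# Each pair holds an evaluation score for the production response and
# the shadow response to the same request (illustrative data).
pairs = [
    {"prod_score": 0.72, "shadow_score": 0.81},
    {"prod_score": 0.90, "shadow_score": 0.85},
    {"prod_score": 0.60, "shadow_score": 0.77},
]

shadow_wins = sum(p["shadow_score"] > p["prod_score"] for p in pairs)
win_rate = shadow_wins / len(pairs)
print(f"shadow win rate: {win_rate:.0%}")  # shadow win rate: 67%
```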
Enabling shadow experiments:
```python
from prism import Prism, GatewayConfig, TrafficMirrorConfig

client = Prism(
    api_key="sk-prism-...",
    base_url="https://gateway.futureagi.com",
    config=GatewayConfig(
        mirror=TrafficMirrorConfig(
            target_model="claude-sonnet-4-20250514",
            target_provider="anthropic",
            sample_rate=0.1,  # mirror ~10% of production traffic
        )
    ),
)
```
Shadow results are automatically synced to the Experiment product for analysis.
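A sketch of what `sample_rate=0.1` implies statistically: roughly one in ten production requests is duplicated to the shadow target. Prism's actual sampling mechanism is internal; this only illustrates the expected volume:

```python
import random

def should_mirror(sample_rate: float = 0.1) -> bool:
    # Bernoulli sampling: mirror this request with probability sample_rate.
    return random.random() < sample_rate

random.seed(0)  # fixed seed so the simulation is reproducible
mirrored = sum(should_mirror() for _ in range(10_000))
print(mirrored)  # roughly 1,000 of 10,000 requests
```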
Metadata as the connective tissue
The `x-prism-metadata` header (or the `metadata=` parameter in the SDK) is how you connect Prism data to your application's dimensions. Tags set on requests flow through to all connected products:
| Tag | Use in Observe | Use in Evaluate | Use in Experiment |
|---|---|---|---|
| `metadata.team` | Cost breakdown by team | Quality trends per team | Experiment scoping by team |
| `metadata.feature` | Latency per feature | Regression alerts per feature | A/B test segmentation |
| `metadata.user_id` | Per-user cost | User-level quality flags | User cohort experiments |
| `metadata.env` | Separate prod/staging metrics | Different quality thresholds | Shadow test isolation |
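As an illustration of how a tag like `metadata.team` drives the Observe column above, a sketch of cost attribution over trace records (the record shape is illustrative; Observe does this aggregation for you from the tags you set):

```python
from collections import defaultdict

# Illustrative trace records as Observe might collect them per request.
traces = [
    {"metadata": {"team": "search"}, "cost_usd": 0.012},
    {"metadata": {"team": "search"}, "cost_usd": 0.008},
    {"metadata": {"team": "chat"},   "cost_usd": 0.020},
]

cost_by_team = defaultdict(float)
for t in traces:
    cost_by_team[t["metadata"]["team"]] += t["cost_usd"]

print(dict(cost_by_team))
```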