Platform Integration

How Prism AI Gateway connects to the broader Future AGI platform — observability, evaluation, protection, and experimentation.

About

Prism is not a standalone gateway. It’s the data collection and enforcement layer of the Future AGI platform. Every request through Prism generates signals that flow into Observe, Evaluate, Protect, and Experiment — closing the loop between production traffic and model quality.


How the platform fits together

  Your application
        │  requests
        ▼
  ┌─────────┐    traces, costs, latency    ┌─────────┐
  │  Prism  │ ───────────────────────────▶ │ Observe │
  │ Gateway │                              └─────────┘
  │         │    guardrail scores          ┌──────────┐
  │         │ ───────────────────────────▶ │ Evaluate │
  │         │                              └──────────┘
  │         │    shadow results            ┌───────────┐
  │         │ ───────────────────────────▶ │ Experiment│
  └─────────┘                              └───────────┘

Prism → Observe

Every request through Prism generates an execution trace — request, response, latency, token counts, cost, provider used, routing decision, and guardrail outcomes. These traces feed directly into the Observe product.

From Observe you can:

  • View per-request traces with full metadata
  • Monitor latency percentiles (p50, p95, p99) per model and provider
  • Track cost breakdown by model, provider, team, or custom metadata dimension
  • See provider health trends and error rate history
  • Drill into sessions (x-prism-session-id) to trace conversation-level patterns
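The percentile dashboards summarize the latency column of these traces. As a point of reference, this is how a nearest-rank percentile is computed over a batch of request latencies (an illustrative sketch, not the Observe implementation):

```python
import math

def percentile(latencies_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile over a batch of request latencies."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    # Nearest-rank: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

latencies = [120, 95, 310, 150, 88, 102, 980, 130, 115, 140]
p50 = percentile(latencies, 50)  # → 120
p95 = percentile(latencies, 95)  # → 980 (one slow outlier dominates the tail)
```

Note how a single slow request moves p95 but leaves p50 untouched — which is why the dashboards show both.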

How to tag requests for attribution:

from prism import Prism

client = Prism(
    api_key="sk-prism-...",
    base_url="https://gateway.futureagi.com",
    metadata={"team": "search", "feature": "query-expansion", "env": "production"},
)

These metadata fields appear as filterable dimensions in Observe dashboards.


Prism → Evaluate

Prism’s guardrails are backed by the Future AGI evaluation engine. When you configure a Future AGI Evaluation guardrail, Prism sends each request/response pair to the evaluation engine in real time. The engine runs model-level checks — not just regex — to detect hallucinations, quality regressions, and policy violations.

This is the key differentiator from guardrail products that rely on pattern matching: evaluation guardrails score outputs using the same models and metrics you use in offline eval.

The futureagi guardrail type connects Prism to Evaluate:

config = client.guardrails.configs.create(
    name="Production quality gate",
    rules=[
        {
            "name": "futureagi",          # Future AGI evaluation engine
            "stage": "post",
            "mode": "sync",
            "action": "warn",
            "threshold": 0.7,
        }
    ],
)

Guardrail scores and decisions are logged in both Prism (for traffic analysis) and Evaluate (for quality trend tracking).
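Conceptually, each rule reduces to a score-versus-threshold comparison combined with the configured action. A simplified sketch of that decision logic (illustrative only, not the gateway's actual code; the field names mirror the config above):

```python
def guardrail_decision(score: float, threshold: float, action: str) -> str:
    """Map an evaluation score to a guardrail outcome.

    Scores at or above the threshold pass; below it, the configured
    action decides whether the response is annotated or rejected.
    """
    if score >= threshold:
        return "pass"
    if action == "block":
        return "blocked"
    return "warned"  # e.g. action="warn", as in the config above

# A 0.62 evaluation score against the 0.7 threshold with action="warn":
guardrail_decision(0.62, 0.7, "warn")  # → "warned"
```

With action="warn" the response still reaches your application; the sub-threshold score is simply recorded in both Prism and Evaluate.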


Prism → Experiment

Shadow experiments in Prism generate comparison data that feeds directly into Experiment pipelines.

When you configure traffic mirroring, Prism collects:

  • Production model responses
  • Shadow model responses
  • Latency and token deltas for each request pair

These paired results appear in the Experiment product where you can:

  • Run automated scoring on response pairs using evaluation metrics
  • Calculate win rates across hundreds or thousands of production requests
  • Make evidence-based migration decisions before switching providers
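Win rate is the simplest of these aggregates: the fraction of request pairs where the shadow model's score beats production's. A sketch of the computation (hypothetical data shapes — Experiment runs this scoring for you):

```python
def win_rate(pairs: list[tuple[float, float]]) -> float:
    """Fraction of request pairs where the shadow model outscored production.

    Each pair is (production_score, shadow_score); ties count as losses
    for the shadow model, a conservative choice for migration decisions.
    """
    if not pairs:
        raise ValueError("no pairs")
    wins = sum(1 for prod, shadow in pairs if shadow > prod)
    return wins / len(pairs)

scored = [(0.71, 0.80), (0.90, 0.85), (0.60, 0.66), (0.75, 0.75)]
win_rate(scored)  # → 0.5 (shadow wins 2 of 4 pairs)
```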

Enabling shadow experiments:

from prism import Prism, GatewayConfig, TrafficMirrorConfig

client = Prism(
    api_key="sk-prism-...",
    base_url="https://gateway.futureagi.com",
    config=GatewayConfig(
        mirror=TrafficMirrorConfig(
            target_model="claude-sonnet-4-20250514",
            target_provider="anthropic",
            sample_rate=0.1,
        )
    ),
)

Shadow results are automatically synced to the Experiment product for analysis.
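A sample_rate of 0.1 means roughly one in ten production requests is also sent to the shadow model. The selection behaves like an independent coin flip per request (a conceptual sketch; the gateway's actual sampler may differ):

```python
import random

def should_mirror(sample_rate: float, rng: random.Random) -> bool:
    """Independently mirror each request with probability sample_rate."""
    return rng.random() < sample_rate

rng = random.Random(42)  # seeded here only to make the sketch reproducible
mirrored = sum(should_mirror(0.1, rng) for _ in range(10_000))
# mirrored is close to 1,000 — about 10% of requests
```

Per-request sampling keeps shadow cost proportional to sample_rate while still drawing pairs from the full production distribution.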


Metadata as the connective tissue

The x-prism-metadata header (or metadata= parameter in the SDK) is how you connect Prism data to your application’s dimensions. Tags set on requests flow through to all connected products:

  Tag                 Use in Observe                  Use in Evaluate                  Use in Experiment
  metadata.team       Cost breakdown by team          Quality trends per team          Experiment scoping by team
  metadata.feature    Latency per feature             Regression alerts per feature    A/B test segmentation
  metadata.user_id    Per-user cost                   User-level quality flags         User cohort experiments
  metadata.env        Separate prod/staging metrics   Different quality thresholds     Shadow test isolation
