Shadow Experiments

Mirror a percentage of production LLM traffic to alternative models for zero-risk evaluation.

About

Shadow experiments let you silently copy a percentage of production LLM requests to a second model without affecting the user-facing response. Your primary model handles the request normally and returns the response to the user. Simultaneously, a background process sends a copy of the same request to a shadow model for evaluation.

This approach gives you real production data for model comparison, cost analysis, and provider migration testing, all without any impact on user experience. Results are collected and synced to the Future AGI dashboard for analysis.

When to use

  • Model evaluation: Test a new model on real production traffic before switching
  • Cost comparison: Compare pricing and token usage between models without affecting users
  • Provider migration: Validate a provider switch (e.g., OpenAI to Anthropic) on a fraction of traffic
  • Prompt validation: Test prompt changes in production before full rollout
  • Latency analysis: Compare response times between models under real load

How it works

When you enable shadow experiments:

  1. A request arrives at the gateway for your primary model
  2. The primary model processes the request and returns the response to the user immediately
  3. Simultaneously, a background goroutine sends a copy of the request to the shadow model
  4. The shadow model’s response, latency, token count, and status code are captured
  5. Results are collected and periodically synced to the Future AGI dashboard

The user never waits for the shadow model. If the shadow call fails or times out, it doesn’t affect the primary response.
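The flow above can be sketched in Python. This is an illustrative sketch only, not the gateway's actual internals (which run in Go): `call_model` and `record_shadow_result` are hypothetical placeholders, and only the sampling and fire-and-forget behavior are meant to be faithful.

```python
import random
import threading
import time

SAMPLE_RATE = 0.1  # mirror 10% of traffic

def call_model(model, request):
    # Hypothetical placeholder for a real provider call.
    return {"text": f"{model} response", "tokens": 100, "status": 200}

def record_shadow_result(result):
    # Hypothetical placeholder: results are buffered and periodically
    # synced to the dashboard.
    print("shadow result:", result)

def handle_request(request, primary_model, shadow_model):
    # Steps 1-2: the primary model handles the request and the response
    # is returned to the user immediately.
    primary = call_model(primary_model, request)

    # Step 3: a sampled copy is mirrored on a background thread;
    # the user never waits on it, and its failures stay contained.
    if random.random() < SAMPLE_RATE:
        threading.Thread(
            target=mirror, args=(request, shadow_model), daemon=True
        ).start()
    return primary

def mirror(request, shadow_model):
    # Step 4: capture the shadow response, latency, and status code.
    start = time.monotonic()
    try:
        shadow = call_model(shadow_model, request)
        record_shadow_result({
            "shadow_model": shadow_model,
            "shadow_latency_ms": int((time.monotonic() - start) * 1000),
            "shadow_status_code": shadow["status"],
        })
    except Exception as exc:
        record_shadow_result({"shadow_model": shadow_model, "shadow_error": str(exc)})
```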

Configuring via SDK (per-request)

You can enable shadow experiments on a per-request basis by passing a GatewayConfig with a TrafficMirrorConfig to the Prism client.

Python

from prism import Prism, GatewayConfig, TrafficMirrorConfig

client = Prism(
    api_key="sk-prism-...",
    base_url="https://gateway.futureagi.com",
    config=GatewayConfig(
        mirror=TrafficMirrorConfig(
            target_model="claude-sonnet-4-20250514",
            target_provider="anthropic",
            sample_rate=0.1,  # Mirror 10% of traffic
        )
    ),
)

# Normal request — 10% of traffic is silently mirrored to Claude
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize the latest AI news."}],
)
print(response.choices[0].message.content)

TypeScript

import { Prism } from "@futureagi/prism";

const client = new Prism({
  apiKey: "sk-prism-...",
  baseUrl: "https://gateway.futureagi.com",
  config: {
    mirror: {
      target_model: "claude-sonnet-4-20250514",
      target_provider: "anthropic",
      sample_rate: 0.1,  // Mirror 10% of traffic
    },
  },
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize the latest AI news." }],
});
console.log(response.choices[0].message.content);

Configuration options

  • target_model: The model to mirror traffic to (e.g., "claude-sonnet-4-20250514")
  • target_provider: The provider of the shadow model (e.g., "anthropic", "openai")
  • sample_rate: Float between 0.0 and 1.0. 0.1 mirrors 10% of traffic, 1.0 mirrors 100%
  • enabled: Set to false to disable mirroring (defaults to true)
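As a sanity check on these options, the snippet below sketches how they might be validated client-side. `MirrorOptions` is a hypothetical stand-in, not the SDK's actual TrafficMirrorConfig class; only the field names and the 0.0–1.0 range rule come from the options above.

```python
from dataclasses import dataclass

@dataclass
class MirrorOptions:
    # Hypothetical stand-in for TrafficMirrorConfig; fields mirror
    # the configuration options listed above.
    target_model: str
    target_provider: str
    sample_rate: float = 1.0
    enabled: bool = True  # set False to disable mirroring

    def __post_init__(self):
        # sample_rate is a fraction, not a percentage: 0.1 means 10%.
        if not 0.0 <= self.sample_rate <= 1.0:
            raise ValueError("sample_rate must be between 0.0 and 1.0")

opts = MirrorOptions(
    target_model="claude-sonnet-4-20250514",
    target_provider="anthropic",
    sample_rate=0.1,
)
```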

Configuring in config.yaml (gateway-level)

For persistent gateway-level configuration, add a routing.mirror section to your config.yaml:

routing:
  mirror:
    enabled: true
    rules:
      - source_model: "gpt-4o"
        target_provider: "anthropic"
        target_model: "claude-sonnet-4-20250514"
        sample_rate: 0.1    # Mirror 10% of gpt-4o traffic

      - source_model: "gpt-4-turbo"
        target_provider: "anthropic"
        target_model: "claude-opus-4-20250514"
        sample_rate: 0.05   # Mirror 5% of gpt-4-turbo traffic

      - source_model: "*"   # Wildcard: mirror ALL models
        target_provider: "staging"
        sample_rate: 0.01   # 1% of all traffic

Use "*" as the source_model to mirror all requests regardless of the primary model. Rules are evaluated in order, so place more specific rules before wildcard rules.
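The first-match-wins ordering described above can be sketched as follows (illustrative only; rule dictionaries use the field names from the YAML above):

```python
def match_mirror_rule(rules, source_model):
    # Rules are evaluated in order; "*" matches any model, so specific
    # rules must come before wildcard rules to take effect.
    for rule in rules:
        if rule["source_model"] in (source_model, "*"):
            return rule
    return None

rules = [
    {"source_model": "gpt-4o", "target_provider": "anthropic",
     "target_model": "claude-sonnet-4-20250514", "sample_rate": 0.1},
    {"source_model": "*", "target_provider": "staging", "sample_rate": 0.01},
]

assert match_mirror_rule(rules, "gpt-4o")["sample_rate"] == 0.1       # specific rule wins
assert match_mirror_rule(rules, "gpt-4-turbo")["sample_rate"] == 0.01  # wildcard fallback
```

If the wildcard rule were listed first, it would capture all traffic, including gpt-4o requests, which is why specific rules must precede it.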

Collected data

Each mirrored request produces a shadow result with the following fields:

{
  "request_id": "req_abc123",
  "experiment_id": "exp_xyz",
  "source_model": "gpt-4o",
  "shadow_model": "claude-sonnet-4-20250514",
  "source_response": "The capital of France is Paris.",
  "shadow_response": "Paris is the capital of France.",
  "source_latency_ms": 450,
  "shadow_latency_ms": 380,
  "source_tokens": 312,
  "shadow_tokens": 295,
  "source_status_code": 200,
  "shadow_status_code": 200,
  "shadow_error": "",
  "prompt_hash": "a1b2c3d4",
  "created_at": "2026-03-25T10:30:00Z"
}

Field descriptions

  • request_id: Unique identifier for the original request
  • experiment_id: Identifier for this shadow experiment run
  • source_model: The primary model that handled the user request
  • shadow_model: The shadow model that processed the copy
  • source_response: The response text from the primary model
  • shadow_response: The response text from the shadow model
  • source_latency_ms: Time in milliseconds for the primary model to respond
  • shadow_latency_ms: Time in milliseconds for the shadow model to respond
  • source_tokens: Total tokens used by the primary model
  • shadow_tokens: Total tokens used by the shadow model
  • source_status_code: HTTP status code from the primary model
  • shadow_status_code: HTTP status code from the shadow model
  • shadow_error: Error message if the shadow call failed (empty if successful)
  • prompt_hash: Hash of the prompt for deduplication and analysis
  • created_at: Timestamp when the shadow result was created
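Given records in the shape shown above (for example, exported from the dashboard; export tooling is an assumption, since results are dashboard-only today), latency and token deltas can be aggregated like this:

```python
def summarize(results):
    # Average shadow-vs-source deltas over successful shadow calls.
    # Negative deltas mean the shadow model was faster / used fewer tokens.
    ok = [r for r in results if r["shadow_status_code"] == 200]
    n = len(ok)
    return {
        "mirrored": len(results),
        "shadow_success_rate": n / len(results),
        "avg_latency_delta_ms": sum(
            r["shadow_latency_ms"] - r["source_latency_ms"] for r in ok
        ) / n,
        "avg_token_delta": sum(
            r["shadow_tokens"] - r["source_tokens"] for r in ok
        ) / n,
    }

# Using the example record above: the shadow call was 70 ms faster
# and used 17 fewer tokens.
results = [
    {"shadow_status_code": 200, "source_latency_ms": 450,
     "shadow_latency_ms": 380, "source_tokens": 312, "shadow_tokens": 295},
]
print(summarize(results))
```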

Note

Shadow results appear in the Future AGI dashboard after periodic sync. Direct API access to results is not currently available.

Important notes

  • Non-streaming mirrors: Shadow copies are always sent as non-streaming requests, even if the original request was streaming
  • Billing: You are billed for shadow calls to the target provider at standard rates
  • Sample rate format: sample_rate is a float from 0.0 to 1.0 (not a percentage). Use 0.1 for 10%, 0.5 for 50%, 1.0 for 100%
  • Timeout: Shadow calls have a 30-second timeout. If the shadow model doesn’t respond within this window, the call is abandoned and an error is recorded
  • No user impact: Shadow failures never affect the primary response or user experience
