Shadow Experiments

Mirror a percentage of production LLM traffic to alternative models for zero-risk evaluation.

About

Shadow experiments let you silently copy a percentage of production LLM requests to a second model without affecting the user-facing response. Your primary model handles the request normally and returns the response to the user. Simultaneously, a background process sends a copy of the same request to a shadow model for evaluation.

This approach gives you real production data for model comparison, cost analysis, and provider migration testing, all without any impact on user experience. Results are collected and synced to the Future AGI dashboard for analysis.

When to use

  • Model evaluation: Test a new model on real production traffic before switching
  • Cost comparison: Compare pricing and token usage between models without affecting users
  • Provider migration: Validate a provider switch (e.g., OpenAI to Anthropic) on a fraction of traffic
  • Prompt validation: Test prompt changes in production before full rollout
  • Latency analysis: Compare response times between models under real load

How it works

When you enable shadow experiments:

  1. A request arrives at the gateway for your primary model
  2. The primary model processes the request and returns the response to the user immediately
  3. Simultaneously, a background goroutine sends a copy of the request to the shadow model
  4. The shadow model’s response, latency, token count, and status code are captured
  5. Results are collected and periodically synced to the Future AGI dashboard

The user never waits for the shadow model. If the shadow call fails or times out, it doesn’t affect the primary response.
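The flow above can be sketched in Python. This is an illustrative sketch only, not the gateway's actual internals (which run in Go): `call_model` and `record_shadow_result` are hypothetical placeholders, and only the sampling and fire-and-forget behavior are meant to be faithful.

```python
import random
import threading
import time

SAMPLE_RATE = 0.1  # mirror 10% of traffic

def call_model(model, request):
    # Hypothetical placeholder for a real provider call.
    return {"text": f"{model} response", "tokens": 100, "status": 200}

def record_shadow_result(result):
    # Hypothetical placeholder: results are buffered and periodically
    # synced to the dashboard.
    print("shadow result:", result)

def handle_request(request, primary_model, shadow_model):
    # Steps 1-2: the primary model handles the request and the response
    # is returned to the user immediately.
    primary = call_model(primary_model, request)

    # Step 3: a sampled copy is mirrored on a background thread;
    # the user never waits on it, and its failures stay contained.
    if random.random() < SAMPLE_RATE:
        threading.Thread(
            target=mirror, args=(request, shadow_model), daemon=True
        ).start()
    return primary

def mirror(request, shadow_model):
    # Step 4: capture the shadow response, latency, and status code.
    start = time.monotonic()
    try:
        shadow = call_model(shadow_model, request)
        record_shadow_result({
            "shadow_model": shadow_model,
            "shadow_latency_ms": int((time.monotonic() - start) * 1000),
            "shadow_status_code": shadow["status"],
        })
    except Exception as exc:
        record_shadow_result({"shadow_model": shadow_model, "shadow_error": str(exc)})
```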

Configuring via SDK (per-request)

You can enable shadow experiments on a per-request basis by passing a GatewayConfig with a TrafficMirrorConfig to the Prism client.

Python

from prism import Prism, GatewayConfig, TrafficMirrorConfig

client = Prism(
    api_key="sk-prism-...",
    base_url="https://gateway.futureagi.com",
    config=GatewayConfig(
        mirror=TrafficMirrorConfig(
            target_model="claude-sonnet-4-20250514",
            target_provider="anthropic",
            sample_rate=0.1,  # Mirror 10% of traffic
        )
    ),
)

# Normal request — 10% of traffic is silently mirrored to Claude
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize the latest AI news."}],
)
print(response.choices[0].message.content)

TypeScript

import { Prism } from "@futureagi/prism";

const client = new Prism({
  apiKey: "sk-prism-...",
  baseUrl: "https://gateway.futureagi.com",
  config: {
    mirror: {
      target_model: "claude-sonnet-4-20250514",
      target_provider: "anthropic",
      sample_rate: 0.1,  // Mirror 10% of traffic
    },
  },
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize the latest AI news." }],
});
console.log(response.choices[0].message.content);

Configuration options

  • target_model: The model to mirror traffic to (e.g., "claude-sonnet-4-20250514")
  • target_provider: The provider of the shadow model (e.g., "anthropic", "openai")
  • sample_rate: Float between 0.0 and 1.0. 0.1 mirrors 10% of traffic, 1.0 mirrors 100%
  • enabled: Set to false to disable mirroring (defaults to true)
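As a sanity check on these options, the snippet below sketches how they might be validated client-side. `MirrorOptions` is a hypothetical stand-in, not the SDK's actual TrafficMirrorConfig class; only the field names and the 0.0–1.0 range rule come from the options above.

```python
from dataclasses import dataclass

@dataclass
class MirrorOptions:
    # Hypothetical stand-in for TrafficMirrorConfig; fields mirror
    # the configuration options listed above.
    target_model: str
    target_provider: str
    sample_rate: float = 1.0
    enabled: bool = True  # set False to disable mirroring

    def __post_init__(self):
        # sample_rate is a fraction, not a percentage: 0.1 means 10%.
        if not 0.0 <= self.sample_rate <= 1.0:
            raise ValueError("sample_rate must be between 0.0 and 1.0")

opts = MirrorOptions(
    target_model="claude-sonnet-4-20250514",
    target_provider="anthropic",
    sample_rate=0.1,
)
```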

Configuring in config.yaml (gateway-level)

For persistent gateway-level configuration, add a routing.mirror section to your config.yaml:

routing:
  mirror:
    enabled: true
    rules:
      - source_model: "gpt-4o"
        target_provider: "anthropic"
        target_model: "claude-sonnet-4-20250514"
        sample_rate: 0.1    # Mirror 10% of gpt-4o traffic

      - source_model: "gpt-4-turbo"
        target_provider: "anthropic"
        target_model: "claude-opus-4-20250514"
        sample_rate: 0.05   # Mirror 5% of gpt-4-turbo traffic

      - source_model: "*"   # Wildcard: mirror ALL models
        target_provider: "staging"
        sample_rate: 0.01   # 1% of all traffic

Use "*" as the source_model to mirror all requests regardless of the primary model. Rules are evaluated in order, so place more specific rules before wildcard rules.
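The first-match-wins ordering described above can be sketched as follows (illustrative only; rule dictionaries use the field names from the YAML above):

```python
def match_mirror_rule(rules, source_model):
    # Rules are evaluated in order; "*" matches any model, so specific
    # rules must come before wildcard rules to take effect.
    for rule in rules:
        if rule["source_model"] in (source_model, "*"):
            return rule
    return None

rules = [
    {"source_model": "gpt-4o", "target_provider": "anthropic",
     "target_model": "claude-sonnet-4-20250514", "sample_rate": 0.1},
    {"source_model": "*", "target_provider": "staging", "sample_rate": 0.01},
]

assert match_mirror_rule(rules, "gpt-4o")["sample_rate"] == 0.1       # specific rule wins
assert match_mirror_rule(rules, "gpt-4-turbo")["sample_rate"] == 0.01  # wildcard fallback
```

If the wildcard rule were listed first, it would capture all traffic, including gpt-4o requests, which is why specific rules must precede it.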

Collected data

Each mirrored request produces a shadow result with the following fields:

{
  "request_id": "req_abc123",
  "experiment_id": "exp_xyz",
  "source_model": "gpt-4o",
  "shadow_model": "claude-sonnet-4-20250514",
  "source_response": "The capital of France is Paris.",
  "shadow_response": "Paris is the capital of France.",
  "source_latency_ms": 450,
  "shadow_latency_ms": 380,
  "source_tokens": 312,
  "shadow_tokens": 295,
  "source_status_code": 200,
  "shadow_status_code": 200,
  "shadow_error": "",
  "prompt_hash": "a1b2c3d4",
  "created_at": "2026-03-25T10:30:00Z"
}

Field descriptions

  • request_id: Unique identifier for the original request
  • experiment_id: Identifier for this shadow experiment run
  • source_model: The primary model that handled the user request
  • shadow_model: The shadow model that processed the copy
  • source_response: The response text from the primary model
  • shadow_response: The response text from the shadow model
  • source_latency_ms: Time in milliseconds for the primary model to respond
  • shadow_latency_ms: Time in milliseconds for the shadow model to respond
  • source_tokens: Total tokens used by the primary model
  • shadow_tokens: Total tokens used by the shadow model
  • source_status_code: HTTP status code from the primary model
  • shadow_status_code: HTTP status code from the shadow model
  • shadow_error: Error message if the shadow call failed (empty if successful)
  • prompt_hash: Hash of the prompt for deduplication and analysis
  • created_at: Timestamp when the shadow result was created
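Given records in the shape shown above (for example, exported from the dashboard; export tooling is an assumption, since results are dashboard-only today), latency and token deltas can be aggregated like this:

```python
def summarize(results):
    # Average shadow-vs-source deltas over successful shadow calls.
    # Negative deltas mean the shadow model was faster / used fewer tokens.
    ok = [r for r in results if r["shadow_status_code"] == 200]
    n = len(ok)
    return {
        "mirrored": len(results),
        "shadow_success_rate": n / len(results),
        "avg_latency_delta_ms": sum(
            r["shadow_latency_ms"] - r["source_latency_ms"] for r in ok
        ) / n,
        "avg_token_delta": sum(
            r["shadow_tokens"] - r["source_tokens"] for r in ok
        ) / n,
    }

# Using the example record above: the shadow call was 70 ms faster
# and used 17 fewer tokens.
results = [
    {"shadow_status_code": 200, "source_latency_ms": 450,
     "shadow_latency_ms": 380, "source_tokens": 312, "shadow_tokens": 295},
]
print(summarize(results))
```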

Note

Shadow results appear in the Future AGI dashboard after periodic sync. Direct API access to results is not currently available.

Important notes

  • Non-streaming mirrors: Shadow copies are always sent as non-streaming requests, even if the original request was streaming
  • Billing: You are billed for shadow calls to the target provider at standard rates
  • Sample rate format: sample_rate is a float from 0.0 to 1.0 (not a percentage). Use 0.1 for 10%, 0.5 for 50%, 1.0 for 100%
  • Timeout: Shadow calls have a 30-second timeout. If the shadow model doesn’t respond within this window, the call is abandoned and an error is recorded
  • No user impact: Shadow failures never affect the primary response or user experience
