Shadow Experiments
Mirror a percentage of production LLM traffic to alternative models for zero-risk evaluation.
About
Shadow experiments let you silently copy a percentage of production LLM requests to a second model without affecting the user-facing response. Your primary model handles the request normally and returns the response to the user. Simultaneously, a background process sends a copy of the same request to a shadow model for evaluation.
This approach gives you real production data for model comparison, cost analysis, and provider migration testing, all without any impact on user experience. Results are collected and synced to the Future AGI dashboard for analysis.
When to use
- Model evaluation: Test a new model on real production traffic before switching
- Cost comparison: Compare pricing and token usage between models without affecting users
- Provider migration: Validate a provider switch (e.g., OpenAI to Anthropic) on a fraction of traffic
- Prompt validation: Test prompt changes in production before full rollout
- Latency analysis: Compare response times between models under real load
How it works
When you enable shadow experiments:
- A request arrives at the gateway for your primary model
- The primary model processes the request and returns the response to the user immediately
- Simultaneously, a background goroutine sends a copy of the request to the shadow model
- The shadow model’s response, latency, token count, and status code are captured
- Results are collected and periodically synced to the Future AGI dashboard
The user never waits for the shadow model. If the shadow call fails or times out, it doesn’t affect the primary response.
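The flow above can be sketched in Python (illustrative only; the actual gateway implements this in Go with goroutines, and `call_primary`/`call_shadow` stand in for real model calls):

```python
import random
import threading

def handle_request(prompt, call_primary, call_shadow, sample_rate=0.1):
    """Serve the user from the primary model; maybe mirror a copy in the background."""
    response = call_primary(prompt)  # user-facing path, returned immediately
    if random.random() < sample_rate:  # e.g. 0.1 mirrors ~10% of requests
        def mirror():
            try:
                call_shadow(prompt)  # response/latency/tokens captured for the dashboard
            except Exception:
                pass  # recorded as shadow_error; never surfaced to the user
        # Fire-and-forget: the user never waits on this thread
        threading.Thread(target=mirror, daemon=True).start()
    return response
```

Because the mirror runs on its own thread and swallows exceptions, a slow or failing shadow model cannot delay or break the primary response.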
Configuring via SDK (per-request)
You can enable shadow experiments on a per-request basis by passing a GatewayConfig with TrafficMirrorConfig to the Prism client.
```python
from prism import Prism, GatewayConfig, TrafficMirrorConfig

client = Prism(
    api_key="sk-prism-...",
    base_url="https://gateway.futureagi.com",
    config=GatewayConfig(
        mirror=TrafficMirrorConfig(
            target_model="claude-sonnet-4-20250514",
            target_provider="anthropic",
            sample_rate=0.1,  # Mirror 10% of traffic
        )
    ),
)

# Normal request — 10% of traffic is silently mirrored to Claude
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize the latest AI news."}],
)
print(response.choices[0].message.content)
```

```typescript
import { Prism } from "@futureagi/prism";

const client = new Prism({
  apiKey: "sk-prism-...",
  baseUrl: "https://gateway.futureagi.com",
  config: {
    mirror: {
      target_model: "claude-sonnet-4-20250514",
      target_provider: "anthropic",
      sample_rate: 0.1, // Mirror 10% of traffic
    },
  },
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize the latest AI news." }],
});
console.log(response.choices[0].message.content);
```

Configuration options
- `target_model`: The model to mirror traffic to (e.g., `"claude-sonnet-4-20250514"`)
- `target_provider`: The provider of the shadow model (e.g., `"anthropic"`, `"openai"`)
- `sample_rate`: Float between 0.0 and 1.0. `0.1` mirrors 10% of traffic, `1.0` mirrors 100%
- `enabled`: Set to `false` to disable mirroring (defaults to `true`)
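As a rough sketch of how these options fit together, a hypothetical validation helper (not part of the SDK) might enforce the documented constraints:

```python
def validate_mirror_config(cfg: dict) -> dict:
    """Sanity-check a mirror config dict (hypothetical helper, not SDK code)."""
    rate = cfg.get("sample_rate")
    if not isinstance(rate, (int, float)) or not 0.0 <= rate <= 1.0:
        raise ValueError("sample_rate must be a float between 0.0 and 1.0")
    for key in ("target_model", "target_provider"):
        if not cfg.get(key):
            raise ValueError(f"{key} is required")
    # enabled defaults to true when omitted
    return {"enabled": True, **cfg}
```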
Configuring in config.yaml (gateway-level)
For persistent gateway-level configuration, add a routing.mirror section to your config.yaml:
```yaml
routing:
  mirror:
    enabled: true
    rules:
      - source_model: "gpt-4o"
        target_provider: "anthropic"
        target_model: "claude-sonnet-4-20250514"
        sample_rate: 0.1   # Mirror 10% of gpt-4o traffic
      - source_model: "gpt-4-turbo"
        target_provider: "anthropic"
        target_model: "claude-opus-4-20250514"
        sample_rate: 0.05  # Mirror 5% of gpt-4-turbo traffic
      - source_model: "*"  # Wildcard: mirror ALL models
        target_provider: "staging"
        sample_rate: 0.01  # 1% of all traffic
```
Use "*" as the source_model to mirror all requests regardless of the primary model. Rules are evaluated in order, so place more specific rules before wildcard rules.
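The first-match-wins ordering can be illustrated with a small Python sketch (the gateway's actual matching logic is internal; this just mirrors the documented behavior):

```python
def match_mirror_rule(source_model, rules):
    """Return the first rule whose source_model matches; '*' matches any model."""
    for rule in rules:  # rules are evaluated in order, so wildcards go last
        if rule["source_model"] in (source_model, "*"):
            return rule
    return None

rules = [
    {"source_model": "gpt-4o", "target_provider": "anthropic",
     "target_model": "claude-sonnet-4-20250514", "sample_rate": 0.1},
    {"source_model": "*", "target_provider": "staging", "sample_rate": 0.01},
]
```

With this ordering, `gpt-4o` requests hit the specific 10% rule, while every other model falls through to the 1% wildcard rule.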
Collected data
Each mirrored request produces a shadow result with the following fields:
```json
{
  "request_id": "req_abc123",
  "experiment_id": "exp_xyz",
  "source_model": "gpt-4o",
  "shadow_model": "claude-sonnet-4-20250514",
  "source_response": "The capital of France is Paris.",
  "shadow_response": "Paris is the capital of France.",
  "source_latency_ms": 450,
  "shadow_latency_ms": 380,
  "source_tokens": 312,
  "shadow_tokens": 295,
  "source_status_code": 200,
  "shadow_status_code": 200,
  "shadow_error": "",
  "prompt_hash": "a1b2c3d4",
  "created_at": "2026-03-25T10:30:00Z"
}
```
Field descriptions
| Field | Description |
|---|---|
| `request_id` | Unique identifier for the original request |
| `experiment_id` | Identifier for this shadow experiment run |
| `source_model` | The primary model that handled the user request |
| `shadow_model` | The shadow model that processed the copy |
| `source_response` | The response text from the primary model |
| `shadow_response` | The response text from the shadow model |
| `source_latency_ms` | Time in milliseconds for the primary model to respond |
| `shadow_latency_ms` | Time in milliseconds for the shadow model to respond |
| `source_tokens` | Total tokens used by the primary model |
| `shadow_tokens` | Total tokens used by the shadow model |
| `source_status_code` | HTTP status code from the primary model |
| `shadow_status_code` | HTTP status code from the shadow model |
| `shadow_error` | Error message if the shadow call failed (empty if successful) |
| `prompt_hash` | Hash of the prompt for deduplication and analysis |
| `created_at` | Timestamp when the shadow result was created |
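Although results are dashboard-only today, the fields above lend themselves to simple aggregate comparisons. A sketch of the kind of analysis you might run on exported shadow results (field names taken from the schema above):

```python
def summarize_shadow_results(results):
    """Aggregate deltas (shadow minus source) over successful shadow calls."""
    ok = [r for r in results
          if r["shadow_status_code"] == 200 and not r["shadow_error"]]
    if not ok:
        return None
    n = len(ok)
    return {
        "shadow_success_rate": n / len(results),
        "avg_latency_delta_ms": sum(r["shadow_latency_ms"] - r["source_latency_ms"] for r in ok) / n,
        "avg_token_delta": sum(r["shadow_tokens"] - r["source_tokens"] for r in ok) / n,
    }

# Example record with the numbers from the sample above
record = {
    "source_latency_ms": 450, "shadow_latency_ms": 380,
    "source_tokens": 312, "shadow_tokens": 295,
    "shadow_status_code": 200, "shadow_error": "",
}
```

For the sample record, the shadow model was 70 ms faster (380 − 450) and used 17 fewer tokens (295 − 312), the kind of signal that feeds cost and latency comparisons.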
Note
Shadow results appear in the Future AGI dashboard after periodic sync. Direct API access to results is not currently available.
Important notes
- Non-streaming mirrors: Shadow copies are always sent as non-streaming requests, even if the original request was streaming
- Billing: You are billed for shadow calls to the target provider at standard rates
- Sample rate format: `sample_rate` is a float from 0.0 to 1.0 (not a percentage). Use `0.1` for 10%, `0.5` for 50%, `1.0` for 100%
- Timeout: Shadow calls have a 30-second timeout. If the shadow model doesn’t respond within this window, the call is abandoned and an error is recorded
- No user impact: Shadow failures never affect the primary response or user experience