Routing & Reliability

Configure load balancing, failover, retries, and circuit breaking across LLM providers.

What it is

Prism’s routing layer distributes requests across multiple providers and models to maximize reliability and performance. If one provider is down or slow, traffic automatically shifts to healthy alternatives. This ensures your application stays responsive even when individual providers experience outages or rate limiting.


Use cases

  • High availability — Automatic failover to backup providers when primary is down or rate-limited
  • Cost optimization — Route to the cheapest provider that supports the requested model
  • Latency reduction — Route to the fastest provider based on recent response times
  • Traffic distribution — Split traffic across providers by weight for capacity management

Key concepts

| Term | Definition |
| --- | --- |
| Failover | Automatic rerouting of requests to a backup provider when the primary provider fails or returns errors (429, 5xx) |
| Retries | Repeated attempts to send a request after a failure, using exponential backoff to avoid overwhelming the provider |
| Circuit breaking | A protection mechanism that stops sending requests to a failing provider entirely, then gradually tests recovery before resuming full traffic |
| Timeouts | Maximum duration Prism waits for a provider response before treating the request as failed |
| Routing strategy | The algorithm Prism uses to select which provider handles each request (e.g., round robin, weighted, latency-based) |

Configuration parameters

These parameters appear in the JSON configuration blocks throughout this page.

Failover:

| Parameter | Type | Description |
| --- | --- | --- |
| enabled | boolean | Turn failover on or off |
| providers | string[] | Ordered list of providers to try when one fails |
| failover_on | number[] | HTTP status codes that trigger failover (e.g., 429, 500, 502, 503, 504) |

Retries:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_retries | number | 3 | Maximum number of retry attempts before giving up |
| initial_backoff_ms | number | 100 | Wait time (ms) before the first retry |
| max_backoff_ms | number | 10000 | Upper limit on wait time between retries |
| backoff_multiplier | number | 2 | Multiplier applied to the backoff after each retry (e.g., 100ms → 200ms → 400ms) |

Circuit breaker:

| Parameter | Type | Description |
| --- | --- | --- |
| enabled | boolean | Turn circuit breaking on or off |
| error_threshold_percent | number | Error rate (%) that trips the circuit open |
| min_requests | number | Minimum request count before the error threshold is evaluated |
| open_duration_seconds | number | How long (seconds) the circuit stays open before testing recovery |
| half_open_max_requests | number | Number of trial requests allowed during the half-open recovery test |

Timeouts:

| Parameter | Type | Description |
| --- | --- | --- |
| request_timeout_seconds | number | Maximum total time for the entire request (including retries and failovers) |
| provider_timeout_seconds | number | Maximum time to wait for a single provider response |

Routing strategies

| Strategy | How it works |
| --- | --- |
| Round Robin | Rotates evenly across providers |
| Weighted | Splits traffic based on assigned weights (e.g., 70% OpenAI, 30% Anthropic) |
| Latency-based | Routes to the fastest provider based on recent response times |
| Cost-optimized | Routes to the cheapest provider that supports the requested model |
| Adaptive | Dynamically adjusts weights based on real-time performance |
| Fastest | Sends the request to all providers simultaneously and returns the first response; note that you are billed for every call made, including those whose responses are discarded |
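
To make the weighted strategy concrete, here is a minimal sketch of weighted selection in Python. This is illustrative only, not the Prism SDK; `pick_provider` is a hypothetical helper built on the standard library.

```python
import random

def pick_provider(weights, rng=random):
    """Weighted random selection: weights of 70 and 30 yield a ~70/30 traffic split."""
    providers = list(weights)
    return rng.choices(providers, weights=[weights[p] for p in providers], k=1)[0]

# Simulate 10,000 requests against a 70/30 policy.
random.seed(0)
counts = {"openai": 0, "anthropic": 0}
for _ in range(10_000):
    counts[pick_provider({"openai": 70, "anthropic": 30})] += 1
# counts["openai"] lands close to 7,000
```

Per-request random selection is one reasonable reading of "weighted"; a production gateway may instead use deterministic schemes (e.g., smooth weighted round robin) to reduce variance.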

Configuring a routing policy

Routing dashboard

  1. Open the Prism dashboard at https://app.futureagi.com/dashboard/gateway/routing
  2. Navigate to Routing
  3. Click Create Policy
  4. Enter name and optional description
  5. Select strategy
  6. Configure strategy-specific settings
  7. Click Save

Python SDK:

from prism import Prism

client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
    control_plane_url="https://api.futureagi.com",
)

# Create a weighted routing policy
policy = client.routing.create(
    name="Production routing",
    strategy="weighted",
    config={"weights": {"openai": 70, "anthropic": 30}},
    description="70/30 split between OpenAI and Anthropic",
)

# List all routing policies
policies = client.routing.list()

# Update an existing policy
client.routing.update(
    policy["id"],
    strategy="least-latency",
    config={"providers": ["openai", "anthropic", "gemini"], "failover_on": [429, 500, 502, 503, 504]},
)

TypeScript SDK:

import { Prism } from "@futureagi/prism";

const client = new Prism({
  apiKey: "sk-prism-your-key",
  baseUrl: "https://gateway.futureagi.com",
  controlPlaneUrl: "https://api.futureagi.com",
});

const policy = await client.routing.create({
  name: "Production routing",
  strategy: "weighted",
  config: { weights: { openai: 70, anthropic: 30 } },
  description: "70/30 split between OpenAI and Anthropic",
});

const policies = await client.routing.list();

await client.routing.update(policy.id, {
  strategy: "least-latency",
  config: { providers: ["openai", "anthropic", "gemini"], failoverOn: [429, 500, 502, 503, 504] },
});

Failover

Failover triggers on specific HTTP status codes and error conditions: 429 (rate limiting), 5xx (server errors), timeouts, and connection errors. When a provider fails with one of these, Prism automatically routes the request to the next provider in the list.

{
  "failover": {
    "enabled": true,
    "providers": ["openai", "anthropic", "gemini"],
    "failover_on": [429, 500, 502, 503, 504]
  }
}

Note

The providers array defines the failover order. Prism will attempt each provider in sequence until one succeeds.
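
The try-in-order behavior can be sketched in a few lines of Python. This is an illustrative model, not Prism's implementation; `call_with_failover` and the simulated transport are hypothetical.

```python
FAILOVER_ON = {429, 500, 502, 503, 504}

def call_with_failover(providers, send, failover_on=FAILOVER_ON):
    """Try each provider in order; move to the next on a failover status code."""
    last_status = None
    for provider in providers:
        status, body = send(provider)  # stand-in for an HTTP call: returns (status, body)
        if status not in failover_on:
            return provider, status, body
        last_status = status
    raise RuntimeError(f"all providers failed (last status: {last_status})")

# Simulated transport: the primary is rate-limited, the backup succeeds.
responses = {"openai": (429, None), "anthropic": (200, "ok")}
result = call_with_failover(
    ["openai", "anthropic", "gemini"],
    lambda p: responses.get(p, (503, None)),
)
# result == ("anthropic", 200, "ok")
```

In practice this loop runs under the retry and timeout settings described below, so a single slow provider cannot consume the whole request budget.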


Retries

Prism uses exponential backoff for retries. This means it waits progressively longer between each retry attempt — for example, 100ms, then 200ms, then 400ms. This gives struggling providers time to recover instead of flooding them with rapid retry requests.

| Setting | Description | Default |
| --- | --- | --- |
| max_retries | Maximum number of retry attempts | 3 |
| initial_backoff_ms | Initial backoff duration in milliseconds | 100 |
| max_backoff_ms | Maximum backoff duration in milliseconds | 10000 |
| backoff_multiplier | Multiplier for exponential backoff | 2 |

{
  "retries": {
    "max_retries": 3,
    "initial_backoff_ms": 100,
    "max_backoff_ms": 10000,
    "backoff_multiplier": 2
  }
}
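
The delay schedule implied by this configuration can be computed directly. A minimal sketch, assuming the cap is applied after each multiplication; `backoff_delays` is an illustrative helper, not a Prism API:

```python
def backoff_delays(max_retries=3, initial_backoff_ms=100,
                   max_backoff_ms=10_000, backoff_multiplier=2):
    """Yield the wait (in ms) before each retry, growing exponentially up to the cap."""
    delay = initial_backoff_ms
    for _ in range(max_retries):
        yield delay
        delay = min(delay * backoff_multiplier, max_backoff_ms)

# With the defaults: 100 ms, then 200 ms, then 400 ms.
delays = list(backoff_delays())
```

Real clients typically also add random jitter to these delays so that many callers retrying at once do not synchronize into bursts.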

Circuit breaking

Think of a circuit breaker like a fuse box. When a provider starts failing repeatedly, the circuit “trips” — Prism stops sending requests to that provider entirely and routes to healthy alternatives instead. After a recovery window, Prism tests the provider with a few trial requests. If those succeed, the circuit “closes” and normal routing resumes. This prevents a single failing provider from degrading your entire application.

By stopping requests to a provider that is experiencing issues, circuit breaking prevents cascading failures. The circuit breaker moves through three states: Closed (normal operation), Open (rejecting requests), and Half-Open (testing recovery).

| State | Behavior |
| --- | --- |
| Closed | Normal operation, requests pass through |
| Open | Requests rejected immediately, no calls to provider |
| Half-Open | Limited requests allowed to test if provider recovered |

{
  "circuit_breaker": {
    "enabled": true,
    "error_threshold_percent": 50,
    "min_requests": 10,
    "open_duration_seconds": 60,
    "half_open_max_requests": 3
  }
}
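
The lifecycle above can be modeled as a small state machine. This is a sketch of the general technique under the parameter names from the JSON, not Prism's implementation; the `CircuitBreaker` class and its methods are hypothetical.

```python
import time

class CircuitBreaker:
    """Minimal closed → open → half-open model of the lifecycle described above."""

    def __init__(self, error_threshold_percent=50, min_requests=10,
                 open_duration_seconds=60, half_open_max_requests=3,
                 clock=time.monotonic):
        self.error_threshold_percent = error_threshold_percent
        self.min_requests = min_requests
        self.open_duration_seconds = open_duration_seconds
        self.half_open_max_requests = half_open_max_requests
        self.clock = clock
        self.state = "closed"
        self.successes = self.failures = 0
        self.opened_at = 0.0
        self.half_open_sent = self.half_open_ok = 0

    def allow_request(self):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.open_duration_seconds:
                self.state = "half-open"  # recovery window elapsed: allow trial requests
                self.half_open_sent = self.half_open_ok = 0
            else:
                return False              # open: reject immediately, no call to provider
        if self.state == "half-open":
            if self.half_open_sent >= self.half_open_max_requests:
                return False
            self.half_open_sent += 1
        return True

    def record(self, success):
        if self.state == "half-open":
            if not success:
                self._trip()              # a failed trial reopens the circuit
            else:
                self.half_open_ok += 1
                if self.half_open_ok >= self.half_open_max_requests:
                    self.state = "closed" # trials passed: resume normal routing
                    self.successes = self.failures = 0
            return
        if success:
            self.successes += 1
        else:
            self.failures += 1
        total = self.successes + self.failures
        if (total >= self.min_requests
                and 100 * self.failures / total >= self.error_threshold_percent):
            self._trip()

    def _trip(self):
        self.state = "open"
        self.opened_at = self.clock()

# Example: ten consecutive failures trip the circuit open.
cb = CircuitBreaker(clock=lambda: 0.0)
for _ in range(10):
    cb.allow_request()
    cb.record(False)
# cb.state is now "open"
```

One deliberate simplification: this version counts errors over all requests since the last reset, whereas real breakers usually evaluate the error rate over a sliding window.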

Tip

Circuit breaking works seamlessly with failover. When a circuit opens, Prism automatically routes to the next available provider.


Timeouts

Configure per-request and per-provider timeouts to prevent hanging requests.

{
  "timeouts": {
    "request_timeout_seconds": 30,
    "provider_timeout_seconds": 25
  }
}
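
One common way the two timeouts interact, sketched here as an assumption about the design rather than Prism's documented behavior: each provider attempt gets at most the per-provider cap, clipped to whatever remains of the overall request budget. `provider_budget` is a hypothetical helper.

```python
import time

def provider_budget(request_deadline, provider_timeout_seconds=25, clock=time.monotonic):
    """Time allowed for the next provider attempt: the per-provider cap,
    shrunk to whatever remains of the overall request budget."""
    remaining = request_deadline - clock()
    if remaining <= 0:
        raise TimeoutError("request_timeout_seconds exhausted")
    return min(provider_timeout_seconds, remaining)

# At t=0 the full 25s per-provider timeout applies; at t=28 only 2s
# remain of a 30s request budget.
assert provider_budget(30.0, clock=lambda: 0.0) == 25
assert provider_budget(30.0, clock=lambda: 28.0) == 2.0
```

This is why provider_timeout_seconds is normally set below request_timeout_seconds: it leaves room for at least one failover attempt inside the overall budget.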

Example: High-availability setup

This configuration combines weighted routing, failover, retries, and circuit breaking for a production-grade setup:

{
  "name": "Production HA",
  "strategy": "weighted",
  "config": {
    "weights": {
      "openai": 60,
      "anthropic": 30,
      "gemini": 10
    },
    "failover": {
      "enabled": true,
      "providers": ["openai", "anthropic", "gemini"],
      "failover_on": [429, 500, 502, 503, 504]
    },
    "retries": {
      "max_retries": 3,
      "initial_backoff_ms": 100,
      "max_backoff_ms": 10000,
      "backoff_multiplier": 2
    },
    "circuit_breaker": {
      "enabled": true,
      "error_threshold_percent": 50,
      "min_requests": 10,
      "open_duration_seconds": 60,
      "half_open_max_requests": 3
    },
    "timeouts": {
      "request_timeout_seconds": 30,
      "provider_timeout_seconds": 25
    }
  }
}
