# API Reference
Endpoints, request headers, and response headers for the Prism AI Gateway.
## About

Prism uses the OpenAI-compatible API format. All requests go to `https://gateway.futureagi.com` and follow the same structure as OpenAI's API. This page lists all supported endpoints, request headers, and response headers.
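Because the format is OpenAI-compatible, any plain HTTP client can talk to the gateway. A minimal sketch using only the Python standard library (the `sk-prism-...` key is a placeholder):

```python
import json
import urllib.request

# Build an OpenAI-format chat completion request aimed at the gateway.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
}
req = urllib.request.Request(
    "https://gateway.futureagi.com/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer sk-prism-...",
        "Content-Type": "application/json",
    },
    method="POST",
)
# req is now ready to send with urllib.request.urlopen(req)
```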
## Supported Endpoints

| Endpoint | Description |
|---|---|
| `POST /v1/chat/completions` | Chat completions (primary endpoint) |
| `POST /v1/completions` | Legacy text completions |
| `POST /v1/embeddings` | Text embeddings |
| `POST /v1/audio/transcriptions` | Whisper speech-to-text |
| `POST /v1/audio/translations` | Audio translation |
| `POST /v1/audio/speech` | Text-to-speech |
| `POST /v1/audio/speech/stream` | Streaming text-to-speech |
| `POST /v1/images/generations` | Image generation |
| `POST /v1/rerank` | Reranking |
| `GET /v1/models` | List available models |
| `POST /v1/responses` | OpenAI Responses API |
| `POST /v1/messages` | Anthropic Messages API (native pass-through) |
| `POST /v1/count_tokens` | Token counting |
| `/v1/files/*` | File upload, list, retrieve, delete |
| `/v1/assistants/*` | OpenAI Assistants API |
| `/v1/threads/*` | Threads, Runs, and Steps API |
## Request Headers

| Header | Description |
|---|---|
| `x-prism-session-id` | Group requests into a logical session |
| `x-prism-metadata` | Attach custom metadata as key=value pairs |
| `x-prism-trace-id` | Set a custom trace ID for distributed tracing |
| `x-prism-cache-ttl` | Override the cache TTL for this request (e.g. `5m`, `1h`) |
| `x-prism-cache-force-refresh` | Bypass the cache and fetch a fresh response (`true`/`false`) |
| `Cache-Control: no-store` | Disable caching for this request entirely |
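These headers are plain strings attached to each request. As a sketch, a small helper can assemble them; note that joining metadata pairs with commas is an assumption about the wire format, and the helper name is illustrative:

```python
def prism_headers(session_id=None, metadata=None, cache_ttl=None, force_refresh=False):
    """Assemble optional Prism request headers into a dict.

    metadata is a dict serialized as key=value pairs (comma-separated here,
    which is an assumption about the exact delimiter).
    """
    headers = {}
    if session_id:
        headers["x-prism-session-id"] = session_id
    if metadata:
        headers["x-prism-metadata"] = ",".join(f"{k}={v}" for k, v in metadata.items())
    if cache_ttl:
        headers["x-prism-cache-ttl"] = cache_ttl
    if force_refresh:
        headers["x-prism-cache-force-refresh"] = "true"
    return headers

# Only the headers you set are sent; the rest are omitted entirely.
headers = prism_headers(session_id="checkout-42", metadata={"env": "prod"}, cache_ttl="5m")
```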
## Response Headers

### Always present

| Header | Description |
|---|---|
| `X-Prism-Request-Id` | Unique request identifier for log correlation |
| `X-Prism-Trace-Id` | Trace ID for distributed tracing |
| `X-Prism-Latency-Ms` | Total latency including the provider call |
| `X-Prism-Model-Used` | Actual model used (may differ from the requested model if routing redirected) |
| `X-Prism-Provider` | Provider that served the request |
| `X-Prism-Timeout-Ms` | Timeout applied to this request |
### Conditional

| Header | Present when |
|---|---|
| `X-Prism-Cost` | Model has pricing data (absent on cache hits) |
| `X-Prism-Cache` | Caching is enabled; value is `miss`, `hit`, or `skip` |
| `X-Prism-Guardrail-Triggered` | A guardrail policy triggered; value is `true` |
| `X-Prism-Fallback-Used` | A provider fallback occurred; value is `true` |
| `X-Prism-Routing-Strategy` | A routing policy is active, e.g. `round-robin`, `weighted` |
| `X-Ratelimit-Limit-Requests` | Rate limiting is enabled; request ceiling per minute |
| `X-Ratelimit-Remaining-Requests` | Requests remaining in the current window |
| `X-Ratelimit-Reset-Requests` | Unix timestamp when the rate limit resets |
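Since `X-Ratelimit-Reset-Requests` is a Unix timestamp, a client can compute how long to back off once the window is exhausted. A minimal sketch (the function name is illustrative, not part of any SDK):

```python
import time

def backoff_seconds(headers):
    """Given response headers, return how long to wait before retrying:
    0 when requests remain in the window, otherwise the seconds until
    X-Ratelimit-Reset-Requests (a Unix timestamp) passes."""
    remaining = int(headers.get("X-Ratelimit-Remaining-Requests", "1"))
    if remaining > 0:
        return 0.0
    reset_at = int(headers.get("X-Ratelimit-Reset-Requests", "0"))
    return max(0.0, reset_at - time.time())
```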
## Error Responses

Prism returns standard HTTP error codes with structured JSON error bodies.
### Guardrail blocked (403)

When a guardrail is set to Enforce mode and triggers on a request, Prism returns 403 before the LLM is ever called:

```json
{
  "error": {
    "type": "guardrail_triggered",
    "code": "forbidden",
    "message": "Request blocked by guardrail: pii-detector",
    "guardrail": "pii-detector"
  }
}
```
### Budget exceeded (429)

When your organization's spending limit is reached, new requests are blocked until the next billing period:

```json
{
  "error": {
    "type": "budget_exceeded",
    "code": "rate_limit_exceeded",
    "message": "Organization monthly budget of $100.00 exceeded"
  }
}
```
### Provider unavailable (502)

When the selected provider is down or unreachable and no failover is configured:

```json
{
  "error": {
    "type": "provider_error",
    "code": "bad_gateway",
    "message": "Provider openai returned 503: Service Unavailable"
  }
}
```
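Because every error body carries a stable `type` field, clients can branch on it rather than parsing messages. A minimal sketch of that dispatch, using the bodies above (the action names are illustrative, not part of the API):

```python
import json

def classify_error(body: str) -> str:
    """Map a Prism error body to a coarse client action based on its type."""
    err = json.loads(body)["error"]
    if err["type"] == "guardrail_triggered":
        return "reject"   # do not retry; the request itself was blocked
    if err["type"] == "budget_exceeded":
        return "stop"     # retrying will not help until the budget resets
    if err["type"] == "provider_error":
        return "retry"    # transient; retry or fail over to another provider
    return "raise"        # unknown type: surface it to the caller
```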
> **Tip:** To avoid provider failures affecting your users, configure routing with failover so Prism automatically retries with a backup provider.
## Code examples

### Vision (multimodal)
Send images alongside text using the `image_url` content type:
```python
from prism import Prism

client = Prism(api_key="sk-prism-...", base_url="https://gateway.futureagi.com")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

```typescript
import { Prism } from "@futureagi/prism";

const client = new Prism({ apiKey: "sk-prism-...", baseUrl: "https://gateway.futureagi.com" });

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        { type: "image_url", image_url: { url: "https://example.com/photo.jpg" } },
      ],
    },
  ],
});
console.log(response.choices[0].message.content);
```

```bash
curl -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }]
  }'
```

### Function calling (tools)
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City name"},
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ],
    tool_choice="auto",
)

# Check if the model called a tool
if response.choices[0].finish_reason == "tool_calls":
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Tool: {tool_call.function.name}")
    print(f"Args: {tool_call.function.arguments}")
```

```typescript
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get the current weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string", description: "City name" },
            unit: { type: "string", enum: ["celsius", "fahrenheit"] },
          },
          required: ["location"],
        },
      },
    },
  ],
  tool_choice: "auto",
});

if (response.choices[0].finish_reason === "tool_calls") {
  const toolCall = response.choices[0].message.tool_calls![0];
  console.log(`Tool: ${toolCall.function.name}`);
  console.log(`Args: ${toolCall.function.arguments}`);
}
```

### Embeddings
```python
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog",
)
vector = response.data[0].embedding
print(f"Embedding dimensions: {len(vector)}")
```

```typescript
const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: "The quick brown fox jumps over the lazy dog",
});
const vector = response.data[0].embedding;
console.log(`Embedding dimensions: ${vector.length}`);
```

```bash
curl -X POST https://gateway.futureagi.com/v1/embeddings \
  -H "Authorization: Bearer sk-prism-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
```

### Image generation
```python
response = client.images.generate(
    model="dall-e-3",
    prompt="A futuristic city skyline at sunset, digital art",
    n=1,
    size="1024x1024",
)
print(response.data[0].url)
```

```typescript
const response = await client.images.generate({
  model: "dall-e-3",
  prompt: "A futuristic city skyline at sunset, digital art",
  n: 1,
  size: "1024x1024",
});
console.log(response.data[0].url);
```

```bash
curl -X POST https://gateway.futureagi.com/v1/images/generations \
  -H "Authorization: Bearer sk-prism-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dall-e-3",
    "prompt": "A futuristic city skyline at sunset, digital art",
    "n": 1,
    "size": "1024x1024"
  }'
```

### Audio transcription
```python
with open("audio.mp3", "rb") as f:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
    )
print(transcription.text)
```

```bash
curl -X POST https://gateway.futureagi.com/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-prism-..." \
  -F file=@audio.mp3 \
  -F model=whisper-1
```