API Reference

Endpoints, request headers, and response headers for the Prism AI Gateway.

About

Prism uses the OpenAI-compatible API format. All requests go to https://gateway.futureagi.com and follow the same structure as OpenAI’s API. This page lists all supported endpoints, request headers, and response headers.


Supported Endpoints

| Endpoint | Description |
| --- | --- |
| `POST /v1/chat/completions` | Chat completions (primary endpoint) |
| `POST /v1/completions` | Legacy text completions |
| `POST /v1/embeddings` | Text embeddings |
| `POST /v1/audio/transcriptions` | Whisper speech-to-text |
| `POST /v1/audio/translations` | Audio translation |
| `POST /v1/audio/speech` | Text-to-speech |
| `POST /v1/audio/speech/stream` | Streaming text-to-speech |
| `POST /v1/images/generations` | Image generation |
| `POST /v1/rerank` | Reranking |
| `GET /v1/models` | List available models |
| `POST /v1/responses` | OpenAI Responses API |
| `POST /v1/messages` | Anthropic Messages API (native pass-through) |
| `POST /v1/count_tokens` | Token counting |
| `/v1/files/*` | File upload, list, retrieve, delete |
| `/v1/assistants/*` | OpenAI Assistants API |
| `/v1/threads/*` | Threads, Runs, and Steps API |

Request Headers

| Header | Description |
| --- | --- |
| `x-prism-session-id` | Group requests into a logical session |
| `x-prism-metadata` | Attach custom metadata as key=value pairs |
| `x-prism-trace-id` | Set a custom trace ID for distributed tracing |
| `x-prism-cache-ttl` | Override the cache TTL for this request (e.g. `5m`, `1h`) |
| `x-prism-cache-force-refresh` | Bypass the cache and fetch a fresh response (`true`/`false`) |
| `Cache-Control: no-store` | Disable caching for this request entirely |
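
Because Prism is OpenAI-compatible, these headers can be attached to any request you already send to the gateway. A minimal sketch using only the Python standard library; the API key, session ID, and metadata values are placeholders, and the helper name is ours, not part of any SDK:

```python
import json
import urllib.request


def build_chat_request(api_key: str, session_id: str, metadata: dict) -> urllib.request.Request:
    """Build a chat completion request carrying Prism's per-request headers."""
    body = json.dumps({
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode()
    return urllib.request.Request(
        "https://gateway.futureagi.com/v1/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "x-prism-session-id": session_id,
            # Metadata travels as comma-separated key=value pairs.
            "x-prism-metadata": ",".join(f"{k}={v}" for k, v in metadata.items()),
            "x-prism-cache-ttl": "5m",
        },
    )


req = build_chat_request("sk-prism-...", "checkout-flow-42", {"env": "prod", "team": "payments"})
# Sending it is one call: urllib.request.urlopen(req)
# Note: urllib stores header names with only the first letter capitalized.
print(req.get_header("X-prism-session-id"))  # checkout-flow-42
```

If you use an OpenAI-compatible SDK instead, the same headers can usually be passed through its per-request extra-headers option.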

Response Headers

Always present

| Header | Description |
| --- | --- |
| `X-Prism-Request-Id` | Unique request identifier for log correlation |
| `X-Prism-Trace-Id` | Trace ID for distributed tracing |
| `X-Prism-Latency-Ms` | Total latency including the provider call |
| `X-Prism-Model-Used` | Actual model used (may differ from requested if routing redirected) |
| `X-Prism-Provider` | Provider that served the request |
| `X-Prism-Timeout-Ms` | Timeout applied to this request |

Conditional

| Header | Present when |
| --- | --- |
| `X-Prism-Cost` | Model has pricing data (absent on cache hits) |
| `X-Prism-Cache` | Caching is enabled. Value is `miss`, `hit`, or `skip` |
| `X-Prism-Guardrail-Triggered` | A guardrail policy triggered. Value is `true` |
| `X-Prism-Fallback-Used` | A provider fallback occurred. Value is `true` |
| `X-Prism-Routing-Strategy` | A routing policy is active, e.g. `round-robin`, `weighted` |
| `X-Ratelimit-Limit-Requests` | Rate limiting is enabled. The request ceiling per minute |
| `X-Ratelimit-Remaining-Requests` | Requests remaining in the current window |
| `X-Ratelimit-Reset-Requests` | Unix timestamp when the rate limit resets |
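
These headers can drive client-side observability, for example logging cache behavior and tracking rate-limit budget. A small sketch; the helper name and sample values are ours, while the header names are as documented above:

```python
def summarize_prism_headers(headers: dict) -> dict:
    """Extract the observability fields Prism may attach to a response."""
    return {
        "request_id": headers.get("X-Prism-Request-Id"),
        "cache": headers.get("X-Prism-Cache"),  # miss / hit / skip, if caching is on
        "fallback_used": headers.get("X-Prism-Fallback-Used") == "true",
        "requests_remaining": int(headers.get("X-Ratelimit-Remaining-Requests", "0")),
    }


# Illustrative values only:
sample = {
    "X-Prism-Request-Id": "req_abc123",
    "X-Prism-Cache": "hit",
    "X-Ratelimit-Remaining-Requests": "58",
}
print(summarize_prism_headers(sample))
# {'request_id': 'req_abc123', 'cache': 'hit', 'fallback_used': False, 'requests_remaining': 58}
```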

Error Responses

Prism returns standard HTTP error codes with structured JSON error bodies.

Guardrail blocked (403)

When a guardrail is set to Enforce mode and triggers on a request, Prism returns 403 before the LLM is ever called:

```json
{
  "error": {
    "type": "guardrail_triggered",
    "code": "forbidden",
    "message": "Request blocked by guardrail: pii-detector",
    "guardrail": "pii-detector"
  }
}
```

Budget exceeded (429)

When your organization’s spending limit is reached, new requests are blocked until the next billing period:

```json
{
  "error": {
    "type": "budget_exceeded",
    "code": "rate_limit_exceeded",
    "message": "Organization monthly budget of $100.00 exceeded"
  }
}
```

Provider unavailable (502)

When the selected provider is down or unreachable and no failover is configured:

```json
{
  "error": {
    "type": "provider_error",
    "code": "bad_gateway",
    "message": "Provider openai returned 503: Service Unavailable"
  }
}
```
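
All three bodies share the same structured shape, so clients can branch on `error.type` rather than on the message text. A sketch of that dispatch; the strategy names are illustrative, not part of the API:

```python
import json


def classify_error(body: str) -> str:
    """Map a Prism structured error body to a handling strategy."""
    err = json.loads(body).get("error", {})
    return {
        "guardrail_triggered": "do-not-retry",      # 403: request content was blocked
        "budget_exceeded": "wait-for-next-period",  # 429: spending cap reached
        "provider_error": "retry-with-fallback",    # 502: upstream provider failed
    }.get(err.get("type"), "surface-to-caller")


body = '{"error": {"type": "provider_error", "code": "bad_gateway", "message": "Provider openai returned 503"}}'
print(classify_error(body))  # retry-with-fallback
```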

Tip

To avoid provider failures affecting your users, configure routing with failover so Prism automatically retries with a backup provider.


Code examples

Vision (multimodal)

Send images alongside text using the `image_url` content type:

```python
from prism import Prism

client = Prism(api_key="sk-prism-...", base_url="https://gateway.futureagi.com")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

```typescript
import { Prism } from "@futureagi/prism";

const client = new Prism({ apiKey: "sk-prism-...", baseUrl: "https://gateway.futureagi.com" });

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        { type: "image_url", image_url: { url: "https://example.com/photo.jpg" } },
      ],
    },
  ],
});
console.log(response.choices[0].message.content);
```

```bash
curl -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }]
  }'
```

Function calling (tools)

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City name"},
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ],
    tool_choice="auto",
)

# Check if the model called a tool
if response.choices[0].finish_reason == "tool_calls":
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Tool: {tool_call.function.name}")
    print(f"Args: {tool_call.function.arguments}")
```

```typescript
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get the current weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string", description: "City name" },
            unit: { type: "string", enum: ["celsius", "fahrenheit"] },
          },
          required: ["location"],
        },
      },
    },
  ],
  tool_choice: "auto",
});

if (response.choices[0].finish_reason === "tool_calls") {
  const toolCall = response.choices[0].message.tool_calls![0];
  console.log(`Tool: ${toolCall.function.name}`);
  console.log(`Args: ${toolCall.function.arguments}`);
}
```

Embeddings

```python
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog",
)
vector = response.data[0].embedding
print(f"Embedding dimensions: {len(vector)}")
```

```typescript
const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: "The quick brown fox jumps over the lazy dog",
});
const vector = response.data[0].embedding;
console.log(`Embedding dimensions: ${vector.length}`);
```

```bash
curl -X POST https://gateway.futureagi.com/v1/embeddings \
  -H "Authorization: Bearer sk-prism-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
```

Image generation

```python
response = client.images.generate(
    model="dall-e-3",
    prompt="A futuristic city skyline at sunset, digital art",
    n=1,
    size="1024x1024",
)
print(response.data[0].url)
```

```typescript
const response = await client.images.generate({
  model: "dall-e-3",
  prompt: "A futuristic city skyline at sunset, digital art",
  n: 1,
  size: "1024x1024",
});
console.log(response.data[0].url);
```

```bash
curl -X POST https://gateway.futureagi.com/v1/images/generations \
  -H "Authorization: Bearer sk-prism-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dall-e-3",
    "prompt": "A futuristic city skyline at sunset, digital art",
    "n": 1,
    "size": "1024x1024"
  }'
```

Audio transcription

```python
with open("audio.mp3", "rb") as f:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
    )
print(transcription.text)
```

```bash
curl -X POST https://gateway.futureagi.com/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-prism-..." \
  -F file=@audio.mp3 \
  -F model=whisper-1
```
