# API Reference
Endpoints, request headers, and response headers for the Prism AI Gateway.
## About

Prism uses the OpenAI-compatible API format. All requests go to `https://gateway.futureagi.com` and follow the same structure as OpenAI's API. This page lists all supported endpoints, request headers, and response headers.
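Because the format is OpenAI-compatible, any plain HTTP client can talk to the gateway. A minimal sketch using only the Python standard library (the `sk-prism-...` key is a placeholder):

```python
import json
import urllib.request

# Build an OpenAI-format chat completion request aimed at the gateway.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
}
req = urllib.request.Request(
    "https://gateway.futureagi.com/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer sk-prism-...",
        "Content-Type": "application/json",
    },
    method="POST",
)
# req is now ready to send with urllib.request.urlopen(req)
```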
## Supported Endpoints

| Endpoint | Description |
|---|---|
| `POST /v1/chat/completions` | Chat completions (primary endpoint) |
| `POST /v1/completions` | Legacy text completions |
| `POST /v1/embeddings` | Text embeddings |
| `POST /v1/audio/transcriptions` | Whisper speech-to-text |
| `POST /v1/audio/translations` | Audio translation |
| `POST /v1/audio/speech` | Text-to-speech |
| `POST /v1/audio/speech/stream` | Streaming text-to-speech |
| `POST /v1/images/generations` | Image generation |
| `POST /v1/rerank` | Reranking |
| `GET /v1/models` | List available models |
| `POST /v1/responses` | OpenAI Responses API |
| `POST /v1/messages` | Anthropic Messages API (native pass-through) |
| `POST /v1/count_tokens` | Token counting |
| `/v1/files/*` | File upload, list, retrieve, delete |
| `/v1/assistants/*` | OpenAI Assistants API |
| `/v1/threads/*` | Threads, Runs, and Steps API |
## Request Headers

| Header | Description |
|---|---|
| `x-prism-session-id` | Group requests into a logical session |
| `x-prism-metadata` | Attach custom metadata as key=value pairs |
| `x-prism-trace-id` | Set a custom trace ID for distributed tracing |
| `x-prism-cache-ttl` | Override the cache TTL for this request (e.g. `5m`, `1h`) |
| `x-prism-cache-force-refresh` | Bypass the cache and fetch a fresh response (`true`/`false`) |
| `Cache-Control: no-store` | Disable caching for this request entirely |
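These headers are plain strings attached to each request. As a sketch, a small helper can assemble them; note that joining metadata pairs with commas is an assumption about the wire format, and the helper name is illustrative:

```python
def prism_headers(session_id=None, metadata=None, cache_ttl=None, force_refresh=False):
    """Assemble optional Prism request headers into a dict.

    metadata is a dict serialized as key=value pairs (comma-separated here,
    which is an assumption about the exact delimiter).
    """
    headers = {}
    if session_id:
        headers["x-prism-session-id"] = session_id
    if metadata:
        headers["x-prism-metadata"] = ",".join(f"{k}={v}" for k, v in metadata.items())
    if cache_ttl:
        headers["x-prism-cache-ttl"] = cache_ttl
    if force_refresh:
        headers["x-prism-cache-force-refresh"] = "true"
    return headers

# Only the headers you set are sent; the rest are omitted entirely.
headers = prism_headers(session_id="checkout-42", metadata={"env": "prod"}, cache_ttl="5m")
```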
## Response Headers

### Always present

| Header | Description |
|---|---|
| `X-Prism-Request-Id` | Unique request identifier for log correlation |
| `X-Prism-Trace-Id` | Trace ID for distributed tracing |
| `X-Prism-Latency-Ms` | Total latency including the provider call |
| `X-Prism-Model-Used` | Actual model used (may differ from the requested model if routing redirected) |
| `X-Prism-Provider` | Provider that served the request |
| `X-Prism-Timeout-Ms` | Timeout applied to this request |
### Conditional

| Header | Present when |
|---|---|
| `X-Prism-Cost` | Model has pricing data (absent on cache hits) |
| `X-Prism-Cache` | Caching is enabled; value is `miss`, `hit`, or `skip` |
| `X-Prism-Guardrail-Triggered` | A guardrail policy triggered; value is `true` |
| `X-Prism-Fallback-Used` | A provider fallback occurred; value is `true` |
| `X-Prism-Routing-Strategy` | A routing policy is active, e.g. `round-robin`, `weighted` |
| `X-Ratelimit-Limit-Requests` | Rate limiting is enabled; request ceiling per minute |
| `X-Ratelimit-Remaining-Requests` | Requests remaining in the current window |
| `X-Ratelimit-Reset-Requests` | Unix timestamp when the rate limit resets |
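Since `X-Ratelimit-Reset-Requests` is a Unix timestamp, a client can compute how long to back off once the window is exhausted. A minimal sketch (the function name is illustrative, not part of any SDK):

```python
import time

def backoff_seconds(headers):
    """Given response headers, return how long to wait before retrying:
    0 when requests remain in the window, otherwise the seconds until
    X-Ratelimit-Reset-Requests (a Unix timestamp) passes."""
    remaining = int(headers.get("X-Ratelimit-Remaining-Requests", "1"))
    if remaining > 0:
        return 0.0
    reset_at = int(headers.get("X-Ratelimit-Reset-Requests", "0"))
    return max(0.0, reset_at - time.time())
```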
## Error Responses

Prism returns standard HTTP error codes with structured JSON error bodies.
### Guardrail blocked (403)

When a guardrail is set to Enforce mode and triggers on a request, Prism returns 403 before the LLM is ever called:

```json
{
  "error": {
    "type": "guardrail_triggered",
    "code": "forbidden",
    "message": "Request blocked by guardrail: pii-detector",
    "guardrail": "pii-detector"
  }
}
```
### Budget exceeded (429)

When your organization's spending limit is reached, new requests are blocked until the next billing period:

```json
{
  "error": {
    "type": "budget_exceeded",
    "code": "rate_limit_exceeded",
    "message": "Organization monthly budget of $100.00 exceeded"
  }
}
```
### Provider unavailable (502)

When the selected provider is down or unreachable and no failover is configured:

```json
{
  "error": {
    "type": "provider_error",
    "code": "bad_gateway",
    "message": "Provider openai returned 503: Service Unavailable"
  }
}
```
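Because every error body carries a stable `type` field, clients can branch on it rather than parsing messages. A minimal sketch of that dispatch, using the bodies above (the action names are illustrative, not part of the API):

```python
import json

def classify_error(body: str) -> str:
    """Map a Prism error body to a coarse client action based on its type."""
    err = json.loads(body)["error"]
    if err["type"] == "guardrail_triggered":
        return "reject"   # do not retry; the request itself was blocked
    if err["type"] == "budget_exceeded":
        return "stop"     # retrying will not help until the budget resets
    if err["type"] == "provider_error":
        return "retry"    # transient; retry or fail over to another provider
    return "raise"        # unknown type: surface it to the caller
```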
> **Tip:** To avoid provider failures affecting your users, configure routing with failover so Prism automatically retries with a backup provider.
## Code examples

### Vision (multimodal)
Send images alongside text using the `image_url` content type:
```python
from prism import Prism

client = Prism(api_key="sk-prism-...", base_url="https://gateway.futureagi.com")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

```typescript
import { Prism } from "@futureagi/prism";

const client = new Prism({ apiKey: "sk-prism-...", baseUrl: "https://gateway.futureagi.com" });

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        { type: "image_url", image_url: { url: "https://example.com/photo.jpg" } },
      ],
    },
  ],
});
console.log(response.choices[0].message.content);
```

```bash
curl -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }]
  }'
```

### Function calling (tools)
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City name"},
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ],
    tool_choice="auto",
)

# Check if the model called a tool
if response.choices[0].finish_reason == "tool_calls":
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Tool: {tool_call.function.name}")
    print(f"Args: {tool_call.function.arguments}")
```

```typescript
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get the current weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string", description: "City name" },
            unit: { type: "string", enum: ["celsius", "fahrenheit"] },
          },
          required: ["location"],
        },
      },
    },
  ],
  tool_choice: "auto",
});

if (response.choices[0].finish_reason === "tool_calls") {
  const toolCall = response.choices[0].message.tool_calls![0];
  console.log(`Tool: ${toolCall.function.name}`);
  console.log(`Args: ${toolCall.function.arguments}`);
}
```

### Embeddings
```python
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog",
)
vector = response.data[0].embedding
print(f"Embedding dimensions: {len(vector)}")
```

```typescript
const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: "The quick brown fox jumps over the lazy dog",
});
const vector = response.data[0].embedding;
console.log(`Embedding dimensions: ${vector.length}`);
```

```bash
curl -X POST https://gateway.futureagi.com/v1/embeddings \
  -H "Authorization: Bearer sk-prism-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
```

### Image generation
```python
response = client.images.generate(
    model="dall-e-3",
    prompt="A futuristic city skyline at sunset, digital art",
    n=1,
    size="1024x1024",
)
print(response.data[0].url)
```

```typescript
const response = await client.images.generate({
  model: "dall-e-3",
  prompt: "A futuristic city skyline at sunset, digital art",
  n: 1,
  size: "1024x1024",
});
console.log(response.data[0].url);
```

```bash
curl -X POST https://gateway.futureagi.com/v1/images/generations \
  -H "Authorization: Bearer sk-prism-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dall-e-3",
    "prompt": "A futuristic city skyline at sunset, digital art",
    "n": 1,
    "size": "1024x1024"
  }'
```

### Audio transcription
```python
with open("audio.mp3", "rb") as f:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
    )
print(transcription.text)
```

```bash
curl -X POST https://gateway.futureagi.com/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-prism-..." \
  -F file=@audio.mp3 \
  -F model=whisper-1
```