# Assistants API
Use the OpenAI Assistants API through Prism for managed conversations with tool use and file retrieval.
## About
Prism fully proxies the OpenAI Assistants API. Create assistants with instructions and tools, manage conversation threads, and execute runs - all through the gateway. You get the same Assistants API you’d use with OpenAI directly, plus Prism’s routing, cost tracking, rate limiting, and logging on every call.
The Assistants API is stateful (OpenAI stores threads and messages server-side), so it only works with OpenAI as the provider. Use the OpenAI SDK pointed at Prism.
> **Warning:** Routing and failover do not apply to the Assistants API. Threads and runs are stored on OpenAI's servers, so the assistant's model must be an OpenAI model.
## Endpoints
### Assistants
| Method | Path | Description |
|---|---|---|
| POST | /v1/assistants | Create an assistant |
| GET | /v1/assistants | List assistants |
| GET | /v1/assistants/{id} | Get an assistant |
| POST | /v1/assistants/{id} | Update an assistant |
| DELETE | /v1/assistants/{id} | Delete an assistant |
### Threads
| Method | Path | Description |
|---|---|---|
| POST | /v1/threads | Create a thread |
| GET | /v1/threads/{id} | Get a thread |
| POST | /v1/threads/{id} | Update a thread |
| DELETE | /v1/threads/{id} | Delete a thread |
### Messages
| Method | Path | Description |
|---|---|---|
| POST | /v1/threads/{id}/messages | Add a message to a thread |
| GET | /v1/threads/{id}/messages | List messages in a thread |
| GET | /v1/threads/{id}/messages/{msg_id} | Get a message |
| POST | /v1/threads/{id}/messages/{msg_id} | Update a message |
| DELETE | /v1/threads/{id}/messages/{msg_id} | Delete a message |
### Runs
| Method | Path | Description |
|---|---|---|
| POST | /v1/threads/{id}/runs | Create a run |
| GET | /v1/threads/{id}/runs | List runs |
| GET | /v1/threads/{id}/runs/{run_id} | Get a run |
| POST | /v1/threads/{id}/runs/{run_id} | Update a run |
| POST | /v1/threads/{id}/runs/{run_id}/cancel | Cancel a run |
| POST | /v1/threads/{id}/runs/{run_id}/submit_tool_outputs | Submit tool outputs |
| GET | /v1/threads/{id}/runs/{run_id}/steps | List run steps |
| POST | /v1/threads/runs | Create thread and run in one call |
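The last route bundles thread creation and run creation into one request. A sketch of its body — the assistant id is a placeholder, and with the Python SDK this same shape maps to `client.beta.threads.create_and_run(...)`:

```python
# Request body for POST /v1/threads/runs: creates the thread and starts the run together.
payload = {
    "assistant_id": "asst_abc123",  # placeholder id from a prior create call
    "thread": {
        "messages": [
            {"role": "user", "content": "Explain the Pythagorean theorem"},
        ],
    },
}
```

This saves one round trip when you don't need the thread id before the first run.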
## Quick example
Create an assistant, start a conversation, and get a response:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.futureagi.com/v1",
    api_key="sk-prism-your-key",
)

# 1. Create an assistant
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a math tutor. Explain concepts step by step.",
    model="gpt-4o",
)

# 2. Create a thread
thread = client.beta.threads.create()

# 3. Add a message
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Explain the Pythagorean theorem",
)

# 4. Run the assistant
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# 5. Get the response
if run.status == "completed":
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    for msg in messages.data:
        if msg.role == "assistant":
            print(msg.content[0].text.value)
            break
```

The same flow with cURL:

```shell
# 1. Create an assistant
ASSISTANT_ID=$(curl -s -X POST https://gateway.futureagi.com/v1/assistants \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -H "OpenAI-Beta: assistants=v2" \
  -d '{
    "name": "Math Tutor",
    "instructions": "You are a math tutor. Explain concepts step by step.",
    "model": "gpt-4o"
  }' | jq -r '.id')

# 2. Create a thread
THREAD_ID=$(curl -s -X POST https://gateway.futureagi.com/v1/threads \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -H "OpenAI-Beta: assistants=v2" \
  -d '{}' | jq -r '.id')

# 3. Add a message
curl -s -X POST "https://gateway.futureagi.com/v1/threads/$THREAD_ID/messages" \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -H "OpenAI-Beta: assistants=v2" \
  -d '{"role": "user", "content": "Explain the Pythagorean theorem"}'

# 4. Create a run
RUN_ID=$(curl -s -X POST "https://gateway.futureagi.com/v1/threads/$THREAD_ID/runs" \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -H "OpenAI-Beta: assistants=v2" \
  -d "{\"assistant_id\": \"$ASSISTANT_ID\"}" | jq -r '.id')

# 5. Poll until complete, then get messages
# (poll GET /v1/threads/$THREAD_ID/runs/$RUN_ID until status is "completed")
curl -s "https://gateway.futureagi.com/v1/threads/$THREAD_ID/messages" \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "OpenAI-Beta: assistants=v2" | jq '.data[0].content[0].text.value'
```

## Tool use
Assistants can call tools (functions you define) during a run. When the run enters the `requires_action` status, you submit tool outputs to continue.
```python
import json
import time

# Create an assistant with a function tool
assistant = client.beta.assistants.create(
    name="Weather Bot",
    instructions="You help users check the weather.",
    model="gpt-4o",
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }],
)

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What's the weather in Tokyo?",
)

run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# Poll until the run needs action or completes
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id,
    )

if run.status == "requires_action":
    tool_calls = run.required_action.submit_tool_outputs.tool_calls

    # Process each tool call
    tool_outputs = []
    for call in tool_calls:
        args = json.loads(call.function.arguments)
        # Your actual function call here
        result = f"22°C and sunny in {args['city']}"
        tool_outputs.append({
            "tool_call_id": call.id,
            "output": result,
        })

    # Submit outputs and wait for completion
    run = client.beta.threads.runs.submit_tool_outputs_and_poll(
        thread_id=thread.id,
        run_id=run.id,
        tool_outputs=tool_outputs,
    )

if run.status == "completed":
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    print(messages.data[0].content[0].text.value)
```
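In the example above the tool result is hard-coded; a real application dispatches on `call.function.name`. A minimal, locally testable dispatcher — the names here are illustrative, and `get_weather` is a stand-in for a real lookup:

```python
import json

def get_weather(city: str) -> str:
    # Stand-in for a real weather lookup
    return f"22°C and sunny in {city}"

# Map tool names (as declared on the assistant) to local implementations
TOOLS = {"get_weather": get_weather}

def run_tool_call(name: str, arguments: str) -> str:
    """Execute one tool call; `arguments` is the JSON string the API returns."""
    args = json.loads(arguments)
    return TOOLS[name](**args)
```

Inside the `requires_action` loop you would then call `run_tool_call(call.function.name, call.function.arguments)` for each tool call; for example, `run_tool_call("get_weather", '{"city": "Tokyo"}')` returns `"22°C and sunny in Tokyo"`.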
## File search
Assistants can search uploaded files using vector stores. Upload files, attach them to a vector store, then give the assistant access:
```python
# Upload a file
file = client.files.create(
    file=open("knowledge_base.pdf", "rb"),
    purpose="assistants",
)

# Create a vector store and add the file
vector_store = client.beta.vector_stores.create(name="Knowledge Base")
client.beta.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=file.id,
)

# Create assistant with file search
assistant = client.beta.assistants.create(
    name="Research Assistant",
    instructions="Answer questions using the provided documents.",
    model="gpt-4o",
    tools=[{"type": "file_search"}],
    tool_resources={
        "file_search": {
            "vector_store_ids": [vector_store.id],
        }
    },
)

# Ask a question about the uploaded file
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What does the document say about quarterly revenue?",
)
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

if run.status == "completed":
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    print(messages.data[0].content[0].text.value)
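File indexing in a vector store is asynchronous, so a freshly added file may not be searchable right away. One way to wait is the same poll-until-terminal pattern used for runs. This generic helper is a sketch (names and defaults are illustrative), which you could wrap around, for example, `client.beta.vector_stores.files.retrieve(file.id, vector_store_id=vector_store.id)`:

```python
import time

def wait_until(fetch, terminal=("completed", "failed", "cancelled"),
               interval=1.0, timeout=120.0):
    """Poll fetch() until the returned object's .status is terminal, then return it."""
    deadline = time.time() + timeout
    obj = fetch()
    while obj.status not in terminal and time.time() < deadline:
        time.sleep(interval)
        obj = fetch()
    return obj
```

Check the final object's `status` (`"failed"` means indexing did not succeed) before running a file-search query against the store.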
## Streaming runs
Stream run events for real-time UI updates instead of polling:
```python
from openai import AssistantEventHandler

class MyHandler(AssistantEventHandler):
    def on_text_created(self, text):
        print("\nassistant > ", end="", flush=True)

    def on_text_delta(self, delta, snapshot):
        print(delta.value, end="", flush=True)

    def on_tool_call_created(self, tool_call):
        print(f"\nTool call: {tool_call.type}", flush=True)

# Using thread and assistant from earlier examples
with client.beta.threads.runs.stream(
    thread_id=thread.id,        # from the thread you created
    assistant_id=assistant.id,  # from the assistant you created
    event_handler=MyHandler(),
) as stream:
    stream.until_done()
```
## What Prism adds
Since Prism proxies every Assistants API call, you get:
- Cost tracking: Every run, message creation, and retrieval call is logged with its cost in the `x-prism-cost` header
- Rate limiting: Per-key and per-org limits apply to all Assistants API calls
- Logging: Full request/response logging for debugging and compliance
- Access control: Virtual key restrictions (allowed models, IP ACL) apply to the assistant’s model
The `x-prism-*` response headers are returned on every Assistants API response, just like any other Prism endpoint.
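One way to read the cost header from SDK calls — this sketch assumes `x-prism-cost` carries a plain decimal value, and uses the openai-python `with_raw_response` wrapper to expose raw headers:

```python
def prism_cost(headers) -> float:
    """Parse the x-prism-cost header; assumes a plain decimal value."""
    value = headers.get("x-prism-cost")
    return float(value) if value is not None else 0.0

# With the OpenAI SDK, raw headers are available on any call, e.g.:
# raw = client.beta.assistants.with_raw_response.list()
# print(prism_cost(raw.headers))
# assistants = raw.parse()  # the usual typed response object
```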