Async & batch
Run inference jobs asynchronously or process large batches of requests through the Prism Gateway.
About
Prism supports three modes of deferred processing: async inference accepts a single request and returns a job ID that you poll for the result; scheduled completions run a request at a specified future time; and batch processing submits many requests at once for bulk execution at lower cost.
All modes support the same models and parameters as synchronous chat completions.
Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /v1/async/{job_id} | Get async job status and result |
| DELETE | /v1/async/{job_id} | Cancel an async job |
| POST | /v1/scheduled | Schedule a completion for later |
| GET | /v1/scheduled | List scheduled jobs |
| GET | /v1/scheduled/{job_id} | Get a scheduled job |
| DELETE | /v1/scheduled/{job_id} | Cancel a scheduled job |
Async inference
Send a chat completion request with async mode enabled. The gateway returns immediately with a job ID. Poll the job endpoint to get the result when it’s ready.
Sending an async request
from openai import OpenAI
client = OpenAI(
    base_url="https://gateway.futureagi.com/v1",
    api_key="sk-prism-your-key",
)
# Send async request with the x-prism-async header
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a detailed essay about climate change"}],
    extra_headers={"x-prism-async": "true"},
)
# Response contains the job ID
job_id = response.id
print(f"Job ID: {job_id}")
curl -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -H "x-prism-async: true" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Write a detailed essay about climate change"}]
  }'
Polling for results
import time
import requests
headers = {"Authorization": "Bearer sk-prism-your-key"}
while True:
    resp = requests.get(
        f"https://gateway.futureagi.com/v1/async/{job_id}",
        headers=headers,
    )
    data = resp.json()
    if data["status"] == "completed":
        print(data["result"]["choices"][0]["message"]["content"])
        break
    elif data["status"] == "failed":
        print(f"Job failed: {data.get('error')}")
        break
    else:
        time.sleep(2)
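The loop above polls at a fixed two-second interval. For longer-running jobs, a helper like the following adds exponential backoff and an overall timeout; `poll_job` is a hypothetical sketch, not part of the gateway or any SDK, and the terminal statuses it checks are the ones documented in the table below.

```python
import time

def poll_job(fetch, timeout=300.0, initial_delay=1.0, max_delay=30.0):
    """Poll fetch() until the job reaches a terminal status.

    fetch is any zero-argument callable returning the parsed JSON of
    GET /v1/async/{job_id}; injecting it keeps the helper testable.
    """
    delay = initial_delay
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        data = fetch()
        # completed, failed, and cancelled are terminal; keep waiting otherwise.
        if data["status"] in ("completed", "failed", "cancelled"):
            return data
        time.sleep(delay)
        # Double the delay each round, capped at max_delay.
        delay = min(delay * 2, max_delay)
    raise TimeoutError("job did not reach a terminal status within the timeout")
```

Call it with the request from the polling example, for instance `poll_job(lambda: requests.get(f"https://gateway.futureagi.com/v1/async/{job_id}", headers=headers).json())`.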
Job statuses
| Status | Description |
|---|---|
| pending | Job is queued |
| running | Job is being processed |
| completed | Result is ready |
| failed | Job failed (check the error field) |
| cancelled | Job was cancelled |
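A job that is still pending or running can be cancelled with DELETE /v1/async/{job_id} (see the endpoints table above). A minimal sketch in Python, assuming the endpoint returns the updated job JSON; `cancel_async_job` and the injectable `session` parameter are illustrative, not part of any SDK:

```python
def cancel_async_job(job_id, api_key,
                     base_url="https://gateway.futureagi.com", session=None):
    """Cancel an async job via DELETE /v1/async/{job_id}."""
    if session is None:
        import requests  # deferred so a stub session can be injected in tests
        session = requests
    resp = session.delete(
        f"{base_url}/v1/async/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    resp.raise_for_status()
    return resp.json()
```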
Scheduled completions
Schedule a request to run at a specific time. Useful for time-sensitive content generation or deferred workloads.
curl -X POST https://gateway.futureagi.com/v1/scheduled \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "scheduled_at": "2026-04-05T09:00:00Z",
    "request": {
      "model": "gpt-4o",
      "messages": [{"role": "user", "content": "Generate the daily summary report"}]
    }
  }'
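The same request can be issued from Python. A sketch assuming the payload shape shown above; the `schedule_completion` helper and its injectable `session` parameter are illustrative, not part of any SDK:

```python
from datetime import timezone

def schedule_completion(request_body, run_at, api_key,
                        base_url="https://gateway.futureagi.com", session=None):
    """POST /v1/scheduled with an ISO-8601 UTC scheduled_at timestamp."""
    if session is None:
        import requests  # deferred so a stub session can be injected in tests
        session = requests
    payload = {
        # Normalize the datetime to UTC and format it like the curl example.
        "scheduled_at": run_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "request": request_body,
    }
    resp = session.post(
        f"{base_url}/v1/scheduled",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json=payload,
    )
    resp.raise_for_status()
    return resp.json()
```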
Managing scheduled jobs
# List scheduled jobs
curl https://gateway.futureagi.com/v1/scheduled \
  -H "Authorization: Bearer sk-prism-your-key"
# Get a specific job
curl https://gateway.futureagi.com/v1/scheduled/job_123 \
  -H "Authorization: Bearer sk-prism-your-key"
# Cancel a scheduled job
curl -X DELETE https://gateway.futureagi.com/v1/scheduled/job_123 \
  -H "Authorization: Bearer sk-prism-your-key"
Batch processing
For high-volume workloads, the OpenAI Batch API lets you submit a file of requests and retrieve results when processing is complete. Batch requests typically run at lower cost (50% discount with OpenAI).
Creating a batch
from openai import OpenAI
import json
client = OpenAI(
    base_url="https://gateway.futureagi.com/v1",
    api_key="sk-prism-your-key",
)
# 1. Create a JSONL file with requests
requests_data = [
    {
        "custom_id": "req-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": "Summarize: Machine learning is..."}],
        },
    },
    {
        "custom_id": "req-2",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": "Summarize: Neural networks are..."}],
        },
    },
]
with open("batch_input.jsonl", "w") as f:
    for req in requests_data:
        f.write(json.dumps(req) + "\n")
# 2. Upload the input file
input_file = client.files.create(
    file=open("batch_input.jsonl", "rb"),
    purpose="batch",
)
# 3. Create the batch
batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(f"Batch ID: {batch.id}, Status: {batch.status}")
Checking batch status
import time
while True:
    batch = client.batches.retrieve(batch.id)
    print(f"Status: {batch.status} ({batch.request_counts.completed}/{batch.request_counts.total})")
    if batch.status == "completed":
        break
    elif batch.status in ("failed", "cancelled", "expired"):
        print(f"Batch ended: {batch.status}")
        break
    time.sleep(30)
Retrieving results
if batch.output_file_id:
    content = client.files.content(batch.output_file_id)
    results = content.text.strip().split("\n")
    for line in results:
        result = json.loads(line)
        print(f"{result['custom_id']}: {result['response']['body']['choices'][0]['message']['content'][:100]}")
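Output lines in the batch result file may not match the order of the input file, so it is safer to join results back to requests by `custom_id`. A small illustrative helper (not part of any SDK) that also keeps per-request errors, assuming each output record carries either a `response` or an `error` field:

```python
import json

def index_batch_results(jsonl_text):
    """Map custom_id -> response body (or error) from a batch output file."""
    results = {}
    for line in jsonl_text.strip().splitlines():
        if not line:
            continue
        record = json.loads(line)
        if record.get("error"):
            # The request itself failed; keep the error under the same key.
            results[record["custom_id"]] = {"error": record["error"]}
        else:
            results[record["custom_id"]] = record["response"]["body"]
    return results
```

With the code above, `by_id = index_batch_results(content.text)` lets you look up `by_id["req-1"]["choices"][0]["message"]["content"]` regardless of output order.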
When to use each mode
| Mode | Best for | Latency | Cost |
|---|---|---|---|
| Synchronous | Interactive apps, real-time responses | Lowest | Standard |
| Async | Long-running requests, fire-and-forget | Medium (poll) | Standard |
| Scheduled | Time-triggered jobs, deferred work | Scheduled | Standard |
| Batch | High-volume processing, data pipelines | Hours | Discounted (up to 50% off) |