Linked Traces

Associate prompts with production traces to monitor latency, token usage, and cost per prompt version in the Prompt Workbench.

About

Every time your application sends a prompt to a model, Future AGI records it as a trace: the inputs, outputs, latency, tokens used, and cost. On their own, those traces tell you how your application is performing. Linked traces connect each trace back to the specific prompt and version that produced it.

Once linked, the Prompt Workbench shows aggregated metrics per prompt version alongside the prompt itself. Instead of searching through individual traces, you see a consolidated view: how many times a prompt was called, its typical latency and cost, and how those metrics shift as you iterate.


When to use

  • Validating a prompt change in production: Compare latency and cost between versions on real traffic, not just test runs.
  • Diagnosing a cost spike: Metrics per prompt version show exactly which prompt or version is driving spend.
  • Comparing active versions: See real-world performance across prompt versions side by side to decide which to keep.
  • Auditing prompt usage: Trace count shows which prompts are actively being called and which are stale or abandoned.

Linked Traces vs Raw Traces

| | Raw traces | Linked traces |
|---|---|---|
| What you see | Application-level metrics | Metrics per prompt and version |
| Attribution | Anonymous API calls | Tied to a specific template and version |
| Where to view | Observe / tracing dashboard | Prompt Workbench Metrics tab |
| Setup required | SDK instrumentation | SDK instrumentation + template reference in request |

How to

To link a prompt to its traces, include a reference to the prompt template (or template ID) in each generation request so the trace can be attributed to it. The process is described in the observability and manual tracing docs: Log prompt templates. Once your application sends traces that carry the template reference, Future AGI links them to the prompt in the Prompt Workbench.
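Conceptually, the template reference is extra metadata attached to each trace. The sketch below illustrates the idea with plain Python; the attribute names (`prompt.template_id`, `prompt.template_version`) and the `make_trace` helper are illustrative assumptions, not the SDK's actual API — use the calls shown in the Log prompt templates docs.

```python
# Illustrative sketch: a trace record with an optional prompt-template reference.
# All field names here are hypothetical; the real attribute names come from the
# Log prompt templates documentation.

def make_trace(inputs, output, latency_ms, tokens, cost,
               template_id=None, template_version=None):
    """Build a trace record; the template fields are what enable linking."""
    trace = {
        "inputs": inputs,
        "output": output,
        "latency_ms": latency_ms,
        "tokens": tokens,
        "cost": cost,
    }
    if template_id is not None:
        # The template reference turns a raw (anonymous) trace into a linked one.
        trace["prompt.template_id"] = template_id
        trace["prompt.template_version"] = template_version
    return trace

raw = make_trace({"q": "hi"}, "hello", 420, 52, 0.0003)
linked = make_trace({"q": "hi"}, "hello", 420, 52, 0.0003,
                    template_id="support-reply", template_version="v3")

assert "prompt.template_id" not in raw            # anonymous API call
assert linked["prompt.template_version"] == "v3"  # attributable to a version
```

The only difference between a raw and a linked trace is the presence of the template reference; everything else about instrumentation stays the same.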


Metrics and Analytics

After linking, open your prompt in the dashboard and go to the Metrics tab.

| Metric | What it tells you |
|---|---|
| Median Latency | Typical time for the model to produce a response. Lower is better for responsiveness; use it to spot slow prompts or model changes. |
| Median Input Tokens | Typical size of the prompt sent to the model. Helps you see verbosity and compare input length across versions. |
| Median Output Tokens | Typical length of the model’s reply. Useful for cost and length control; compare after changing instructions or max tokens. |
| Median Costs | Typical cost per generation for this prompt. Use it to compare cost across prompt versions or models. |
| Traces Count | How many times this prompt was used in the selected period. Shows which prompts are active and where to focus optimization. |
| First and Last Generation | When the prompt was first and last used. Confirms the time range of the data you’re viewing. |

Compare the same metric across prompt versions or time ranges to see if a change improved latency, cost, or token usage.
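The per-version aggregation behind this comparison can be sketched in a few lines. This is an illustrative reimplementation, not the platform's code; the trace field names (`version`, `latency_ms`, `cost`) and the sample values are assumptions.

```python
from statistics import median

# Hypothetical linked traces: each record carries the prompt version that produced it.
traces = [
    {"version": "v1", "latency_ms": 900,  "cost": 0.004},
    {"version": "v1", "latency_ms": 1100, "cost": 0.005},
    {"version": "v2", "latency_ms": 600,  "cost": 0.003},
    {"version": "v2", "latency_ms": 700,  "cost": 0.003},
    {"version": "v2", "latency_ms": 650,  "cost": 0.004},
]

def metrics_by_version(traces):
    """Group traces by prompt version and compute the per-version medians
    and trace counts, as the Metrics tab does for linked traces."""
    by_version = {}
    for t in traces:
        by_version.setdefault(t["version"], []).append(t)
    return {
        v: {
            "median_latency_ms": median(t["latency_ms"] for t in ts),
            "median_cost": median(t["cost"] for t in ts),
            "trace_count": len(ts),
        }
        for v, ts in by_version.items()
    }

stats = metrics_by_version(traces)
assert stats["v1"]["median_latency_ms"] == 1000  # median of 900 and 1100
assert stats["v2"]["median_latency_ms"] == 650
assert stats["v2"]["trace_count"] == 3
```

In this toy data, v2 is both faster and no more expensive than v1 on real traffic, which is exactly the kind of comparison linked traces make possible without inspecting individual traces.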

