Linked Traces

Associate prompts with production traces to monitor latency, token usage, and cost per prompt version in the Prompt Workbench.

About

Every time your application sends a prompt to a model, Future AGI records it as a trace: the inputs, outputs, latency, tokens used, and cost. On their own, those traces tell you how your application is performing. Linked traces connect each trace back to the specific prompt and version that produced it.

Once linked, the Prompt Workbench shows aggregated metrics per prompt version alongside the prompt itself. Instead of searching through individual traces, you see a consolidated view: how many times a prompt was called, its typical latency and cost, and how those metrics shift as you iterate.


When to use

  • Validating a prompt change in production: Compare latency and cost between versions on real traffic, not just test runs.
  • Diagnosing a cost spike: Metrics per prompt version show exactly which prompt or version is driving spend.
  • Comparing active versions: See real-world performance across prompt versions side by side to decide which to keep.
  • Auditing prompt usage: Trace count shows which prompts are actively being called and which are stale or abandoned.

Linked Traces vs Raw Traces

| | Raw traces | Linked traces |
|---|---|---|
| What you see | Application-level metrics | Metrics per prompt and version |
| Attribution | Anonymous API calls | Tied to a specific template and version |
| Where to view | Observe / tracing dashboard | Prompt Workbench Metrics tab |
| Setup required | SDK instrumentation | SDK instrumentation + template reference in request |

How to

To link a prompt to its traces, include a reference to the prompt template (or template ID) in each generation request so the trace can be attributed to it. The process is described in the observability and manual tracing docs: Log prompt templates. Once your application sends traces that carry the template reference, Future AGI links them to the prompt in the Prompt Workbench.
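Conceptually, the template reference is extra metadata attached to each trace. The sketch below illustrates the idea with plain Python; the attribute names (`prompt.template_id`, `prompt.template_version`) and the `make_trace` helper are illustrative assumptions, not the SDK's actual API — use the calls shown in the Log prompt templates docs.

```python
# Illustrative sketch: a trace record with an optional prompt-template reference.
# All field names here are hypothetical; the real attribute names come from the
# Log prompt templates documentation.

def make_trace(inputs, output, latency_ms, tokens, cost,
               template_id=None, template_version=None):
    """Build a trace record; the template fields are what enable linking."""
    trace = {
        "inputs": inputs,
        "output": output,
        "latency_ms": latency_ms,
        "tokens": tokens,
        "cost": cost,
    }
    if template_id is not None:
        # The template reference turns a raw (anonymous) trace into a linked one.
        trace["prompt.template_id"] = template_id
        trace["prompt.template_version"] = template_version
    return trace

raw = make_trace({"q": "hi"}, "hello", 420, 52, 0.0003)
linked = make_trace({"q": "hi"}, "hello", 420, 52, 0.0003,
                    template_id="support-reply", template_version="v3")

assert "prompt.template_id" not in raw            # anonymous API call
assert linked["prompt.template_version"] == "v3"  # attributable to a version
```

The only difference between a raw and a linked trace is the presence of the template reference; everything else about instrumentation stays the same.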


Metrics and Analytics

After linking, open your prompt in the dashboard and go to the Metrics tab.

| Metric | What it tells you |
|---|---|
| Median Latency | Typical time for the model to produce a response. Lower is better for responsiveness; use it to spot slow prompts or model changes. |
| Median Input Tokens | Typical size of the prompt sent to the model. Helps you see verbosity and compare input length across versions. |
| Median Output Tokens | Typical length of the model’s reply. Useful for cost and length control; compare after changing instructions or max tokens. |
| Median Costs | Typical cost per generation for this prompt. Use it to compare cost across prompt versions or models. |
| Traces Count | How many times this prompt was used in the selected period. Shows which prompts are active and where to focus optimization. |
| First and Last Generation | When the prompt was first and last used. Confirms the time range of the data you’re viewing. |

Compare the same metric across prompt versions or time ranges to see if a change improved latency, cost, or token usage.
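The per-version aggregation behind this comparison can be sketched in a few lines. This is an illustrative reimplementation, not the platform's code; the trace field names (`version`, `latency_ms`, `cost`) and the sample values are assumptions.

```python
from statistics import median

# Hypothetical linked traces: each record carries the prompt version that produced it.
traces = [
    {"version": "v1", "latency_ms": 900,  "cost": 0.004},
    {"version": "v1", "latency_ms": 1100, "cost": 0.005},
    {"version": "v2", "latency_ms": 600,  "cost": 0.003},
    {"version": "v2", "latency_ms": 700,  "cost": 0.003},
    {"version": "v2", "latency_ms": 650,  "cost": 0.004},
]

def metrics_by_version(traces):
    """Group traces by prompt version and compute the per-version medians
    and trace counts, as the Metrics tab does for linked traces."""
    by_version = {}
    for t in traces:
        by_version.setdefault(t["version"], []).append(t)
    return {
        v: {
            "median_latency_ms": median(t["latency_ms"] for t in ts),
            "median_cost": median(t["cost"] for t in ts),
            "trace_count": len(ts),
        }
        for v, ts in by_version.items()
    }

stats = metrics_by_version(traces)
assert stats["v1"]["median_latency_ms"] == 1000  # median of 900 and 1100
assert stats["v2"]["median_latency_ms"] == 650
assert stats["v2"]["trace_count"] == 3
```

In this toy data, v2 is both faster and no more expensive than v1 on real traffic, which is exactly the kind of comparison linked traces make possible without inspecting individual traces.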

