Versions and Runs in Future AGI Prototype Testing

What a version is in Prototype, how runs get tagged to a version, and how the dashboard uses versions to compare configurations.

About

A version is a named configuration of your application: a specific prompt, model, or set of parameters. Every generation your instrumented application makes is tagged to the version it ran under, so the Prototype dashboard can group and compare them.

Versions are how Prototype answers the question: “Is this new prompt actually better than the previous one?”

What a version is

When you call register(), you pass a project_version_name. This name tags all traces produced by that registration to the same version. It can be anything meaningful: gpt-4o-v1, shorter-system-prompt, with-few-shot-examples.
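One way to keep that name meaningful is to treat it as a key into a table of configurations, so each project_version_name always refers to exactly one prompt, model, and parameter set. The sketch below is a minimal illustration under that assumption: VersionConfig, VERSIONS, config_for, and the temperature default are hypothetical names of your own, not part of the Future AGI SDK. The version names match the ones used in the comparison table on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionConfig:
    # Hypothetical record of what one version name stands for.
    model: str
    system_prompt: str
    temperature: float = 0.7  # hypothetical default, for illustration only


LONG_PROMPT = "You are a support assistant. Answer thoroughly."

# Hypothetical registry: one entry per project_version_name you plan to register.
VERSIONS = {
    "baseline": VersionConfig("gpt-4o", LONG_PROMPT),
    "shorter-prompt": VersionConfig("gpt-4o", "Answer briefly."),
    "gpt-4o-mini": VersionConfig("gpt-4o-mini", LONG_PROMPT),
}


def config_for(version_name: str) -> VersionConfig:
    """Return the configuration a version name refers to, failing loudly on typos."""
    if version_name not in VERSIONS:
        raise ValueError(f"Unknown version {version_name!r}; known: {sorted(VERSIONS)}")
    return VERSIONS[version_name]


cfg = config_for("shorter-prompt")
print(cfg.model, cfg.system_prompt)
```

You would then pass the same string to register() as project_version_name, so the traces and the configuration they ran with cannot drift apart.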

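```python
# Imports from Future AGI's fi_instrumentation package (traceAI SDK).
from fi_instrumentation import register
from fi_instrumentation.fi_types import ProjectType

# Every trace produced through this registration is tagged with the version name below.
trace_provider = register(
    project_type=ProjectType.EXPERIMENT,
    project_name="my-chatbot",
    project_version_name="gpt-4o-concise-prompt",
)
```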

Every LLM call made after this registration is traced and tagged to the gpt-4o-concise-prompt version.

What a run is

A run is a single execution of your application under a version. Each run contains one or more spans: the LLM call, any retrieval steps, tool uses, or other instrumented operations. The spans carry the raw data: input messages, model response, token counts, cost, and latency.
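To make the run/span hierarchy concrete, here is a minimal local model of it. The Span and Run classes and their field names are hypothetical and do not mirror Future AGI's storage schema; the token counts and cost in the example are illustrative numbers, not real pricing. The run-level latency simply sums span latencies, which assumes the spans executed one after another.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    # One instrumented operation: an LLM call, retrieval step, tool use, etc.
    kind: str
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    latency_ms: float = 0.0


@dataclass
class Run:
    # One execution of the application, tagged to the version it ran under.
    version: str
    spans: list[Span] = field(default_factory=list)

    @property
    def total_tokens(self) -> int:
        return sum(s.input_tokens + s.output_tokens for s in self.spans)

    @property
    def total_cost_usd(self) -> float:
        return sum(s.cost_usd for s in self.spans)

    @property
    def latency_ms(self) -> float:
        # Assumes sequential spans; overlapping spans would need start/end timestamps.
        return sum(s.latency_ms for s in self.spans)


# Illustrative run: a retrieval step followed by one LLM call.
run = Run("shorter-prompt", [
    Span("retrieval", latency_ms=40.0),
    Span("llm", input_tokens=350, output_tokens=120, cost_usd=0.0021, latency_ms=900.0),
])
print(run.version, run.total_tokens, round(run.total_cost_usd, 4), run.latency_ms)
```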

Runs are stored automatically. You do not need to manually log anything beyond registering and instrumenting your app.

Comparing versions

To compare two configurations, register with different project_version_name values and run the same workload against each:

| Version name | What changed |
| --- | --- |
| `baseline` | Original prompt, GPT-4o |
| `shorter-prompt` | Condensed system message, GPT-4o |
| `gpt-4o-mini` | Same prompt, cheaper model |
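A minimal sketch of running the same workload against each version follows. The CONFIGS dict, WORKLOAD list, and fake_app stub are hypothetical stand-ins for your application, and the prompts are invented to mirror the table. In a real setup, one straightforward approach is to run each version in its own process that calls register() with its own project_version_name before executing the workload, so its traces land under the right version.

```python
# Hypothetical configurations mirroring the comparison table; not a Future AGI API.
CONFIGS = {
    "baseline": {"model": "gpt-4o", "system_prompt": "You are a support assistant. Answer thoroughly."},
    "shorter-prompt": {"model": "gpt-4o", "system_prompt": "Answer briefly."},
    "gpt-4o-mini": {"model": "gpt-4o-mini", "system_prompt": "You are a support assistant. Answer thoroughly."},
}

# One fixed workload shared by every version, so only the configuration varies.
WORKLOAD = [
    "How do I reset my password?",
    "Can I change my billing date?",
    "Which file formats can I upload?",
]


def fake_app(config: dict, question: str) -> str:
    # Stand-in for your instrumented application; a real one would call the LLM here.
    return f"[{config['model']}] answer to: {question}"


def run_workload(version_name: str) -> list[dict]:
    # Execute every workload item under one version and keep the version tag on each result.
    config = CONFIGS[version_name]
    return [
        {"version": version_name, "question": q, "answer": fake_app(config, q)}
        for q in WORKLOAD
    ]


all_results = {name: run_workload(name) for name in CONFIGS}
for name, results in all_results.items():
    print(name, len(results), "runs")
```

Holding the inputs fixed is what makes the comparison fair: differences in scores, cost, or latency between versions can then be attributed to the configuration rather than to the inputs.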

The Prototype dashboard shows all versions for a project side by side, with evaluation scores, average cost, and average latency for each. The Choose Winner flow then lets you weight those metrics and rank the versions.
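The dashboard and Choose Winner do this aggregation and ranking for you. Purely to illustrate what weighting metrics means, the sketch below averages per-run metrics by version, min-max normalizes each metric across versions (flipping cost and latency so that lower is better), and combines them with weights. The run records are made-up sample values and the normalization scheme is an assumption; this is not Future AGI's ranking algorithm.

```python
from statistics import mean

# Made-up per-run records; in Prototype these come from traced runs and their evaluations.
runs = [
    {"version": "baseline", "eval_score": 0.82, "cost_usd": 0.0040, "latency_s": 2.1},
    {"version": "baseline", "eval_score": 0.78, "cost_usd": 0.0044, "latency_s": 2.3},
    {"version": "shorter-prompt", "eval_score": 0.80, "cost_usd": 0.0030, "latency_s": 1.6},
    {"version": "shorter-prompt", "eval_score": 0.76, "cost_usd": 0.0028, "latency_s": 1.4},
    {"version": "gpt-4o-mini", "eval_score": 0.70, "cost_usd": 0.0006, "latency_s": 1.1},
    {"version": "gpt-4o-mini", "eval_score": 0.66, "cost_usd": 0.0005, "latency_s": 0.9},
]

METRICS = ("eval_score", "cost_usd", "latency_s")
HIGHER_IS_BETTER = {"eval_score": True, "cost_usd": False, "latency_s": False}


def averages_by_version(runs):
    # Average each metric over all runs of a version.
    versions = sorted({r["version"] for r in runs})
    return {
        v: {m: mean(r[m] for r in runs if r["version"] == v) for m in METRICS}
        for v in versions
    }


def rank(averages, weights):
    # Min-max normalize each metric to [0, 1] across versions, flip lower-is-better
    # metrics, then sum the weighted values. Illustrative scheme only.
    scores = {v: 0.0 for v in averages}
    for m in METRICS:
        values = [a[m] for a in averages.values()]
        lo, hi = min(values), max(values)
        for v, a in averages.items():
            norm = 0.5 if hi == lo else (a[m] - lo) / (hi - lo)
            if not HIGHER_IS_BETTER[m]:
                norm = 1.0 - norm
            scores[v] += weights.get(m, 0.0) * norm
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


avg = averages_by_version(runs)
for version, score in rank(avg, {"eval_score": 0.6, "cost_usd": 0.2, "latency_s": 0.2}):
    print(f"{version:15s} {score:.3f}")
```

With these sample values, the quality-heavy weights rank shorter-prompt first, while moving most of the weight onto cost puts gpt-4o-mini on top. Surfacing that kind of trade-off is the point of letting you choose the weights.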

Next steps

EvalTags and Mapping: How evaluations score each run automatically.
Set Up Prototype: Register your project and start capturing runs.
Choose Winner: Rank versions by your chosen metrics and promote the best.
