Choose Winner

Rank prototype versions by evaluation scores, cost, and latency, then select and promote the best-performing version to production.

About

When you have multiple versions of your application running in Prototype, you need a way to pick the best one. Choose Winner ranks all your versions on the metrics that matter to you: evaluation scores, cost, and latency. You set how much each metric counts using sliders, and the platform calculates an overall score for each version. The highest-scoring version becomes the winner, and you can promote it to production directly from the dashboard, so the decision rests on data instead of guesswork.


When to use

  • Version comparison: Compare multiple prompts, models, or parameter sets side by side on quality, cost, and latency before committing to one.
  • Weighted ranking: Prioritize what matters most for your use case (safety scores, response cost, or latency) and let the platform calculate the overall winner.
  • Pre-production sign-off: Make a documented, data-backed decision on which version to ship instead of relying on intuition.
  • Seamless production promotion: Promote the winning version directly from the dashboard with no code changes required.

How to

Open the Prototype dashboard

Go to the Prototype dashboard and open the project or experiment you want to compare.

Start Choose Winner

Click the Choose Winner button to open the comparison and ranking flow.

Set metric importance

Adjust the slider for each metric (for example, evaluation scores, cost, and latency) to indicate how important it is, on a scale from 0 (not important) to 10 (very important). These weights determine how versions are ranked.

Review rankings and select the winner

Based on the weights you set, all prototype versions are ranked. The version with the highest overall score is the winner. Select it to promote that configuration to production.
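
To make the ranking step concrete, here is a minimal sketch of how a weighted overall score could be computed from slider values. This page does not describe the platform's actual formula, so everything here is an assumption for illustration: the `Version` fields, the `normalize` and `rank_versions` helpers, the min-max scaling, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Version:
    # Hypothetical metric fields; real metric names may differ.
    name: str
    eval_score: float  # higher is better
    cost_usd: float    # lower is better
    latency_ms: float  # lower is better


def normalize(values, higher_is_better):
    """Min-max scale to [0, 1] so that 1.0 is always the best value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]  # every version ties on this metric
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank_versions(versions, weights):
    """Return (name, overall_score) pairs sorted best-first.

    `weights` maps each metric name to a slider value from 0 to 10.
    """
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one metric needs a non-zero weight")
    columns = {
        "eval_score": normalize([v.eval_score for v in versions], True),
        "cost_usd": normalize([v.cost_usd for v in versions], False),
        "latency_ms": normalize([v.latency_ms for v in versions], False),
    }
    scored = []
    for i, version in enumerate(versions):
        # Weighted average of the normalized metrics.
        overall = sum(weights[m] * columns[m][i] for m in columns) / total
        scored.append((version.name, overall))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up sample data, for illustration only.
versions = [
    Version("v1", eval_score=0.80, cost_usd=0.010, latency_ms=400),
    Version("v2", eval_score=0.90, cost_usd=0.030, latency_ms=900),
    Version("v3", eval_score=0.70, cost_usd=0.005, latency_ms=300),
]
weights = {"eval_score": 10, "cost_usd": 3, "latency_ms": 2}
ranking = rank_versions(versions, weights)
winner = ranking[0][0]
```

With quality weighted at 10, the sample `v2` ranks first despite being the slowest and most expensive. Raising the cost and latency sliders instead moves the cheaper, faster `v3` to the top, which is the kind of trade-off the sliders are meant to expose.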

