Judge Models
What a judge model is, how it scores responses, and how to choose the right one for your evaluation.
About
A judge model is the model that reads each response and applies the eval template criteria to produce a result. When you run an evaluation, the judge receives the text to evaluate, the template’s rule prompt, and the required inputs, then returns a result and a reason.
The judge model determines how accurately and how quickly each response gets scored. Choosing the right one lets you trade accuracy against latency and cost for your specific workload.
How a judge scores a response
- The platform constructs a prompt from the eval template criteria and the row’s input values.
- The judge model receives this prompt and reads the response being evaluated.
- The judge returns a result (pass/fail, score, or category) and a reason explaining the judgment.
- The platform stores the result and reason for that row.
The judge model does not generate or modify your AI’s responses. It only reads and scores them.
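The four steps above can be sketched in plain Python. This is a minimal illustration, not the platform's implementation: `JudgeResult`, `build_prompt`, `score_row`, and `toy_judge` are hypothetical names, the prompt layout is an assumption, and the toy judge is a keyword check standing in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class JudgeResult:
    result: str  # pass/fail, a score, or a category
    reason: str  # explanation of the judgment


def build_prompt(rule_prompt: str, inputs: Dict[str, str]) -> str:
    """Step 1: combine the template criteria with the row's input values.

    The layout used here is an assumption made for illustration.
    """
    fields = "\n".join(f"{name}: {value}" for name, value in inputs.items())
    return f"{rule_prompt}\n\n{fields}"


def score_row(
    judge: Callable[[str], JudgeResult],
    rule_prompt: str,
    inputs: Dict[str, str],
    store: List[dict],
) -> JudgeResult:
    prompt = build_prompt(rule_prompt, inputs)  # step 1: construct the prompt
    verdict = judge(prompt)                     # steps 2-3: judge reads and returns
    store.append({"result": verdict.result, "reason": verdict.reason})  # step 4
    return verdict


def toy_judge(prompt: str) -> JudgeResult:
    # Stand-in for a real judge model: a simple keyword check.
    if "Paris" in prompt:
        return JudgeResult("pass", "The response names Paris.")
    return JudgeResult("fail", "The response does not name Paris.")


stored: List[dict] = []
row = {"input": "What is the capital of France?", "output": "The capital is Paris."}
verdict = score_row(toy_judge, "Pass if the output names the capital of France.", row, stored)
print(verdict.result, "-", verdict.reason)
```

Note that `score_row` only reads `row`; the response text is never changed, which mirrors the point that the judge scores responses without modifying them.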
Available judge models
Future AGI provides a set of proprietary models built specifically for evaluation:
| Model | Code | Best for | Latency |
|---|---|---|---|
| TURING_LARGE | turing_large | Max accuracy, multimodal evals (text, image, audio) | Higher |
| TURING_SMALL | turing_small | High fidelity at lower cost (text, image) | Medium |
| TURING_FLASH | turing_flash | Fast, high-accuracy evals (text, image) | Low |
| PROTECT | protect | Safety, guardrails, user-defined rules (text, audio) | Low |
| PROTECT_FLASH | protect_flash | First-pass binary filtering (text only) | Ultra-low |
See Future AGI models for full details on each model.
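As a quick way to read the table by input type, the sketch below encodes the modalities listed in the "Best for" column and filters on them. The model codes come from the table; the `MODEL_MODALITIES` dictionary and the `models_supporting` helper are hypothetical and are not part of any SDK.

```python
# Modalities per model code, copied from the "Best for" column above.
MODEL_MODALITIES = {
    "turing_large": {"text", "image", "audio"},
    "turing_small": {"text", "image"},
    "turing_flash": {"text", "image"},
    "protect": {"text", "audio"},
    "protect_flash": {"text"},
}


def models_supporting(*modalities: str) -> list:
    """Return the model codes whose listed modalities cover all requested ones."""
    needed = set(modalities)
    return [code for code, supported in MODEL_MODALITIES.items() if needed <= supported]


audio_models = models_supporting("audio")
image_and_audio_models = models_supporting("image", "audio")
print(audio_models)
print(image_and_audio_models)
```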
You can also bring your own model using the custom models integration. This is useful when you need a domain-specific fine-tuned model, want to keep inference in a specific cloud region, or already pay for a model you want to use as the judge.
How to choose a judge
| Situation | Recommended model |
|---|---|
| Maximum accuracy matters more than speed | turing_large |
| High quality at reasonable cost | turing_small |
| Large-scale runs where speed is important | turing_flash |
| Safety and guardrail checks | protect or protect_flash |
| Evaluating images or audio | turing_large (images and audio) or turing_small (images only) |
| Domain-specific or compliance requirements | Custom model |
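If you select a judge programmatically, the priority-based rows of this table reduce to a lookup. The sketch below is one possible encoding: the situation keys and the `recommend_judge` function are hypothetical names chosen for illustration, and the images or audio row is left out because it depends on input type rather than on a priority.

```python
# Priority-based rows of the table above; keys are illustrative labels.
RECOMMENDED_JUDGES = {
    "max_accuracy": ["turing_large"],
    "quality_at_reasonable_cost": ["turing_small"],
    "large_scale_speed": ["turing_flash"],
    "safety_guardrails": ["protect", "protect_flash"],
    "domain_or_compliance": ["custom model"],
}


def recommend_judge(situation: str) -> list:
    """Return the recommended judge codes for a situation key."""
    try:
        return RECOMMENDED_JUDGES[situation]
    except KeyError:
        valid = ", ".join(sorted(RECOMMENDED_JUDGES))
        raise ValueError(f"Unknown situation {situation!r}; expected one of: {valid}") from None


safety_choice = recommend_judge("safety_guardrails")
print(safety_choice)
```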
Next steps
- Future AGI models: Full reference for built-in judge models.
- Use custom models: Bring your own model as the judge.
- Eval templates: The criteria the judge applies.
- Eval results: What the judge produces after scoring.