Agent as a Judge
Uses AI agents to evaluate content through a structured evaluation process. This evaluation type combines an agent-based approach with customisable evaluation and system prompts to perform a comprehensive content assessment.
Evaluation Using Interface
Input:
- Configuration Parameters:
- Model: The model to use for the evaluation.
- Eval Prompt: The prompt to use for the evaluation.
- System Prompt: The system prompt to use for the evaluation.
Output:
- Result: The result of the evaluation.
Evaluation Using Python SDK
Click here to learn how to set up evaluation using the Python SDK.
Input:
- Configuration Parameters:
- model (string): The model to use for the evaluation.
- evalPrompt (string): The prompt to use for the evaluation.
- systemPrompt (string): The system prompt to use for the evaluation.
Output:
- Result (string): The result of the evaluation.
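The snippet below is a minimal sketch of how this evaluator might be configured in code. The client object, the create_evaluator method, and the model name are assumptions for illustration, not the SDK's actual API; only the model, evalPrompt, and systemPrompt parameters come from the list above.

```python
# Minimal sketch only: "client" and "create_evaluator" are hypothetical
# names, not the SDK's real API; check the SDK reference for the actual
# entry points. The three parameters mirror the configuration list above.

def run_agent_judge_evaluation(client, content: str) -> str:
    evaluator = client.create_evaluator(
        type="agent-as-a-judge",          # the evaluation type
        model="gpt-4o",                   # model: judge model (example name)
        evalPrompt=(
            "Assess the response for accuracy and completeness. "
            "Return PASS or FAIL with a one-line justification."
        ),
        systemPrompt=(
            "You are a meticulous evaluation agent. Judge strictly "
            "against the criteria in the evaluation prompt."
        ),
    )
    return evaluator.evaluate(content)    # Result: the evaluation verdict string
```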
What to do when Agent Judge Evaluation Fails
When an agent judge evaluation fails, reviewing the agent configuration is crucial. This includes checking the system prompt to confirm the agent’s role is correctly defined, verifying that the evaluation prompt is clear and comprehensive, and ensuring that the agent has access to the tools it needs.
It is also important to assess model selection: confirm that the chosen model is compatible with the agent’s operations, and consider switching to an alternative model from the available options if needed.
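As a concrete illustration of the model-selection advice, the sketch below retries a failed evaluation with alternative models. It reuses the hypothetical client interface from the earlier sketch; the model names and the broad exception handling are likewise assumptions, not SDK specifics.

```python
# Illustrative fallback pattern reusing the hypothetical client interface
# from the sketch above; model names and the broad except clause are
# assumptions, not SDK specifics.

FALLBACK_MODELS = ["gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet"]

def evaluate_with_fallback(client, content: str) -> str:
    last_error = None
    for model in FALLBACK_MODELS:
        try:
            evaluator = client.create_evaluator(
                type="agent-as-a-judge",
                model=model,
                evalPrompt="Return PASS or FAIL with a short justification.",
                systemPrompt="You are a meticulous evaluation agent.",
            )
            return evaluator.evaluate(content)   # succeed on first working model
        except Exception as err:                 # swap in the SDK's error type
            last_error = err                     # record and try the next model
    raise RuntimeError("All candidate models failed") from last_error
```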