Evaluation Using Interface
Input:
- Required Inputs:
  - input: The task request or question.
  - output: The response to evaluate.
Output:
- Result: Returns ‘Passed’ if the response successfully completes the requested task, ‘Failed’ if it doesn’t.
- Reason: A detailed explanation of why the response was classified as successfully completing the task or not.
Evaluation Using SDK
Click here to learn how to set up evaluation using the SDK.
Input:
- Required Inputs:
  - input (string): The task request or question.
  - output (string): The response to evaluate.
Output:
- Result: Returns a list containing ‘Passed’ if the response successfully completes the requested task, or ‘Failed’ if it doesn’t.
- Reason: Provides a detailed explanation of the evaluation.
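The input and output shapes described above can be sketched in Python. This is a minimal illustration, not the SDK's actual API: `build_inputs`, `TaskCompletionResult`, and `is_passed` are hypothetical helper names introduced here, and the real evaluation call is deliberately left out because its signature is not given in this section.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskCompletionResult:
    """Hypothetical container mirroring the documented outputs."""
    result: List[str]  # a list containing 'Passed' or 'Failed'
    reason: str        # explanation of the evaluation


def build_inputs(task: str, response: str) -> Dict[str, str]:
    """Package the two required string inputs under their documented keys."""
    if not isinstance(task, str) or not isinstance(response, str):
        raise TypeError("'input' and 'output' must both be strings")
    if not task.strip() or not response.strip():
        raise ValueError("'input' and 'output' must not be empty")
    return {"input": task, "output": response}


def is_passed(evaluation: TaskCompletionResult) -> bool:
    """Return True when the result list contains 'Passed'."""
    return "Passed" in evaluation.result


# Example: shape a request, then interpret a result written by hand here
# (assumption: a real result would come from the SDK call, not shown).
payload = build_inputs(
    task="List three prime numbers.",
    response="2, 3, and 5 are prime numbers.",
)
example = TaskCompletionResult(result=["Passed"], reason="All three values are prime.")
print(payload["input"], "->", "ok" if is_passed(example) else "needs work")
```

Validating the inputs before submitting them keeps type errors out of the evaluation itself, and wrapping the result makes the list-valued Result field easier to branch on.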
What to Do If You Get Undesired Results
If the response is evaluated as not completing the task (Failed) and you want to improve it:
- Make sure the response directly addresses the specific task or question asked
- Ensure all parts of multi-part questions or requests are addressed
- Provide complete information without assuming prior knowledge
- For how-to requests, include clear, actionable steps
- For questions seeking explanations, provide the reasoning or mechanisms behind the answer
- Consider whether the task requires specific formatting, calculations, or output types
- Verify that the response is accurate and relevant to the specific task
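One way to act on the checklist above is to feed the evaluator's Reason back into a revision request. The sketch below only assembles such a prompt as a string. `CHECKLIST` and `build_revision_prompt` are hypothetical names introduced for illustration, and how the prompt is sent to a model is left to your own setup.

```python
from typing import List

# Checklist items paraphrased from the list above.
CHECKLIST: List[str] = [
    "Directly address the specific task or question asked.",
    "Address every part of a multi-part question or request.",
    "Provide complete information without assuming prior knowledge.",
    "For how-to requests, include clear, actionable steps.",
    "For explanations, give the reasoning or mechanisms behind the answer.",
    "Use any formatting, calculations, or output types the task requires.",
    "Verify that the response is accurate and relevant to the task.",
]


def build_revision_prompt(task: str, response: str, reason: str) -> str:
    """Assemble a prompt asking for a revised response after a 'Failed' result."""
    checklist = "\n".join(f"- {item}" for item in CHECKLIST)
    return (
        f"Task:\n{task}\n\n"
        f"Previous response (evaluated as Failed):\n{response}\n\n"
        f"Evaluator reason:\n{reason}\n\n"
        f"Revise the response so that it satisfies this checklist:\n{checklist}\n"
    )


print(build_revision_prompt(
    task="Explain how to reset a password and how to enable 2FA.",
    response="Go to Settings and click Reset Password.",
    reason="The response does not cover enabling 2FA.",
))
```

Including the Reason text verbatim keeps the revision focused on the specific gap the evaluator reported rather than on the checklist as a whole.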
Comparing Task Completion with Similar Evals
- Completeness: While Task Completion evaluates whether a response successfully accomplishes a requested task, Completeness focuses specifically on whether all required information is included.
- Instruction Adherence: Task Completion evaluates whether a response accomplishes the requested task, whereas Instruction Adherence measures how well the response follows specific instructions.
- Is Helpful: Task Completion focuses on successful completion of a task, while Is Helpful evaluates the overall usefulness of a response.