Eval Definition
Task Completion
Evaluates whether a response successfully completes the task requested in the input.
Evaluation Using Interface
Input:
- Required Inputs:
- input: The task request or question.
- output: The response to evaluate.
Output:
- Result: Returns ‘Passed’ if the response successfully completes the requested task, ‘Failed’ if it doesn’t.
- Reason: A detailed explanation of why the response was classified as successfully completing the task or not.
Evaluation Using Python SDK
Click here to learn how to setup evaluation using the Python SDK.
Input:
- Required Inputs:
- input:
string
- The task request or question. - output:
string
- The response to evaluate.
- input:
Output:
- Result: Returns a list containing ‘Passed’ if the response successfully completes the requested task, or ‘Failed’ if it doesn’t.
- Reason: Provides a detailed explanation of the evaluation.
Example Output:
What to do If you get Undesired Results
If the response is evaluated as not completing the task (Failed) and you want to improve it:
- Make sure the response directly addresses the specific task or question asked
- Ensure all parts of multi-part questions or requests are addressed
- Provide complete information without assuming prior knowledge
- For how-to requests, include clear, actionable steps
- For questions seeking explanations, provide the reasoning or mechanisms behind the answer
- Consider whether the task requires specific formatting, calculations, or output types
- Verify that the response is accurate and relevant to the specific task
Comparing Task Completion with Similar Evals
- Completeness: While Task Completion evaluates whether a response successfully accomplishes a requested task, Completeness focuses specifically on whether all required information is included.
- Instruction Adherence: Task Completion evaluates whether a response accomplishes the requested task, whereas Instruction Adherence measures how well the response follows specific instructions.
- Is Helpful: Task Completion focuses on successful completion of a task, while Is Helpful evaluates the overall usefulness of a response.