Click here to learn how to set up evaluation using the Python SDK.
Input:
Required Inputs:
- input: string - The task request or question.
- output: string - The response to evaluate.
Output:
- Result: A list containing 'Passed' if the response successfully completes the requested task, or 'Failed' if it does not.
- Reason: A detailed explanation of the evaluation verdict.
result = evaluator.evaluate(
    eval_templates="task_completion",
    inputs={
        "input": "Why doesn't honey go bad?",
        "output": "Honey doesn't spoil because its low moisture and high acidity prevent the growth of bacteria and other microbes."
    },
    model_name="turing_flash"
)
print(result.eval_results[0].output)
print(result.eval_results[0].reason)
Example Output:
['Passed']
The evaluation is 'Passed' because the response directly and completely answers the question asked.
- The response **directly addresses** the specific question of why honey doesn't spoil.
- It provides a **clear explanation** identifying the two key factors (low moisture and high acidity) that prevent spoilage.
- The answer explains the **mechanism of preservation** by stating how these factors specifically affect bacteria and microbes.
- The information provided is **scientifically accurate** and matches established knowledge about honey preservation.

A different evaluation is not possible because the response accomplishes exactly what was requested - explaining why honey doesn't go bad.
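The verdict above can also be consumed programmatically. A minimal sketch, using a stand-in `EvalResult` class (a hypothetical substitute for the SDK's actual result object, which exposes `output` and `reason` as in the example) so the snippet runs without credentials:

```python
from dataclasses import dataclass

# Stand-in for the SDK's per-evaluation result object (hypothetical;
# mirrors the documented fields: a verdict list and a reason string).
@dataclass
class EvalResult:
    output: list  # e.g. ['Passed'] or ['Failed']
    reason: str   # detailed explanation of the verdict

def task_completed(result: EvalResult) -> bool:
    """Return True when the verdict list contains 'Passed'."""
    return "Passed" in result.output

r = EvalResult(
    output=["Passed"],
    reason="The response answers the question directly.",
)
print(task_completed(r))  # True
```

In real code, `result.eval_results[0]` from `evaluator.evaluate(...)` would take the place of the stand-in object.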
Completeness: While Task Completion evaluates whether a response successfully accomplishes a requested task, Completeness focuses specifically on whether all required information is included.
Instruction Adherence: Task Completion evaluates whether a response accomplishes the requested task, whereas Instruction Adherence measures how well the response follows specific instructions.
Is Helpful: Task Completion focuses on successful completion of a task, while Is Helpful evaluates the overall usefulness of a response.
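The related evals above presumably use the same call shape, differing only in the `eval_templates` value. A sketch of comparing one response across several templates; note that only "task_completion" appears in this page, so the other template identifiers here are assumptions following the same naming pattern, and a stub evaluator stands in for `evaluator.evaluate` so the snippet runs without the SDK:

```python
# Assumed template identifiers (only "task_completion" is confirmed above).
ASSUMED_TEMPLATES = [
    "task_completion",
    "completeness",
    "instruction_adherence",
    "is_helpful",
]

def evaluate_across_templates(evaluate_fn, inputs, templates=ASSUMED_TEMPLATES):
    """Run one (input, output) pair through each template and collect verdicts.

    `evaluate_fn(template, inputs)` stands in for a call like
    `evaluator.evaluate(eval_templates=template, inputs=inputs, ...)`.
    """
    return {t: evaluate_fn(t, inputs) for t in templates}

# Stub evaluator so the sketch is self-contained: passes everything.
stub = lambda template, inputs: "Passed"
verdicts = evaluate_across_templates(
    stub,
    {
        "input": "Why doesn't honey go bad?",
        "output": "Low moisture and high acidity prevent microbial growth.",
    },
)
print(verdicts["task_completion"])  # Passed
```

Running distinct templates over the same pair makes it easy to see where a response completes the task but falls short on, say, completeness.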