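# Assumes `evaluator` is an already-initialized evaluator client
# (its setup is not shown in this section).
result = evaluator.evaluate(
    eval_templates="task_completion",
    inputs={
        # The user's request or question
        "input": "Why doesn't honey go bad?",
        # The model's response to be judged against that request
        "output": "Honey doesn't spoil because its low moisture and high acidity prevent the growth of bacteria and other microbes."
    },
    model_name="turing_flash"
)

# Verdict (Passed or Failed), followed by the explanation for it
print(result.eval_results[0].output)
print(result.eval_results[0].reason)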
Input
Required Input | Type   | Description
input          | string | User request or question to the model.
output         | string | Response of the model based on the input.
Output
Field  | Description
Result | Returns Passed if the response successfully completes the requested task, or Failed if it doesn’t.
Reason | Provides a detailed explanation of why the response was classified as successfully completing the task or not.
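
The two output fields can be read programmatically. The following is a minimal sketch, assuming the result object has the shape used in the example above (`eval_results[0].output` and `eval_results[0].reason`) and that `output` holds the string Passed or Failed. The helper name `summarize_task_completion` and the stand-in result built with `SimpleNamespace` are illustrative only and are not part of the SDK.

```python
from types import SimpleNamespace


def summarize_task_completion(result):
    """Return (passed, reason) from a result shaped like the example above.

    Assumption: result.eval_results[0].output is the string "Passed" or "Failed".
    """
    first = result.eval_results[0]
    passed = str(first.output).strip().lower() == "passed"
    return passed, first.reason


# Stand-in object for illustration only; a real result would come from
# evaluator.evaluate(...).
fake_result = SimpleNamespace(
    eval_results=[
        SimpleNamespace(
            output="Failed",
            reason="The response skips the second part of the question.",
        )
    ]
)

passed, reason = summarize_task_completion(fake_result)
if not passed:
    print(f"Task not completed: {reason}")
```

Comparing case-insensitively is a defensive choice in case the verdict string's casing varies; drop it if the casing is known to be fixed.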

What to Do If You Get Undesired Results

If the response is evaluated as not completing the task (Failed) and you want to improve it:
  • Make sure the response directly addresses the specific task or question asked
  • Ensure all parts of multi-part questions or requests are addressed
  • Provide complete information without assuming prior knowledge
  • For how-to requests, include clear, actionable steps
  • For questions seeking explanations, provide the reasoning or mechanisms behind the answer
  • Consider whether the task requires specific formatting, calculations, or output types
  • Verify that the response is accurate and relevant to the specific task
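
One way to act on a Failed result is to feed the evaluator's reason back into the next generation attempt. The sketch below illustrates that loop under stated assumptions: `generate` and `evaluate` are hypothetical callables you would supply (a model call, and a wrapper around the Task Completion eval that returns the Result and Reason fields), and `retry_until_passed` is an illustrative name, not an SDK function.

```python
from typing import Callable, Tuple


def retry_until_passed(
    task: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], Tuple[str, str]],
    max_attempts: int = 3,
) -> Tuple[str, bool]:
    """Regenerate a response until the evaluation passes or attempts run out.

    generate(prompt) -> response text (hypothetical model call).
    evaluate(task, response) -> (result, reason), where result is "Passed"
    or "Failed" (hypothetical wrapper around the Task Completion eval).
    """
    prompt = task
    response = ""
    for _ in range(max_attempts):
        response = generate(prompt)
        result, reason = evaluate(task, response)
        if result == "Passed":
            return response, True
        # Feed the evaluator's reason back so the next attempt can address the gap.
        prompt = (
            f"{task}\n\nA previous answer was rejected because: {reason}\n"
            "Address this fully."
        )
    return response, False


# Stubs standing in for a real model and a real evaluation call.
def stub_generate(prompt: str) -> str:
    if "rejected" in prompt:
        return "Honey's low moisture and high acidity stop microbes from growing."
    return "It just doesn't."


def stub_evaluate(task: str, response: str) -> Tuple[str, str]:
    if "moisture" in response:
        return "Passed", "Explains the mechanism."
    return "Failed", "No explanation of the mechanism is given."


final, ok = retry_until_passed("Why doesn't honey go bad?", stub_generate, stub_evaluate)
print(ok, final)
```

Capping the attempts keeps the loop from running indefinitely when a task cannot be completed by rephrasing alone.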

Comparing Task Completion with Similar Evals

  • Completeness: While Task Completion evaluates whether a response successfully accomplishes a requested task, Completeness focuses specifically on whether all required information is included.
  • Instruction Adherence: Task Completion evaluates whether a response accomplishes the requested task, whereas Instruction Adherence measures how well the response follows specific instructions.
  • Is Helpful: Task Completion focuses on successful completion of a task, while Is Helpful evaluates the overall usefulness of a response.