Task Completion

Evaluation Using Interface

Input:

Required Inputs:
- input: The task request or question.
- output: The response to evaluate.

Output:

Result: Returns ‘Passed’ if the response successfully completes the requested task, ‘Failed’ if it doesn’t.
Reason: A detailed explanation of why the response was classified as successfully completing the task or not.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.

Input:

Required Inputs:
- input: string - The task request or question.
- output: string - The response to evaluate.

Output:

Result: Returns a list containing ‘Passed’ if the response successfully completes the requested task, or ‘Failed’ if it doesn’t.
Reason: Provides a detailed explanation of the evaluation.

# Assumes `evaluator` is the SDK evaluator instance created during setup.
result = evaluator.evaluate(
    eval_templates="task_completion",
    inputs={
        "input": "Why doesn’t honey go bad?",
        "output": "Honey doesn’t spoil because its low moisture and high acidity prevent the growth of bacteria and other microbes."
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)

Example Output:

['Passed']
The evaluation is 'Passed' because the response directly and completely answers the question asked.

- The response **directly addresses** the specific question of why honey doesn't spoil.
- It provides a **clear explanation** identifying the two key factors (low moisture and high acidity) that prevent spoilage.
- The answer explains the **mechanism of preservation** by stating how these factors specifically affect bacteria and microbes.
- The information provided is **scientifically accurate** and matches established knowledge about honey preservation.

A different evaluation is not possible because the response accomplishes exactly what was requested - explaining why honey doesn't go bad.
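The result fields printed above can be folded into a single pass/fail check. The following is a minimal sketch, assuming the output is a list such as `['Passed']` or `['Failed']` and the reason is a plain string, as in the example output. The helper name `summarize_eval` is hypothetical and not part of the SDK.

```python
from typing import List, Tuple


def summarize_eval(output: List[str], reason: str) -> Tuple[bool, str]:
    """Turn an eval output list and its reason into a (passed, message) pair."""
    # Assumption: output looks like ['Passed'] or ['Failed'], as in the example output.
    passed = len(output) > 0 and output[0] == "Passed"
    label = "Passed" if passed else "Failed"
    # Keep only the first line of the reason for a compact log message.
    stripped = reason.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return passed, f"{label}: {first_line}"
```

With the SDK result from the example, this could be called as `summarize_eval(result.eval_results[0].output, result.eval_results[0].reason)`.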

What to Do If You Get Undesired Results

If the response is evaluated as not completing the task (Failed) and you want to improve it:

- Make sure the response directly addresses the specific task or question asked.
- Ensure all parts of multi-part questions or requests are addressed.
- Provide complete information without assuming prior knowledge.
- For how-to requests, include clear, actionable steps.
- For questions seeking explanations, provide the reasoning or mechanisms behind the answer.
- Consider whether the task requires specific formatting, calculations, or output types.
- Verify that the response is accurate and relevant to the specific task.
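When several responses are being checked, it can help to gather the failed ones so they can be revised against the points above. The sketch below assumes a hypothetical callable `evaluate_fn(input, output)` that returns the eval output list (for example, a thin wrapper around the `evaluator.evaluate` call shown earlier). Neither `collect_failures` nor `evaluate_fn` is part of the SDK.

```python
from typing import Callable, Dict, List


def collect_failures(
    cases: List[Dict[str, str]],
    evaluate_fn: Callable[[str, str], List[str]],
) -> List[Dict[str, str]]:
    """Return the cases whose evaluation output does not contain 'Passed'."""
    failures = []
    for case in cases:
        # Each case is a dict with the same 'input' and 'output' keys the eval expects.
        output = evaluate_fn(case["input"], case["output"])
        if "Passed" not in output:
            failures.append(case)
    return failures
```

Each returned case keeps its original `input` and `output`, so the response can be rewritten and evaluated again.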

Comparing Task Completion with Similar Evals

Completeness: While Task Completion evaluates whether a response successfully accomplishes a requested task, Completeness focuses specifically on whether all required information is included.
Instruction Adherence: Task Completion evaluates whether a response accomplishes the requested task, whereas Instruction Adherence measures how well the response follows specific instructions.
Is Helpful: Task Completion focuses on successful completion of a task, while Is Helpful evaluates the overall usefulness of a response.
