This evaluation template assesses whether an AI response is genuinely helpful in addressing the user's query: it measures the utility, relevance, and effectiveness of the response in solving the user's problem or answering their question.

Python SDK Usage

# `evaluator` is assumed to be an already-initialized evaluation client from the SDK
result = evaluator.evaluate(
    eval_templates="is_helpful",
    inputs={
        "input": "Why doesn’t honey go bad?",
        "output": "Honey doesn’t spoil because its low moisture and high acidity prevent the growth of bacteria and other microbes."
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)  # the evaluation verdict
print(result.eval_results[0].reason)  # the explanation behind the verdict

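Beyond printing the fields, you may want to post-process several results at once. The sketch below uses a hypothetical `EvalResult` dataclass as a stand-in for the SDK's actual result objects (its field names mirror the `output` and `reason` attributes shown above, but the class itself and the `"Passed"`/`"Failed"` values are illustrative assumptions):

```python
from dataclasses import dataclass

# Hypothetical stand-in for the SDK's result objects, for illustration only.
@dataclass
class EvalResult:
    output: str   # e.g. "Passed" or "Failed" (assumed verdict labels)
    reason: str   # model-generated explanation for the verdict

def count_passed(results: list[EvalResult]) -> int:
    """Count results whose verdict marks the response as helpful."""
    return sum(1 for r in results if r.output == "Passed")

results = [
    EvalResult("Passed", "The answer directly explains why honey keeps."),
    EvalResult("Failed", "The answer ignores part of the question."),
]
print(count_passed(results))  # prints 1
```

The same pattern extends naturally if you run multiple templates and want a quick pass rate across them.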
Troubleshooting

If you encounter issues with this evaluation:
  • Ensure that both the input (user query) and output (AI response) parameters are provided
  • The helpfulness evaluation works best when the context of the request is clear
  • When evaluating complex responses, make sure the entire response text is included
  • Consider combining this evaluation with others, such as completeness or factual-accuracy, for a more comprehensive assessment
Related Evaluations

  • completeness: Determines whether the response addresses all aspects of the query
  • task-completion: Checks whether a specific requested task was accomplished
  • instruction-adherence: Evaluates whether the response follows specific instructions
  • is-concise: Assesses whether the response avoids unnecessary verbosity