This evaluation template assesses whether an AI response is genuinely helpful in addressing the user’s query or request. It evaluates the utility, relevance, and effectiveness of the response in solving the user’s problem or answering their question.

Interface Usage

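# Quick reference: assumes an Evaluator instance is already available
# (see Python SDK Usage below for initialization)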
result = evaluator.evaluate(
    eval_templates="is_helpful", 
    inputs={
        "input": "Why doesn't honey go bad?",
        "output": "Honey doesn't spoil because its low moisture and high acidity prevent the growth of bacteria and other microbes."
    },
    model_name="turing_flash"
)

print(result.eval_results[0].metrics[0].value)
print(result.eval_results[0].reason)

Python SDK Usage

from futureagi import Evaluator

# Initialize the evaluator
evaluator = Evaluator(api_key="your_api_key")

# Evaluate the helpfulness of a response
result = evaluator.evaluate(
    eval_templates="is_helpful", 
    inputs={
        "input": "Why doesn't honey go bad?",
        "output": "Honey doesn't spoil because its low moisture and high acidity prevent the growth of bacteria and other microbes."
    },
    model_name="turing_flash"
)

# Access the result
is_helpful = result.eval_results[0].metrics[0].value
reason = result.eval_results[0].reason

print(f"Is helpful: {is_helpful}")
print(f"Reason: {reason}")

Example Output

True
The response directly answers the user's question about why honey doesn't go bad by explaining the scientific reason: its low moisture content and high acidity prevent microbial growth. This explanation is clear, concise, and addresses the specific question asked, providing valuable and accurate information that satisfies the user's query.
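
In a pipeline, the returned value can be used to gate or flag responses before they reach users. Below is a minimal sketch that reuses only the call shape and accessors shown above; the sample query, response, and flagging logic are illustrative.

from futureagi import Evaluator

evaluator = Evaluator(api_key="your_api_key")

# Hypothetical query/response pair used only for illustration
user_query = "Why doesn't honey go bad?"
model_response = "It just doesn't."

result = evaluator.evaluate(
    eval_templates="is_helpful",
    inputs={"input": user_query, "output": model_response},
    model_name="turing_flash"
)

is_helpful = result.eval_results[0].metrics[0].value
if not is_helpful:
    # Surface the judge's explanation so unhelpful responses can be reviewed
    print(f"Flagged response: {result.eval_results[0].reason}")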

Troubleshooting

If you encounter issues with this evaluation:

  • Ensure that both the input (user query) and output (AI response) parameters are provided
  • The helpfulness evaluation works best when the context of the request is clear
  • If evaluating complex responses, make sure the entire response is included
  • Consider combining with other evaluations like completeness or factual-accuracy for a more comprehensive assessment (see the sketch after the related list below)

Related Evals

  • completeness: Determines if the response addresses all aspects of the query
  • task-completion: Checks if a specific requested task was accomplished
  • instruction-adherence: Evaluates if the response follows specific instructions
  • is-concise: Assesses whether the response avoids unnecessary verbosity
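
One straightforward way to combine is_helpful with a related template is to call evaluate once per template and collect the metric values. The sketch below reuses only the call shape shown earlier; the completeness template identifier is assumed to mirror the naming in the list above, and the result handling is illustrative.

from futureagi import Evaluator

evaluator = Evaluator(api_key="your_api_key")

inputs = {
    "input": "Why doesn't honey go bad?",
    "output": "Honey doesn't spoil because its low moisture and high acidity prevent the growth of bacteria and other microbes."
}

# Run each template separately and collect the metric values
scores = {}
for template in ["is_helpful", "completeness"]:
    result = evaluator.evaluate(
        eval_templates=template,
        inputs=inputs,
        model_name="turing_flash"
    )
    scores[template] = result.eval_results[0].metrics[0].value

print(scores)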