Customer Agent Clarification Seeking Evaluation Metric

Evaluates whether the agent asks for clarification when a query is ambiguous, rather than guessing or giving an incorrect response.

# Assumes `evaluator` is an already configured Evaluator instance from the Future AGI SDK.
result = evaluator.evaluate(
    eval_templates="customer_agent_clarification_seeking",
    inputs={
        "conversation": "User: I want to change it.\nAgent: I'd be happy to help! Could you clarify what you'd like to change — your account details, subscription plan, or something else?"
    },
    model_name="turing_flash"
)

# `output` is one of: never, occasionally, frequently, always
print(result.eval_results[0].output)
# `reason` explains the assessment
print(result.eval_results[0].reason)

import { Evaluator } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

const result = await evaluator.evaluate(
  "customer_agent_clarification_seeking",
  {
    conversation: "User: I want to change it.\nAgent: I'd be happy to help! Could you clarify what you'd like to change — your account details, subscription plan, or something else?"
  },
  {
    modelName: "turing_flash",
  }
);

console.log(result);

Input
	Required Input	Type	Description
	`conversation`	`string`	The full conversation history between the customer and agent
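
If your conversation history is stored as structured turns rather than a single string, it has to be flattened before it is passed as `conversation`. The following is a minimal sketch that assumes the `User:` / `Agent:` one-turn-per-line layout used in the example above; the helper name `format_conversation` is hypothetical and not part of the SDK, and the page does not state that this layout is required.

```python
from typing import Iterable, Tuple


def format_conversation(turns: Iterable[Tuple[str, str]]) -> str:
    """Join (speaker, message) pairs into one newline-separated transcript.

    Hypothetical helper: the "Speaker: message" layout mirrors the example
    on this page and is an assumption, not a documented requirement.
    """
    lines = []
    for speaker, message in turns:
        # Collapse internal whitespace so each turn stays on a single line.
        text = " ".join(message.split())
        lines.append(f"{speaker}: {text}")
    return "\n".join(lines)


conversation = format_conversation([
    ("User", "I want to change it."),
    ("Agent", "Could you clarify what you'd like to change?"),
])
```

The resulting string can then be supplied as the `conversation` input.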

Output
	Field	Description
	Result	Returns one of `never`, `occasionally`, `frequently`, or `always`, indicating how often the agent seeks clarification when it is needed
	Reason	Provides a detailed explanation of the clarification-seeking assessment
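
Because the result is one of four labels, it can be useful to treat them as an ordered scale when triaging many conversations. The sketch below assumes the ordering `never` < `occasionally` < `frequently` < `always`; the numeric ranks, the `flag_for_review` helper, and the conversation IDs are illustrative assumptions and not part of the SDK.

```python
from typing import Dict, List

# Assumed ordinal ranking of the four documented labels (not defined by the SDK).
LABEL_RANK: Dict[str, int] = {
    "never": 0,
    "occasionally": 1,
    "frequently": 2,
    "always": 3,
}


def flag_for_review(results: Dict[str, str], minimum: str = "frequently") -> List[str]:
    """Return the IDs of conversations whose label ranks below `minimum`."""
    cutoff = LABEL_RANK[minimum]
    flagged = []
    for conv_id, label in results.items():
        rank = LABEL_RANK.get(label.strip().lower())
        if rank is None:
            # Fail loudly on labels outside the documented set.
            raise ValueError(f"Unexpected label for {conv_id!r}: {label!r}")
        if rank < cutoff:
            flagged.append(conv_id)
    return flagged


# Hypothetical labels collected from earlier evaluation runs.
sample = {"conv-1": "always", "conv-2": "never", "conv-3": "occasionally"}
flagged = flag_for_review(sample)
```

Here `flagged` holds the conversations rated `never` or `occasionally`, which are natural candidates for manual review.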

What to Do When Clarification Seeking is Poor

Review cases where the agent guessed incorrectly instead of asking
Add intent confidence thresholds: when the agent's confidence in the detected intent falls below the threshold, ask for clarification
Avoid over-clarifying for straightforward queries
Ensure clarification questions are specific and helpful, not generic
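
The threshold idea in the list above can be sketched as follows. This is a minimal illustration only: it assumes your agent already produces a confidence score per candidate intent, and the `Decision` type, the `decide` function, the intent names, and the default threshold of 0.7 are all hypothetical values to be tuned against your own data.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Decision:
    action: str                     # "proceed" or "clarify"
    intent: Optional[str] = None    # set when action == "proceed"
    question: Optional[str] = None  # set when action == "clarify"


def decide(intent_scores: Dict[str, float],
           threshold: float = 0.7,
           max_options: int = 3) -> Decision:
    """Proceed when the top intent is confident enough, otherwise ask."""
    if not intent_scores:
        # No candidates at all: a generic question is the only fallback.
        return Decision("clarify", question="Could you tell me more about what you need?")

    ranked = sorted(intent_scores.items(), key=lambda kv: kv[1], reverse=True)
    top_intent, top_score = ranked[0]

    # Straightforward queries skip clarification, which avoids over-clarifying.
    if top_score >= threshold:
        return Decision("proceed", intent=top_intent)

    # Name the most likely options so the question is specific, not generic.
    options = [name.replace("_", " ") for name, _ in ranked[:max_options]]
    question = "Could you clarify which of these you mean: " + ", ".join(options) + "?"
    return Decision("clarify", question=question)


confident = decide({"change_plan": 0.92, "change_email": 0.05})
ambiguous = decide({"change_plan": 0.40, "change_email": 0.35, "change_address": 0.25})
```

In this sketch `confident` proceeds with `change_plan`, while `ambiguous` yields a clarifying question that lists the three candidate intents.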

Comparing Clarification Seeking with Similar Evals

Customer Agent: Query Handling: Clarification Seeking evaluates whether the agent asks for more information when needed, while Query Handling evaluates the correctness of the agent’s responses.
Completeness: Clarification Seeking focuses on gathering sufficient information before responding, while Completeness evaluates whether the final response fully addresses the user’s need.
