Customer Agent Human Escalation Evaluation Metric

Checks whether the agent escalates to a human agent at the right time, based on user frustration, query complexity, or specific keywords.

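# Python
# Assumption: the Future AGI Python SDK exposes Evaluator in fi.evals and
# reads API credentials from the environment; adjust to your setup.
from fi.evals import Evaluator

evaluator = Evaluator()

result = evaluator.evaluate(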
    eval_templates="customer_agent_human_escalation",
    inputs={
        "conversation": "User: This is ridiculous! I've been waiting 3 weeks for my order and nobody is helping me!\nAgent: I'm very sorry for the frustration. Let me connect you with a senior support specialist who can resolve this immediately."
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)

// TypeScript (top-level await requires an ES module context)
import { Evaluator } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

const result = await evaluator.evaluate(
  "customer_agent_human_escalation",
  {
    conversation: "User: This is ridiculous! I've been waiting 3 weeks for my order and nobody is helping me!\nAgent: I'm very sorry for the frustration. Let me connect you with a senior support specialist who can resolve this immediately."
  },
  {
    modelName: "turing_flash",
  }
);

console.log(result);

Input
| Required Input | Type | Description |
| --- | --- | --- |
| `conversation` | `string` | The full conversation history between the customer and agent |
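
The examples above pass `conversation` as a single string with one `Speaker: message` turn per line. If your history is stored as structured turns, a small helper can build that string. This is a minimal sketch: `format_conversation` is a hypothetical name, and the `User`/`Agent` labels simply mirror the examples on this page rather than a documented requirement.

```python
from typing import Iterable, Tuple


def format_conversation(turns: Iterable[Tuple[str, str]]) -> str:
    """Join (speaker, message) pairs into a 'Speaker: message' transcript."""
    lines = []
    for speaker, message in turns:
        # Collapse internal whitespace and newlines so each turn stays on one line.
        text = " ".join(message.split())
        lines.append(f"{speaker.strip()}: {text}")
    return "\n".join(lines)


turns = [
    ("User", "This is ridiculous! I've been waiting 3 weeks for my order."),
    ("Agent", "I'm very sorry. Let me connect you with a senior support specialist."),
]
conversation = format_conversation(turns)
print(conversation)
```

The resulting string can be passed as the `conversation` input.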

Output
| Field | Description |
| --- | --- |
| Result | Returns `Passed` if escalation is handled appropriately, `Failed` if escalation is missed, premature, or delayed |
| Reason | Provides a detailed explanation of the escalation handling assessment |
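
When running this eval over many conversations, it can help to tally the two output fields. The sketch below assumes each result has already been reduced to an `(output, reason)` pair, for example from `result.eval_results[0].output` and `.reason` in the Python example, and that the output is the literal string `Passed` or `Failed`. The `summarize` helper and the sample values are hypothetical placeholders, not real evaluator output.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize(results: List[Tuple[str, str]]) -> Dict[str, object]:
    """Compute the pass rate and collect the reasons attached to failures."""
    counts = Counter(output for output, _ in results)
    failed_reasons = [reason for output, reason in results if output == "Failed"]
    total = len(results)
    pass_rate = counts["Passed"] / total if total else 0.0
    return {"pass_rate": pass_rate, "failed_reasons": failed_reasons}


# Placeholder pairs standing in for real evaluator output.
sample = [
    ("Passed", "placeholder reason for a passing conversation"),
    ("Failed", "placeholder reason for a failing conversation"),
]
summary = summarize(sample)
print(summary["pass_rate"])
for reason in summary["failed_reasons"]:
    print("Needs review:", reason)
```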

What to Do When Human Escalation Fails

Define clear escalation triggers, such as frustration signals, repeated failures, and specific keywords
Avoid escalating before the agent has attempted a resolution
Ensure the handoff to a human agent is smooth and carries the conversation context
Review cases where escalation was needed but the agent continued without escalating
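
One way to make the triggers listed above explicit is a small rule-based check on the agent side. The following is a rough sketch only: `EscalationPolicy`, `should_escalate`, the keyword lists, and the attempt threshold are all illustrative assumptions, not part of the eval or the SDK.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EscalationPolicy:
    # Illustrative values only; tune these for your own support domain.
    keywords: Tuple[str, ...] = ("speak to a human", "manager", "supervisor")
    frustration_markers: Tuple[str, ...] = ("ridiculous", "unacceptable", "nobody is helping")
    max_failed_attempts: int = 2


def should_escalate(
    user_message: str,
    failed_attempts: int,
    policy: Optional[EscalationPolicy] = None,
) -> Tuple[bool, str]:
    """Return (escalate?, trigger name) for the latest user message."""
    policy = policy or EscalationPolicy()
    text = user_message.lower()
    if any(keyword in text for keyword in policy.keywords):
        return True, "keyword"
    if any(marker in text for marker in policy.frustration_markers):
        return True, "frustration"
    if failed_attempts >= policy.max_failed_attempts:
        return True, "repeated_failures"
    return False, "none"


print(should_escalate("This is ridiculous! Nobody is helping me!", 0))
print(should_escalate("Where is my order?", 0))
print(should_escalate("It still does not work.", 2))
```

Substring matching like this is crude and will miss paraphrases. Returning the trigger name alongside the decision makes it easier to review missed or premature escalations later.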

Comparing Human Escalation with Similar Evals

Customer Agent: Conversation Quality: Human Escalation evaluates a specific decision point, while Conversation Quality assesses the overall interaction experience.
Customer Agent: Query Handling: Human Escalation checks whether the agent hands off complex cases correctly, while Query Handling evaluates whether the agent answers queries correctly on its own.
