Conversation Resolution
Evaluates whether each user query or statement in a conversation receives an appropriate and complete response from the AI. This metric assesses if the conversation reaches satisfactory conclusions for each user interaction, ensuring that questions are answered and statements are appropriately acknowledged.
# Assumes `evaluator` is an already-initialized Evaluator instance.
result = evaluator.evaluate(
    eval_templates="conversation_resolution",
    inputs={
        "conversation": '''
User: My Wi-Fi keeps disconnecting every few minutes.
Assistant: You can try restarting your router and updating your network drivers.
User: I restarted the router and it's stable now. Thanks!
Assistant: Glad to hear that! Let me know if you need anything else.
'''
    },
    model_name="turing_flash"
)

# The first result holds the score and the explanation behind it.
print(result.eval_results[0].output)
print(result.eval_results[0].reason)
import { Evaluator, Templates } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

const result = await evaluator.evaluate(
  "conversation_resolution",
  {
    conversation: "User: My Wi-Fi keeps disconnecting every few minutes. Assistant: You can try restarting your router and updating your network drivers. User: I restarted the router and it's stable now. Thanks! Assistant: Glad to hear that! Let me know if you need anything else."
  },
  {
    modelName: "turing_flash",
  }
);
console.log(result);

| Input | | |
|---|---|---|
| Required Input | Type | Description |
| conversation | string | Conversation history between the user and the model, provided as query and response pairs |
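
When the history is stored as structured turns rather than a single string, it has to be flattened before it is passed as `conversation`. The following is a minimal sketch, assuming the `User:` / `Assistant:` labels used in the example above are an acceptable format. `format_conversation` is a hypothetical helper, not part of the SDK.

```python
from typing import Iterable, Tuple


def format_conversation(pairs: Iterable[Tuple[str, str]]) -> str:
    """Flatten (query, response) pairs into one labelled transcript string.

    Hypothetical helper: the "User:" / "Assistant:" labels mirror the
    example conversation and are an assumption, not a documented format.
    """
    lines = []
    for query, response in pairs:
        lines.append(f"User: {query.strip()}")
        lines.append(f"Assistant: {response.strip()}")
    return "\n".join(lines)


conversation = format_conversation([
    ("My Wi-Fi keeps disconnecting every few minutes.",
     "You can try restarting your router and updating your network drivers."),
    ("I restarted the router and it's stable now. Thanks!",
     "Glad to hear that! Let me know if you need anything else."),
])
```

The resulting string can then be supplied as the `conversation` input shown earlier.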
| Output | |
|---|---|
| Field | Description |
| Result | Returns a score; higher scores indicate a more fully resolved conversation |
| Reason | Provides a detailed explanation of the conversation resolution assessment |
What to do when Conversation Resolution is Low
- Add confirmation mechanisms to verify user satisfaction
- Develop fallback responses for unclear or complex queries
- Track common patterns in unresolved queries for improvement
- Consider implementing a clarification system for ambiguous requests
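
Tracking unresolved queries can start as simply as collecting the low-scoring conversations, together with their reasons, for manual review. The following is a rough sketch in plain Python, assuming the score is numeric and that higher means more resolved. The `0.5` threshold, the `ResolutionRecord` shape, and `find_unresolved` are illustrative assumptions, not SDK features, and the sample records are made up.

```python
from typing import List, NamedTuple


class ResolutionRecord(NamedTuple):
    """One evaluated conversation (hypothetical record shape)."""
    conversation_id: str
    score: float
    reason: str


def find_unresolved(records: List[ResolutionRecord],
                    threshold: float = 0.5) -> List[ResolutionRecord]:
    """Return records scoring below `threshold`, lowest score first.

    The threshold is an assumed cut-off; tune it to your own score scale.
    """
    low = [r for r in records if r.score < threshold]
    return sorted(low, key=lambda r: r.score)


# Made-up sample data, for illustration only.
records = [
    ResolutionRecord("c1", 0.9, "All queries answered."),
    ResolutionRecord("c2", 0.2, "Billing question was never addressed."),
    ResolutionRecord("c3", 0.4, "Follow-up request was ignored."),
]

for record in find_unresolved(records):
    print(f"{record.conversation_id}: {record.score:.1f} - {record.reason}")
```

Reading the collected reasons side by side is one way to spot recurring gaps, such as a topic the assistant repeatedly fails to address.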
Comparing Conversation Resolution with Similar Evals
- Conversation Coherence: While Resolution focuses on addressing user needs, Coherence evaluates the logical flow and context maintenance. A conversation can be perfectly coherent but fail to resolve user queries, or vice versa.
- Completeness: Resolution focuses on reaching a satisfactory conclusion, whereas Completeness focuses on comprehensive coverage. A response can be complete but still not resolve the user’s actual need.
- Context Relevance: Resolution evaluates whether queries are answered, while Context Relevance assesses if the provided context is sufficient for generating responses. A response can use relevant context but still fail to resolve the user’s query.