Conversation Resolution
Definition
Conversation Resolution evaluates whether each user query or statement in a conversation receives an appropriate and complete response from the AI. The metric checks that every user interaction reaches a satisfactory conclusion: questions are answered and statements are appropriately acknowledged.
Calculation
The evaluation process systematically analyses each user-AI interaction, leveraging a specialised LLM evaluator to determine the resolution status of conversations. Each user message is categorised as either a question or a statement, providing a structured basis for assessment.
The AI’s response is then evaluated against three possible outcomes: “Resolved” for messages that are fully addressed, “Partial” for responses that only partially satisfy the user’s query, and “Unresolved” for cases where the AI fails to provide a meaningful resolution. Based on this classification, the system assigns a numerical score reflecting the overall resolution effectiveness.
Scores range from 0 to 10. A score of 10 indicates that every user message is fully resolved. Scores just below 10 reflect minor gaps or inconsistencies, and the score keeps falling as more messages are classified as Partial or Unresolved. A score of 0 means that none of the user's queries were resolved.
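The aggregation step described above can be sketched as follows. This is a minimal illustration, not the evaluator's actual formula: the source does not state how the three labels map to numbers, so the weights (1.0, 0.5, 0.0), the `TurnVerdict` type, and the `resolution_score` function are all assumptions. The sketch also assumes the LLM evaluator has already categorised and labelled each user message.

```python
from dataclasses import dataclass

# Assumed weights: the source does not specify how labels map to numbers.
LABEL_WEIGHTS = {"Resolved": 1.0, "Partial": 0.5, "Unresolved": 0.0}


@dataclass
class TurnVerdict:
    """Hypothetical record of the evaluator's verdict for one user message."""
    user_message: str
    kind: str   # "question" or "statement"
    label: str  # "Resolved", "Partial", or "Unresolved"


def resolution_score(verdicts: list[TurnVerdict]) -> float:
    """Average the per-message weights and scale the result to 0-10."""
    if not verdicts:
        raise ValueError("at least one evaluated user message is required")
    total = sum(LABEL_WEIGHTS[v.label] for v in verdicts)
    return round(10 * total / len(verdicts), 1)


verdicts = [
    TurnVerdict("How do I reset my password?", "question", "Resolved"),
    TurnVerdict("It still fails on mobile.", "statement", "Partial"),
    TurnVerdict("Can I export my data?", "question", "Unresolved"),
]
print(resolution_score(verdicts))  # prints 5.0
```

Under these assumed weights, a conversation in which every message is Resolved scores 10 and one in which every message is Unresolved scores 0, which matches the endpoints described above. The intermediate values depend entirely on the chosen weights.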
What to do when Conversation Resolution is Low
- Add confirmation mechanisms to verify user satisfaction
- Develop fallback responses for unclear or complex queries
- Track common patterns in unresolved queries for improvement
- Consider implementing a clarification system for ambiguous requests
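As one way to act on the pattern-tracking suggestion above, the sketch below counts messages that were not fully resolved, grouped by topic. The `records` data, the topic tags, and the `unresolved_hotspots` helper are hypothetical; the source does not say that the evaluator emits topic tags, so assume they come from your own logging.

```python
from collections import Counter

# Hypothetical evaluation log: (topic, label) pairs. Topic tags are an
# assumption and would come from your own tagging or logging pipeline.
records = [
    ("billing", "Unresolved"),
    ("billing", "Partial"),
    ("login", "Resolved"),
    ("export", "Unresolved"),
    ("billing", "Unresolved"),
]


def unresolved_hotspots(records, top_n=3):
    """Count messages that were not fully resolved, grouped by topic."""
    counts = Counter(topic for topic, label in records if label != "Resolved")
    return counts.most_common(top_n)


print(unresolved_hotspots(records))  # prints [('billing', 3), ('export', 1)]
```

Topics that dominate this list are natural candidates for dedicated fallback responses or clarification prompts.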
Comparing Conversation Resolution with Similar Evals
- Conversation Coherence: While Resolution focuses on addressing user needs, Coherence evaluates the logical flow and context maintenance. A conversation can be perfectly coherent but fail to resolve user queries, or vice versa.
- Completeness: Resolution differs from Completeness in that it focuses on reaching a satisfactory conclusion rather than on comprehensive coverage. A response can be complete but still not resolve the user’s actual need.
- Context Relevance: Resolution evaluates whether queries are answered, while Context Relevance assesses if the provided context is sufficient for generating responses. A response can use relevant context but still fail to resolve the user’s query.