Evaluation Using Interface
Input:
- Required Inputs:
- output: column containing the conversation history between the user and the model
Output:
- Score: percentage score between 0 and 100
- Higher scores: Indicate that the conversation is resolved.
- Lower scores: Suggest that the conversation is not resolved.
Evaluation Using SDK
Click here to learn how to set up evaluation using SDK.
Input:
- Required Inputs:
- output: string
- conversation history between the user and the model, provided as query and response pairs
Output:
- Score: float
- returns a score between 0 and 1
- Higher scores: Indicate that the conversation is resolved.
- Lower scores: Suggest that the conversation is not resolved.
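As an illustration of the input and output shapes described above, the following Python sketch builds a conversation-history string from query and response pairs and interprets a returned score. It does not call the SDK. The helper names (`format_conversation`, `to_percentage`, `is_resolved`), the `User:`/`Assistant:` labels, and the 0.5 threshold are all assumptions made for this example, not part of the documented API.

```python
from typing import List, Tuple


def format_conversation(pairs: List[Tuple[str, str]]) -> str:
    """Join (query, response) pairs into one conversation-history string.

    The "User:" / "Assistant:" labels are an assumed convention.
    """
    lines = []
    for query, response in pairs:
        lines.append(f"User: {query}")
        lines.append(f"Assistant: {response}")
    return "\n".join(lines)


def to_percentage(score: float) -> float:
    """Convert a 0-1 score to the 0-100 scale used in the interface."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return round(score * 100, 2)


def is_resolved(score: float, threshold: float = 0.5) -> bool:
    """Treat scores at or above the (assumed) threshold as resolved."""
    return score >= threshold


if __name__ == "__main__":
    history = format_conversation([
        ("Where is my order?", "It ships tomorrow."),
        ("Great, thanks!", "You're welcome."),
    ])
    print(history)
    print(to_percentage(0.85), is_resolved(0.85))
```

The threshold is a tuning choice; pick a cut-off that matches how strictly your application defines a resolved conversation.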
What to do when Conversation Resolution is Low
- Add confirmation mechanisms to verify user satisfaction
- Develop fallback responses for unclear or complex queries
- Track common patterns in unresolved queries for improvement
- Consider implementing a clarification system for ambiguous requests
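One way to approach the pattern-tracking suggestion above is to tag each evaluated conversation with a topic and count which topics appear most often among low-scoring conversations. The sketch below assumes each record is a dictionary with hypothetical `topic` and `score` keys (score on the 0 to 1 scale) and uses an assumed threshold of 0.5. It is a minimal illustration, not a prescribed workflow.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical record shape: {"topic": str, "score": float}
Record = Dict[str, object]


def unresolved(records: List[Record], threshold: float = 0.5) -> List[Record]:
    """Return the records whose score falls below the threshold."""
    return [r for r in records if r["score"] < threshold]


def top_unresolved_topics(
    records: List[Record], threshold: float = 0.5, top_n: int = 3
) -> List[Tuple[str, int]]:
    """Count topics among unresolved conversations, most frequent first."""
    counts = Counter(r["topic"] for r in unresolved(records, threshold))
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = [
        {"topic": "billing", "score": 0.2},
        {"topic": "billing", "score": 0.3},
        {"topic": "shipping", "score": 0.9},
        {"topic": "login", "score": 0.1},
    ]
    print(top_unresolved_topics(sample))
```

Topics that recur in this list are natural candidates for dedicated fallback responses or clarification prompts.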
Comparing Conversation Resolution with Similar Evals
- Conversation Coherence: While Resolution focuses on addressing user needs, Coherence evaluates the logical flow and context maintenance. A conversation can be perfectly coherent but fail to resolve user queries, or vice versa.
- Completeness: Resolution differs from Completeness in that it focuses on reaching a satisfactory conclusion rather than on comprehensive coverage. A response can be complete but not resolve the user’s actual need.
- Context Relevance: Resolution evaluates whether queries are answered, while Context Relevance assesses if the provided context is sufficient for generating responses. A response can use relevant context but still fail to resolve the user’s query.