# `evaluator` is assumed to be an already-initialized evaluation client;
# its construction (imports, API keys) is omitted here.
result = evaluator.evaluate(
    eval_templates="conversation_resolution",
    inputs={
        "conversation": '''
                    User: My Wi-Fi keeps disconnecting every few minutes.
                    Assistant: You can try restarting your router and updating your network drivers.
                    User: I restarted the router and it's stable now. Thanks!
                    Assistant: Glad to hear that! Let me know if you need anything else.
                  '''
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)

Input

Required Input | Type   | Description
conversation   | string | Conversation history between the user and the model, provided as query and response pairs
Output

Field  | Description
Result | Returns a score, where higher scores indicate a more resolved conversation
Reason | Provides a detailed explanation of the conversation resolution assessment
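Since Result is a score where higher means more resolved, a common downstream pattern is to gate on a threshold and surface the Reason for low-scoring conversations. A minimal sketch, assuming a numeric score (the `EvalResult` stub and `flag_unresolved` helper are hypothetical, stubbed here so the example is self-contained):

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    # Stub mirroring the documented fields: a resolution score and an explanation.
    output: float
    reason: str

def flag_unresolved(results, threshold=0.5):
    """Return (score, reason) pairs for conversations scoring below the threshold."""
    return [(r.output, r.reason) for r in results if r.output < threshold]

results = [
    EvalResult(0.9, "User confirmed the fix worked."),
    EvalResult(0.2, "Assistant never addressed the follow-up question."),
]
print(flag_unresolved(results))  # only the low-scoring conversation is flagged
```

The same idea scales to batch runs: collect the flagged reasons to spot recurring failure modes.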

What to do when Conversation Resolution is Low

  • Add confirmation mechanisms to verify user satisfaction
  • Develop fallback responses for unclear or complex queries
  • Track common patterns in unresolved queries for improvement
  • Consider implementing a clarification system for ambiguous requests
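The first and last bullets above can be sketched as a simple post-turn check: if the user's last message signals resolution, confirm and close; if it is too short to interpret, ask a clarifying question; otherwise keep troubleshooting. A toy illustration (the keyword heuristic and `next_action` function are hypothetical, not part of the eval framework):

```python
# Phrases that commonly signal the user considers the issue resolved (toy list).
RESOLVED_MARKERS = ("thanks", "that worked", "solved", "stable now")

def next_action(last_user_message: str) -> str:
    """Decide a follow-up action from the user's last turn (toy keyword heuristic)."""
    msg = last_user_message.lower()
    if any(marker in msg for marker in RESOLVED_MARKERS):
        return "close: confirm satisfaction and end the conversation"
    if len(msg.split()) < 4:  # very short replies are often ambiguous
        return "clarify: ask a follow-up question to pin down the issue"
    return "continue: offer further troubleshooting steps"

print(next_action("I restarted the router and it's stable now. Thanks!"))
# -> close: confirm satisfaction and end the conversation
```

In production the heuristic would be replaced by the eval itself or a classifier, but the control flow stays the same.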

Comparing Conversation Resolution with Similar Evals

  1. Conversation Coherence: While Resolution focuses on addressing user needs, Coherence evaluates the logical flow and context maintenance. A conversation can be perfectly coherent but fail to resolve user queries, or vice versa.
  2. Completeness: Resolution differs from Completeness as it focuses on satisfactory conclusion rather than comprehensive coverage. A response can be complete but not resolve the user’s actual need.
  3. Context Relevance: Resolution evaluates whether queries are answered, while Context Relevance assesses if the provided context is sufficient for generating responses. A response can use relevant context but still fail to resolve the user’s query.