Evaluation Using Interface

Input:

  • Required Inputs:
    • output: column containing conversation history between the user and the model

Output:

  • Score: percentage score between 0 and 100

Interpretation:

  • Higher scores: Indicate that the conversation is resolved.
  • Lower scores: Suggest that the conversation is not resolved.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.

Input:

  • Required Inputs:
    • messages: list - conversation history between the user and the model, provided as query and response pairs (a list of LLMTestCase objects in the example below)

Output:

  • Score: float - score between 0 and 1

Interpretation:

  • Higher scores: Indicate that the conversation is resolved.
  • Lower scores: Suggest that the conversation is not resolved.
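
The following example shows how to run the Conversation Resolution eval with the Python SDK:
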
from fi.evals.templates import ConversationResolution
from fi.testcases import ConversationalTestCase, LLMTestCase

test_case = ConversationalTestCase(
    messages=[
        LLMTestCase(
            query="I need help with my math homework",
            response="I'd be happy to help! What specific math problem are you working on?"
        ),
        LLMTestCase(
            query="I'm trying to solve this equation: 2x + 5 = 13",
            response="Let's solve this step by step:\\n1. Subtract 5 from both sides: 2x = 8\\n2. Divide both sides by 2: x = 4\\nThe solution is x = 4. Would you like me to verify this answer with you?"
        ),
        LLMTestCase(
            query="Yes, please check if it's correct",
            response="Let's verify:\\nIf x = 4, then:\\n2(4) + 5 = 8 + 5 = 13\\nThe equation balances, so x = 4 is correct! Is there anything else you'd like help with?"
        ),
        LLMTestCase(
            query="No, that's all. Thank you!",
            response="You're welcome! Good luck with your math studies. Feel free to ask if you need help with any other problems."
        )
    ]
)

# Instantiate the Conversation Resolution evaluation template
template = ConversationResolution()

# `evaluator` is assumed to be an evaluator client initialized as described in the SDK setup guide linked above
response = evaluator.evaluate(eval_templates=[template], inputs=[test_case])

# The resolution score is a float between 0 and 1
score = response.eval_results[0].metrics[0].value
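
The snippet below is one illustrative way to act on the returned score; the 0.5 threshold is an example choice, not a value defined by the SDK.

print(f"Conversation resolution score: {score:.2f}")
if score < 0.5:  # illustrative threshold, not part of the SDK
    print("Conversation may be unresolved; review the final turns.")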


What to do when Conversation Resolution is Low

  • Add confirmation mechanisms to verify user satisfaction
  • Develop fallback responses for unclear or complex queries
  • Track common patterns in unresolved queries for improvement
  • Consider implementing a clarification system for ambiguous requests (a minimal sketch follows this list)
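
The sketch below is a minimal, hedged illustration of the confirmation and clarification ideas above. It is not part of the fi SDK; names such as needs_clarification and with_confirmation are purely illustrative.

# Illustrative helpers (not part of the fi SDK)
AMBIGUOUS_MARKERS = ("something", "stuff", "this thing", "doesn't work")

def needs_clarification(query: str) -> bool:
    # Crude heuristic: very short or vague queries likely warrant a follow-up question
    q = query.strip().lower()
    return len(q.split()) < 4 or any(marker in q for marker in AMBIGUOUS_MARKERS)

def with_confirmation(response: str) -> str:
    # Ensure the reply ends by checking whether the user's need was resolved
    if response.rstrip().endswith("?"):
        return response
    return response.rstrip() + " Does that fully answer your question?"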

Comparing Conversation Resolution with Similar Evals

  1. Conversation Coherence: While Resolution focuses on addressing user needs, Coherence evaluates the logical flow and context maintenance. A conversation can be perfectly coherent but fail to resolve user queries, or vice versa.
  2. Completeness: Resolution differs from Completeness as it focuses on satisfactory conclusion rather than comprehensive coverage. A response can be complete but not resolve the user’s actual need.
  3. Context Relevance: Resolution evaluates whether queries are answered, while Context Relevance assesses if the provided context is sufficient for generating responses. A response can use relevant context but still fail to resolve the user’s query.