Eval Definition
Conversation Coherence
Evaluates how logically a conversation flows and maintains context throughout the dialogue. This metric assesses whether responses are consistent, contextually appropriate, and maintain a natural progression of ideas within the conversation thread.
Evaluation Using Interface
Input:
- Required Inputs:
- output: column containing conversation history between the user and the model
Output:
- Score: percentage score between 0 and 100
Interpretation:
- Higher scores: Indicate that the conversation is more coherent.
- Lower scores: Suggest that the conversation is less coherent.
Evaluation Using Python SDK
See the Python SDK documentation to learn how to set up evaluation using the Python SDK.
Input:
- Required Inputs:
- messages: list[string] - conversation history between the user and the model, provided as query and response pairs
Output:
- Score: float - returns a score between 0 and 1
Interpretation:
- Higher scores: Indicate that the conversation is more coherent.
- Lower scores: Suggest that the conversation is less coherent.
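A minimal sketch of how the SDK's inputs and outputs fit together, assuming the contract described above. The evaluator call itself is omitted because the SDK's actual client and method names are not shown here; the `to_percentage` helper is a hypothetical convenience for mapping the SDK's 0-1 float score onto the interface's 0-100 percentage scale.

```python
# Conversation history as query/response pairs, matching the
# "messages" input described above (roles shown for readability).
messages = [
    "user: How do I reset my password?",
    "assistant: Go to Settings and click 'Reset password'.",
    "user: And if I no longer have access to my email?",
    "assistant: Use the account-recovery form to verify your identity.",
]

def to_percentage(score: float) -> int:
    """Hypothetical helper: map the SDK's 0-1 float score to the
    interface's 0-100 percentage scale."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return round(score * 100)

# Example: a raw SDK score of 0.87 corresponds to 87 in the interface.
print(to_percentage(0.87))
```

The same interpretation applies on both scales: higher values indicate a more coherent conversation.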
What to do when Conversation Coherence is Low
- Review conversation history to identify where context breaks occurred
- Implement context window management to ensure important information is retained
- Consider reducing the length of conversation threads if context loss is persistent
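The context-management steps above can be sketched as a simple history-trimming pass. This is an illustrative assumption, not part of the SDK: it keeps only the most recent turns so that important recent context stays within the model's window; token-budget-based trimming is a common alternative.

```python
def trim_history(messages: list[str], max_turns: int = 6) -> list[str]:
    """Keep only the most recent turns of a conversation.

    A simple turn-count cutoff: if the history exceeds max_turns,
    drop the oldest messages. Illustrative sketch only.
    """
    if len(messages) <= max_turns:
        return messages
    return messages[-max_turns:]

history = [
    "user: q1", "assistant: a1",
    "user: q2", "assistant: a2",
    "user: q3", "assistant: a3",
]
print(trim_history(history, max_turns=4))
```

In practice, you may also want to pin a system prompt or other must-keep messages before applying the cutoff, so that trimming itself does not introduce new context breaks.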
Comparing Conversation Coherence with Similar Evals
- Conversation Resolution: While Coherence focuses on the flow and context maintenance throughout the conversation, Resolution evaluates whether the conversation reaches a satisfactory conclusion.
- Context Adherence: Coherence differs from Context Adherence as it evaluates the internal consistency of the conversation rather than adherence to external context.
- Completeness: Coherence focuses on the logical flow between messages, while Completeness evaluates whether individual responses fully address their queries.