Conversation Coherence
Definition
Evaluates how logically a conversation flows and how well it maintains context throughout the dialogue. This metric assesses whether responses are consistent, contextually appropriate, and follow a natural progression of ideas within the conversation thread.
Calculation
Each message in the conversation is analysed sequentially and checked against the context established by the previous messages. The system employs a structured prompt template that guides the LLM in evaluating key aspects of coherence and continuity: the logical flow between messages, any contradictions that disrupt the conversation, topic consistency, and whether context is maintained throughout the exchange.
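The sequential evaluation described above can be sketched as follows. This is a minimal illustration, not the evaluator's actual implementation: the template wording, the `Message` shape, and the names `build_prompt`, `evaluate_sequentially`, and `judge` are all assumptions made for the example. The `judge` argument stands in for whatever function sends a prompt to an LLM and returns its reply.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # assumed shape: {"role": "...", "content": "..."}

# Hypothetical template; the evaluator's real prompt is not shown in this document.
PROMPT_TEMPLATE = """You are evaluating conversation coherence.

Conversation so far:
{history}

Message to evaluate:
{message}

Check: logical flow from the previous messages, contradictions,
topic consistency, and whether earlier context is maintained.
Reply with a single integer from 0 (incoherent) to 10 (perfectly coherent)."""


def format_message(msg: Message) -> str:
    """Render one message as 'role: content'."""
    return f"{msg['role']}: {msg['content']}"


def build_prompt(history: List[Message], message: Message) -> str:
    """Fill the template with the prior context and the message under review."""
    rendered = "\n".join(format_message(m) for m in history) or "(no prior messages)"
    return PROMPT_TEMPLATE.format(history=rendered, message=format_message(message))


def evaluate_sequentially(
    conversation: List[Message], judge: Callable[[str], str]
) -> List[str]:
    """Send one prompt per message, each seeing only the messages before it."""
    return [
        judge(build_prompt(conversation[:i], msg))
        for i, msg in enumerate(conversation)
    ]
```

Passing the judge in as a callable keeps the sketch independent of any particular LLM client and makes it easy to substitute a stub when testing.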
The evaluation returns a numerical score from 0 to 10 that reflects how coherent the conversation is. A score of 10 signifies perfect coherence: the conversation flows seamlessly and context is maintained throughout. Slightly below this level, coherence remains strong but the conversation may contain minor inconsistencies. As the score decreases further, gaps in context become more noticeable and the flow is occasionally disrupted. When coherence is weak, significant breaks in context make the conversation difficult to follow. At the lowest end, a score of 0 represents a completely incoherent conversation with major logical breaks.
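As a rough sketch of how a returned score might be handled, the snippet below parses a number out of an LLM reply, clamps it to the 0 to 10 range, and maps it to one of the five qualitative levels described above. Only the endpoints (0 and 10) come from this document; the intermediate cut-offs in `BANDS`, and the function names, are illustrative assumptions. The parser also assumes the first number in the reply is the score.

```python
import re

MIN_SCORE, MAX_SCORE = 0.0, 10.0


def parse_score(reply: str) -> float:
    """Extract the first number from the judge's reply and clamp it to 0-10."""
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no numeric score found in reply: {reply!r}")
    return max(MIN_SCORE, min(MAX_SCORE, float(match.group())))


# Illustrative cut-offs only; the document does not define band boundaries.
BANDS = [
    (10, "perfect coherence"),
    (7, "strong coherence, minor inconsistencies"),
    (4, "moderate coherence, noticeable context gaps"),
    (1, "weak coherence, significant context breaks"),
    (0, "incoherent"),
]


def describe(score: float) -> str:
    """Return the label of the highest band whose threshold the score meets."""
    for threshold, label in BANDS:
        if score >= threshold:
            return label
    return BANDS[-1][1]
```

Clamping guards against replies that fall outside the documented range, and raising on a missing number makes a malformed reply visible instead of silently scoring it.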
What to do when Conversation Coherence is Low
- Review conversation history to identify where context breaks occurred
- Implement context window management to ensure important information is retained
- Consider reducing the length of conversation threads if context loss is persistent
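One simple form of the context window management suggested above is to pin the opening message (for example, a system prompt) and keep only the most recent messages that fit within a size budget. The sketch below assumes a `{"role", "content"}` message shape and approximates token counts by counting words; `estimate_tokens`, `trim_history`, `budget`, and `pinned` are hypothetical names, and a real system would use its model's tokenizer.

```python
from typing import Dict, List

Message = Dict[str, str]  # assumed shape: {"role": "...", "content": "..."}


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_history(messages: List[Message], budget: int, pinned: int = 1) -> List[Message]:
    """Keep the first `pinned` messages plus the newest ones that fit in `budget`."""
    head = messages[:pinned]
    tail = messages[pinned:]
    remaining = budget - sum(estimate_tokens(m["content"]) for m in head)

    kept: List[Message] = []
    for msg in reversed(tail):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if cost > remaining:
            break  # stop at the first message that does not fit, so the kept tail stays contiguous
        kept.append(msg)
        remaining -= cost
    return head + kept[::-1]  # restore chronological order
```

Stopping at the first message that does not fit, instead of skipping it and continuing, avoids leaving holes in the retained history, which could themselves read as context breaks.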
Comparing Conversation Coherence with Similar Evals
- Conversation Resolution: While Coherence focuses on the flow and context maintenance throughout the conversation, Resolution evaluates whether the conversation reaches a satisfactory conclusion.
- Context Adherence: Coherence evaluates the internal consistency of the conversation itself, while Context Adherence evaluates whether responses adhere to externally provided context.
- Completeness: Coherence focuses on the logical flow between messages, while Completeness evaluates whether individual responses fully address their queries.