Conversation Coherence

Evaluation Using Interface

Input:

Required Inputs:
- output: column containing conversation history between the user and the model

Output:

Score: percentage score between 0 and 100

Interpretation:

Higher scores: Indicate that the conversation is more coherent.
Lower scores: Suggest that the conversation is less coherent.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.

Input:

Required Inputs:
- messages: list - conversation history between the user and the model, provided as query and response pairs (LLMTestCase objects in the SDK example)

Output:

Score: float - returns score between 0 and 1

Interpretation:

Higher scores: Indicate that the conversation is more coherent.
Lower scores: Suggest that the conversation is less coherent.

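from fi.evals import Evaluator
from fi.evals.templates import ConversationCoherence
from fi.testcases import ConversationalTestCase, LLMTestCase

# Assumption: the evaluator client is created as described in the SDK setup guide.
# Replace the placeholder keys with your own credentials.
evaluator = Evaluator(fi_api_key="your_api_key", fi_secret_key="your_secret_key")

# Build the conversation as an ordered list of query/response pairs.
test_case = ConversationalTestCase(
    messages=[
        LLMTestCase(
            query="Hi, how are you?",
            response="I'm doing well, thank you! How can I help you today?"
        ),
        LLMTestCase(
            query="I need help with my homework",
            response="I'd be happy to help. What subject are you working on?"
        )
    ]
)

template = ConversationCoherence()

# Run the Conversation Coherence template against the test case.
response = evaluator.evaluate(eval_templates=[template], inputs=[test_case], model_name="turing_flash")

# The coherence score is a float between 0 and 1.
score = response.eval_results[0].metrics[0].value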
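
The SDK reports the score as a float between 0 and 1, while the interface shows a percentage between 0 and 100. A minimal sketch for putting the two on the same scale and flagging low-coherence conversations might look like the following. The helper names and the 0.5 threshold are illustrative assumptions, not part of the SDK; pick a threshold that suits your own data.

```python
def to_percentage(score: float) -> float:
    """Convert an SDK score in [0, 1] to the 0-100 scale shown in the interface."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"expected a score between 0 and 1, got {score}")
    return score * 100.0


def is_low_coherence(score: float, threshold: float = 0.5) -> bool:
    """Return True when the score falls below the threshold.

    The default threshold is a hypothetical value chosen for illustration.
    """
    return score < threshold
```

For example, `is_low_coherence(score)` can gate whether a conversation is sent for manual review.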

What to do when Conversation Coherence is Low

- Review conversation history to identify where context breaks occurred
- Implement context window management to ensure important information is retained
- Consider reducing the length of conversation threads if context loss is persistent
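
One way to approach context window management is to pin the opening turns (which often carry the user's goal) and keep as many of the most recent turns as fit within a size budget. The sketch below is an illustration under stated assumptions, not a method prescribed by the platform: `trim_history`, the `pinned` parameter, and the use of character counts as a stand-in for tokens are all hypothetical choices.

```python
from typing import Dict, List

# Each turn is a plain dict: {"query": ..., "response": ...}
Turn = Dict[str, str]


def trim_history(turns: List[Turn], max_chars: int, pinned: int = 1) -> List[Turn]:
    """Keep the first `pinned` turns plus as many recent turns as fit in `max_chars`."""

    def size(turn: Turn) -> int:
        # Character count is a rough proxy for token count.
        return len(turn["query"]) + len(turn["response"])

    head = turns[:pinned]
    tail = turns[pinned:]
    budget = max_chars - sum(size(t) for t in head)

    kept: List[Turn] = []
    # Walk backwards so the newest turns are retained first.
    for turn in reversed(tail):
        cost = size(turn)
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost

    # Restore chronological order before returning.
    return head + list(reversed(kept))
```

Dropping turns from the middle rather than the start keeps the original request in view while still shortening long threads, which also covers the last suggestion in the list above.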

Comparing Conversation Coherence with Similar Evals

- Conversation Resolution: While Coherence focuses on the flow and context maintenance throughout the conversation, Resolution evaluates whether the conversation reaches a satisfactory conclusion.
- Context Adherence: Coherence differs from Context Adherence as it evaluates the internal consistency of the conversation rather than adherence to external context.
- Completeness: Coherence focuses on the logical flow between messages, while Completeness evaluates whether individual responses fully address their queries.
