Evaluates how logically a conversation flows and maintains context throughout the dialogue. This metric assesses whether responses are consistent, contextually appropriate, and maintain a natural progression of ideas within the conversation thread.
Click here to learn how to set up evaluation using the Python SDK.

Input:
list[string]
- conversation history between the user and the model, provided as query and response pairs

Output:
float
- returns a score between 0 and 1
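The input/output contract above can be illustrated with a minimal sketch. Note that `evaluate_coherence` and its word-overlap heuristic are hypothetical placeholders used only to show the expected shapes, not the SDK's actual API or scoring logic:

```python
def evaluate_coherence(conversation: list[str]) -> float:
    """Hypothetical scorer illustrating the metric's contract:
    takes a conversation as alternating query/response strings
    and returns a coherence score between 0 and 1.

    The word-overlap heuristic below is a stand-in, not the
    real evaluation logic."""
    if len(conversation) < 2:
        return 1.0  # a single turn is trivially coherent
    overlaps = []
    for prev, curr in zip(conversation, conversation[1:]):
        prev_words = set(prev.lower().split())
        curr_words = set(curr.lower().split())
        if not prev_words or not curr_words:
            overlaps.append(0.0)
            continue
        # Fraction of the current turn's words that echo the previous turn
        overlaps.append(len(prev_words & curr_words) / len(curr_words))
    return sum(overlaps) / len(overlaps)


conversation = [
    "What is the capital of France?",
    "The capital of France is Paris.",
    "How large is Paris?",
    "Paris covers roughly 105 square kilometres.",
]
score = evaluate_coherence(conversation)
print(score)  # a float in [0, 1]
```

A real evaluator would typically use a trained model or an LLM judge rather than lexical overlap, but the contract stays the same: a list of query/response strings in, a score in [0, 1] out.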