Evaluation Using Interface

Input:

  • Required Inputs:
    • input: The original text or query column.
    • output: The AI-generated content column.

Output:

  • Score: A percentage score from 0 to 100.

Interpretation:

  • Higher scores: Indicate that the output comprehensively addresses the requirements or topics presented in the input.
  • Lower scores: Suggest that the output is missing key details or fails to cover significant aspects mentioned in the input.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.


Input Type        Parameter   Type     Description
Required Inputs   input       string   The original text or query.
                  output      string   The AI-generated content.

Output    Type    Description
Score     float   Returns a score between 0 and 1, where higher values indicate more complete content relative to the input.
from fi.evals import EvalClient
from fi.testcases import TestCase
from fi.evals.templates import Completeness

# Create the evaluation client; credentials are typically supplied via
# environment variables or constructor arguments, depending on your setup
evaluator = EvalClient()

# Select the Completeness evaluation template
completeness_eval = Completeness()

# A test case pairs the original query (input) with the AI-generated response (output)
test_case = TestCase(
    input="Describe the causes, effects, and mitigation of climate change.",
    output="Climate change is driven largely by greenhouse gas emissions..."
)

result = evaluator.evaluate(eval_templates=[completeness_eval], inputs=[test_case])
completeness_score = result.eval_results[0].metrics[0].value  # float in [0, 1]
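
A minimal follow-up sketch of how you might act on the returned score: gate on it to flag incomplete outputs for review. The 0.75 cutoff is an illustrative assumption, not a value prescribed by the SDK.

THRESHOLD = 0.75  # illustrative cutoff; tune against your own data

if completeness_score < THRESHOLD:
    print(f"Incomplete output (score={completeness_score:.2f}); review for missing topics.")
else:
    print(f"Output looks complete (score={completeness_score:.2f}).")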


What to Do When Completeness Is Low

Start by diagnosing the failure: compare the output against the query, note which aspects have not been fully addressed, and list the gaps or incomplete sections that need additional information. One way to do this at scale is sketched below.
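The sketch reuses the SDK pattern from above to score a batch of query/response pairs and surface the weakest ones first. It assumes evaluate accepts multiple test cases and returns one entry in eval_results per case, as the single-case call suggests; the example pairs are placeholders.

from fi.evals import EvalClient
from fi.testcases import TestCase
from fi.evals.templates import Completeness

evaluator = EvalClient()

pairs = [
    ("List three benefits of unit testing.", "Unit tests catch regressions early."),
    ("Explain the causes and effects of inflation.", "Inflation is a general rise in prices..."),
]
cases = [TestCase(input=q, output=a) for q, a in pairs]

result = evaluator.evaluate(eval_templates=[Completeness()], inputs=cases)
scores = [r.metrics[0].value for r in result.eval_results]

# Review the lowest-scoring pairs first to see which query aspects were missed
for (query, answer), score in sorted(zip(pairs, scores), key=lambda item: item[1]):
    print(f"{score:.2f}  {query}")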

To fix an individual response, add the missing details and refine the content until every aspect of the query is covered. In a generation pipeline, this revision step can be automated by asking the model to rewrite its own draft, as in the sketch below.
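A hedged sketch of that revision step. Here generate is a hypothetical placeholder for whatever completion function you use (an OpenAI call, a local model, etc.); it is not part of the fi SDK.

def expand_answer(query: str, draft: str, generate) -> str:
    # `generate` is a hypothetical callable: prompt in, completion out
    prompt = (
        "The draft answer below may not address every part of the question.\n"
        f"Question: {query}\n"
        f"Draft answer: {draft}\n"
        "Rewrite the answer so it fully covers every part of the question."
    )
    return generate(prompt)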

To improve completeness over the long term, build the check into the generation process itself: score each response as it is produced, and revise or regenerate whenever the score falls below your threshold, so responses are aligned with the query's requirements before they are returned. One way to wire this together is sketched below.
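A sketch of such a mechanism, combining the evaluator with the hypothetical expand_answer helper above. The threshold and attempt count are illustrative assumptions, and generate remains a placeholder for your own completion function.

from fi.evals import EvalClient
from fi.testcases import TestCase
from fi.evals.templates import Completeness

def generate_complete_answer(query, generate, threshold=0.75, max_attempts=3):
    evaluator = EvalClient()
    completeness_eval = Completeness()
    answer = generate(query)
    for _ in range(max_attempts):
        case = TestCase(input=query, output=answer)
        result = evaluator.evaluate(eval_templates=[completeness_eval], inputs=[case])
        if result.eval_results[0].metrics[0].value >= threshold:
            break  # the answer covers enough of the query; stop revising
        answer = expand_answer(query, answer, generate)  # revision helper from above
    return answer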