Evaluation Using Interface

Input:

  • Required Inputs:
    • output: The output column generated by the model.
  • Optional Inputs:
    • context: The context column provided to the model.
    • input: The input column provided to the model.
  • Configuration Parameters:
    • Check Internet: Boolean - Whether to verify information using external sources.

Output:

  • Score: Percentage score between 0 and 100

Interpretation:

  • Higher scores: Indicate that the output is factually accurate based on the provided context/input or general knowledge (if Check Internet is enabled).
  • Lower scores: Suggest the presence of factual inaccuracies in the output.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.


Input Type                Parameter       Type    Description
Required Inputs           output          string  The output generated by the model.
Optional Inputs           context         string  The context provided to the model.
                          input           string  The input provided to the model.
Configuration Parameters  check_internet  bool    Whether to verify information using external sources.

Output  Type   Description
Score   float  Returns a score between 0 and 1, where higher values indicate better factual accuracy.

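from fi.testcases import TestCase
from fi.evals.templates import FactualAccuracy

# `evaluator` is the evaluation client created during SDK setup
# (see the setup guide linked above); it is not defined in this snippet.

# Only `output` is required; `context` and `input` are optional.
test_case = TestCase(
    output="example output",
    context="example context",
    input="example input",
)

# Set check_internet to True to also verify information using external sources.
template = FactualAccuracy(config={
    "check_internet": False
})

# Run the evaluation for the single test case.
response = evaluator.evaluate(eval_templates=[template], inputs=[test_case])

# The score is a float between 0 and 1; higher values indicate better factual accuracy.
result = response.eval_results[0]
print(f"Score: {result.metrics[0].value}")
print(f"Reason: {result.reason}")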
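The interface reports a percentage between 0 and 100, while the SDK returns a float between 0 and 1. The following is a minimal sketch for comparing the two, assuming the scales differ only by a factor of 100 (this page does not confirm that). The helper names `to_percentage` and `is_low` and the 0.5 threshold are hypothetical and not part of the SDK.

```python
def to_percentage(score: float) -> float:
    """Convert a score in [0, 1] to a percentage in [0, 100].

    Assumes the interface percentage is simply the SDK score times 100.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    return score * 100.0


def is_low(score: float, threshold: float = 0.5) -> bool:
    """Return True when the score falls below the threshold.

    The default threshold of 0.5 is an arbitrary, hypothetical cut-off.
    """
    return score < threshold


sdk_score = 0.25  # illustrative value, not real evaluation output
print(f"Percentage: {to_percentage(sdk_score):.0f}%")
print(f"Needs review: {is_low(sdk_score)}")
```

In practice, `sdk_score` would be the value read from `response.eval_results[0].metrics[0].value`, and the threshold would be chosen to suit the application.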


What to Do When Factual Accuracy Evaluation Gives a Low Score

When factual accuracy evaluation gives a low score, start by examining the output for factual inaccuracies: identify each discrepancy and revise the content to correct it. Then reassess the evaluation criteria to confirm that they are clearly defined and aligned with the goals of the evaluation. If the criteria are incomplete or not relevant to those goals, adjust them before re-running the evaluation.
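When many outputs are evaluated at once, it can help to review the lowest-scoring ones first. The sketch below is one possible way to do that, not a feature of the SDK: `EvalRecord` and `triage` are hypothetical names, the 0.5 threshold is arbitrary, and the sample records are made-up values rather than real evaluation results.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    """Hypothetical container for one evaluated output (not an SDK type)."""
    output: str
    score: float  # factual accuracy score between 0 and 1
    reason: str   # explanation returned alongside the score


def triage(records, threshold=0.5):
    """Return the records scoring below the threshold, lowest score first."""
    low = [r for r in records if r.score < threshold]
    return sorted(low, key=lambda r: r.score)


# Made-up sample data for illustration only.
records = [
    EvalRecord("Paris is the capital of France.", 0.95, "Matches the context."),
    EvalRecord("Water boils at 90 °C at sea level.", 0.30, "Incorrect boiling point."),
    EvalRecord("The Moon is made of cheese.", 0.05, "Contradicts the context."),
]

for record in triage(records):
    print(f"{record.score:.2f}  {record.output}  ({record.reason})")
```

Reading the reason next to each low score makes it easier to tell whether the output needs correcting or the evaluation criteria need adjusting.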


Differentiating Factual Accuracy from Groundedness

Factual accuracy focuses on verifying the correctness of the output based on the given input and context, ensuring that the information presented is factually sound. In contrast, groundedness ensures that the response strictly adheres to the provided context, preventing the inclusion of unsupported or external information.

The inputs differ as well: factual accuracy requires the output and can optionally use the input and context, whereas groundedness only requires a response and its context.