from fi.testcases import TestCase
from fi.evals.templates import FactualAccuracy
from fi.evals import Evaluator

# Initialise the evaluation client first (placeholder keys shown;
# supply your own credentials)
evaluator = Evaluator(fi_api_key="your_api_key", fi_secret_key="your_secret_key")

test_case = TestCase(
    output="example output",
    context="example context",
    input="example input",
)

# check_internet=False restricts the check to the provided context
template = FactualAccuracy(config={
    "check_internet": False
})

response = evaluator.evaluate(eval_templates=[template], inputs=[test_case], model_name="turing_flash")

print(f"Score: {response.eval_results[0].metrics[0].value}")
print(f"Reason: {response.eval_results[0].reason}")
Input

Required Input   Type     Description
output           string   The output generated by the model
context          string   The context provided to the model

Optional Input   Type     Description
input            string   The input provided to the model

Output

Field    Description
Result   Returns a score, where higher values indicate better factual accuracy
Reason   Provides a detailed explanation of the factual accuracy assessment
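The Result and Reason fields are read off each entry in eval_results, as in the example above. A minimal sketch of collecting them across a batch, using a plain-dict stand-in whose nesting mirrors response.eval_results[i].metrics[j] (the dict layout here is illustrative, not the SDK's actual object type):

```python
# Sketch: gathering (score, reason) pairs from evaluation results.
# The nested shape mirrors response.eval_results[i].metrics[j] from
# the example above; the dicts themselves are stand-ins.

def summarize_results(eval_results):
    """Collect (score, reason) pairs from a list of eval results."""
    summaries = []
    for result in eval_results:
        score = result["metrics"][0]["value"]  # higher = more factually accurate
        summaries.append((score, result["reason"]))
    return summaries

sample = [
    {"metrics": [{"value": 0.9}], "reason": "Claims match the provided context."},
]
print(summarize_results(sample))  # [(0.9, 'Claims match the provided context.')]
```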

What to Do When Factual Accuracy Evaluation Gives a Low Score

When the factual accuracy evaluation returns a low score, first reassess the evaluation criteria: make sure they are clearly defined and aligned with the evaluation's goals, and adjust them where they lack comprehensiveness or relevance. Then examine the output itself for factual inaccuracies, identify the specific discrepancies, and revise the content to correct them.
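The triage steps above can be sketched as a small helper. The 0.5 threshold and the keyword check are assumptions for illustration; choose values that fit your own quality bar:

```python
# Sketch: triaging a low factual-accuracy score.
LOW_SCORE_THRESHOLD = 0.5  # hypothetical cutoff, tune to your needs

def triage(score, reason):
    """Suggest next steps for a factual-accuracy result."""
    if score >= LOW_SCORE_THRESHOLD:
        return ["no action needed"]
    steps = ["reassess and tighten the evaluation criteria"]
    if "context" in reason.lower():
        # The reason text points at a context mismatch
        steps.append("check the output against the provided context")
    steps.append("correct the identified factual inaccuracies")
    return steps

print(triage(0.3, "Output contradicts the context on two dates."))
```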

Differentiating Factual Accuracy from Groundedness

Factual accuracy verifies that the output is factually correct given the provided input and context. Groundedness, in contrast, checks that the response strictly adheres to the provided context, preventing the inclusion of unsupported or external information. Factual accuracy evaluates the output against the context (with an optional input), whereas groundedness requires only a response and its context.
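The difference in required inputs can be made concrete with a small field checker. The field sets follow the tables and prose on this page; the checker itself is an illustrative sketch, and "output" stands in for the groundedness response:

```python
# Sketch: checking which required fields a test case supplies.
# Field sets follow this page; the checker is illustrative only.
REQUIRED = {
    "factual_accuracy": {"output", "context"},  # "input" is optional
    "groundedness": {"output", "context"},      # the response and its context
}

def missing_fields(eval_name, test_case):
    """Return required fields absent from a test-case dict, sorted."""
    return sorted(REQUIRED[eval_name] - set(test_case))

case = {"output": "example output", "input": "example input"}
print(missing_fields("factual_accuracy", case))  # ['context']
```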