Groundedness evaluation assesses whether a model’s response is factually supported by and derived from the provided context. A grounded response should only contain information that can be directly traced back to the given context, avoiding hallucinations or unsupported claims.

Configuration

The evaluation requires the following configuration:

Parameter | Description
model     | The model to be used for evaluation
from fi.evals import Groundedness

groundedness = Groundedness(config={"model": "gpt-4o-mini"})

Test Case Setup

The evaluation requires both the model’s response and the context it should be grounded in:

from fi.testcases import LLMTestCase

test_case = LLMTestCase(
    response="The capital of France is Paris, which is known as the City of Light.",
    context="Paris is the capital city of France. It is often called 'La Ville Lumière' (the City of Light)."
)

Client Setup

Initialize the evaluation client with your API credentials:

from fi.evals import EvalClient

evaluator = EvalClient(
    fi_api_key="your_api_key", 
    fi_secret_key="your_secret_key"
)

Complete Example

from fi.evals import Groundedness, EvalClient
from fi.testcases import LLMTestCase

# Initialize the groundedness evaluator
groundedness = Groundedness(config={"model": "gpt-4o-mini"})

# Create a test case
test_case = LLMTestCase(
    response="The capital of France is Paris, which is known as the City of Light.",
    context="Paris is the capital city of France. It is often called 'La Ville Lumière' (the City of Light)."
)

# Run the evaluation
evaluator = EvalClient(fi_api_key="your_api_key", fi_secret_key="your_secret_key")
result = evaluator.evaluate(groundedness, test_case)
print(result)  # Pass if the response is grounded in the context

The evaluation will return:

  • Pass: If the response is fully grounded in the provided context
  • Fail: If the response contains information not supported by the context (illustrated below)
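
For contrast, here is a minimal failing sketch. It reuses the same imports and setup as the complete example above; the unsupported population claim is purely illustrative.

from fi.evals import Groundedness, EvalClient
from fi.testcases import LLMTestCase

groundedness = Groundedness(config={"model": "gpt-4o-mini"})
evaluator = EvalClient(fi_api_key="your_api_key", fi_secret_key="your_secret_key")

# The population figure does not appear in the context, so the response is not fully grounded
ungrounded_case = LLMTestCase(
    response="The capital of France is Paris, which has a population of 15 million people.",
    context="Paris is the capital city of France. It is often called 'La Ville Lumière' (the City of Light)."
)

result = evaluator.evaluate(groundedness, ungrounded_case)
print(result)  # Expected to return Fail, since the population claim is unsupported by the context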

This evaluation is particularly useful for:

  • Verifying that responses only contain information from trusted sources
  • Preventing model hallucinations
  • Ensuring factual accuracy in generated content
  • Validating RAG (Retrieval-Augmented Generation) systems (see the sketch below)
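
For the RAG use case, here is a minimal sketch under stated assumptions: the helper validate_rag_answer is a hypothetical wrapper (not part of the fi.evals API) that joins the retrieved chunks into the context field, so the generated answer is checked against exactly what the retriever returned.

from fi.evals import Groundedness, EvalClient
from fi.testcases import LLMTestCase

groundedness = Groundedness(config={"model": "gpt-4o-mini"})
evaluator = EvalClient(fi_api_key="your_api_key", fi_secret_key="your_secret_key")

def validate_rag_answer(answer: str, retrieved_chunks: list[str]):
    """Hypothetical helper: check that a RAG answer is grounded in the retrieved chunks."""
    test_case = LLMTestCase(
        response=answer,
        context="\n".join(retrieved_chunks)  # evaluate against exactly what was retrieved
    )
    return evaluator.evaluate(groundedness, test_case)

result = validate_rag_answer(
    "Paris is the capital of France.",
    ["Paris is the capital city of France.", "France is a country in Western Europe."]
)
print(result)  # Fail would indicate claims not supported by the retrieved chunks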