Grading criteria evaluation lets you assess LLM responses against custom grading criteria that you define. This evaluation type is useful for checking whether responses meet specific quality standards, educational requirements, or other custom criteria.

## Configuration

| Parameter | Description | Required |
| --- | --- | --- |
| `grading_criteria` | The criteria used to grade the response | Yes |
| `model` | The LLM model to use for evaluation | No (defaults to the system default) |
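In other words, a minimal template needs only the criteria text, with the model pinned explicitly or left to the default. A quick sketch (the criteria string here is illustrative, not from the SDK):

```python
from fi.evals import GradingCriteria

# Minimal template: only grading_criteria is required
template = GradingCriteria(config={
    "grading_criteria": "Response must cite at least one primary source.",
    "model": "gpt-4o-mini",  # optional; omit to fall back to the system default
})
```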

## Example

```python
from fi.evals import GradingCriteria, EvalClient
from fi.testcases import LLMTestCase

evaluator = EvalClient(fi_api_key="your_api_key", fi_secret_key="your_secret_key")

# Create a test case with response to evaluate
test_case = LLMTestCase(
    response="The mitochondria is the powerhouse of the cell. It produces energy through cellular respiration."
)

# Define grading criteria for biology answer
grading_template = GradingCriteria(config={
    "grading_criteria": """Grade this response on the following criteria:
    1. Correctly identifies mitochondria's main function
    2. Mentions cellular respiration
    3. Uses proper scientific terminology
    4. Complete and accurate explanation

    Response must meet at least 3 criteria to pass.
    """,
    "model": "gpt-4o-mini",  # optional; defaults to the system default
})

# Run the evaluation
result = evaluator.evaluate(grading_template, test_case)
print(result)  # Pass if the response meets the grading criteria
```
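The same template can be reused across many responses by calling `evaluate` once per test case, exactly as above. A short sketch (the sample answers are illustrative):

```python
# Reuse the grading template across several responses
test_cases = [
    LLMTestCase(response="Mitochondria produce ATP through cellular respiration."),
    LLMTestCase(response="The mitochondria stores the cell's genetic material."),
]

for case in test_cases:
    result = evaluator.evaluate(grading_template, case)
    print(case.response[:40], "->", result)
```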