Custom validation refers to the ability to evaluate and validate text output from LLMs against custom criteria. This includes running custom validation code, making external API calls for validation, and verifying that text is in the correct language.


| Evaluation Type | Description | Required Keys |
| --- | --- | --- |
| LLMJudge | Uses language models to evaluate content | None |
| GradingCriteria | Evaluates responses against custom grading criteria | `response` |
| AgentJudge | Uses AI agents for content evaluation | None |

| Evaluation Type | Description | Required Configuration |
| --- | --- | --- |
| LLMJudge | Uses language models to evaluate content | `model`: LLM model to use; `eval_prompt`: Evaluation prompt; `system_prompt`: System prompt |
| GradingCriteria | Evaluates against grading criteria | `grading_criteria`: Grading criteria; `model`: LLM model to use |
| AgentJudge | Uses AI agents for evaluation | `model`: LLM model to use; `eval_prompt`: Agent prompt; `system_prompt`: System prompt |

Examples

Custom Prompt Example

from fi.evals import LLMJudge, EvalClient
from fi.testcases import TestCase

evaluator = EvalClient(fi_api_key="your_api_key", fi_secret_key="your_secret_key")

# Create a custom evaluation using gpt-4o-mini to check whether the response is professional
custom_test_case = TestCase(text="Hey dude, wassup with that report?")
custom_template = LLMJudge(config={
    "model": "gpt-4o-mini",
    "eval_prompt": "Evaluate if the following text is professional for a business context.",
    "system_prompt": "You are a professional communication expert."
})

result = evaluator.evaluate(custom_template, custom_test_case)
print(result)  # Prints the evaluation of the text's professionalism

Grading Criteria Example

from fi.evals import GradingCriteria, EvalClient
from fi.testcases import LLMTestCase

evaluator = EvalClient(fi_api_key="your_api_key", fi_secret_key="your_secret_key")

# Create a test case with response to evaluate
test_case = LLMTestCase(
    response="The mitochondria is the powerhouse of the cell. It produces energy through cellular respiration."
)

# Define grading criteria for biology answer
grading_template = GradingCriteria(config={
    "grading_criteria": """ Grade this response on the following criteria:
    1. Correctly identifies mitochondria's main function
    2. Mentions cellular respiration
    3. Uses proper scientific terminology
    4. Complete and accurate explanation
    
    Response must meet at least 3 criteria to pass.
    """,
    "model": "gpt-4o-mini" # Optional, defaults to system default
})

# Run the evaluation
result = evaluator.evaluate(grading_template, test_case)
print(result) # Will return Pass if response meets grading criteria
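
Agent Judge Example

AgentJudge appears in the tables above but has no example of its own. The sketch below assumes it follows the same configuration pattern as LLMJudge (a `model`, an `eval_prompt`, and a `system_prompt`); the scenario text and prompts are illustrative, and the exact parameters should be confirmed against the SDK reference. It requires valid API keys to run.

```python
from fi.evals import AgentJudge, EvalClient
from fi.testcases import TestCase

evaluator = EvalClient(fi_api_key="your_api_key", fi_secret_key="your_secret_key")

# Check whether a support reply actually resolves the user's question
test_case = TestCase(text="You can reset your password from the account settings page.")
agent_template = AgentJudge(config={
    "model": "gpt-4o-mini",
    "eval_prompt": "Evaluate whether the following text fully resolves the user's question.",
    "system_prompt": "You are a customer support quality reviewer."
})

result = evaluator.evaluate(agent_template, test_case)
print(result)
```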