Output evaluation assesses how well a model’s output meets the evaluation criteria you specify. It is useful when you need to validate that generated output satisfies particular quality standards, requirements, or objectives.

Required Parameters

| Parameter | Description | Required |
| --- | --- | --- |
| input | The input text to be evaluated | Yes |
| output | The output generated by the model | Yes |

Configuration

The evaluation requires the following configuration:

| Parameter | Description | Required |
| --- | --- | --- |
| criteria | The criteria against which the output is judged, written as free-form text | Yes |

For example, the criteria can cover several aspects of the response at once:

from fi.evals import Output

# Initialize the output evaluator
output_eval = Output(
    config = {
        "criteria": """
        Evaluate the output based on:
        1. Relevance to input
        2. Accuracy of information
        3. Clarity and coherence
        4. Completeness of response
        5. Appropriate tone and style
        """
    }
)
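
The criteria string is free-form text, so it can be as broad or as narrow as your use case needs. As a sketch, an evaluator focused on a single aspect might be configured like this (the criteria wording is illustrative, not mandated by the SDK):

from fi.evals import Output

# Evaluator that only checks factual accuracy against the input
accuracy_eval = Output(
    config = {
        "criteria": "Evaluate whether the output is factually accurate with respect to the input question."
    }
)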

Test Case Setup

The evaluation requires both input and output:

from fi.testcases import LLMTestCase

test_case = LLMTestCase(
    input="What is the capital of France?",
    output="Paris is the capital city of France."
)
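
To score several outputs against the same criteria, you can build the test cases up front and evaluate them one at a time. A minimal sketch (the question-and-answer pairs are illustrative):

from fi.testcases import LLMTestCase

# A batch of test cases to evaluate with the same criteria
test_cases = [
    LLMTestCase(
        input="What is the capital of France?",
        output="Paris is the capital city of France.",
    ),
    LLMTestCase(
        input="What is the capital of Japan?",
        output="Tokyo is the capital of Japan.",
    ),
]

Each test case can then be passed to evaluator.evaluate once the client is set up.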

Client Setup

Initialize the evaluation client with your API credentials:

from fi.evals import EvalClient

evaluator = EvalClient(
    fi_api_key="your_api_key",
    fi_secret_key="your_secret_key"
)
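
To avoid hard-coding credentials, you can read them from the environment. A minimal sketch, assuming the keys are stored in FI_API_KEY and FI_SECRET_KEY environment variables (the variable names are a convention chosen here, not required by the SDK):

import os

from fi.evals import EvalClient

# Read credentials from environment variables instead of hard-coding them
evaluator = EvalClient(
    fi_api_key=os.environ["FI_API_KEY"],
    fi_secret_key=os.environ["FI_SECRET_KEY"],
)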

Complete Example

from fi.evals import Output, EvalClient
from fi.testcases import LLMTestCase

# Initialize the output evaluator
output_eval = Output(
    config = {
        "criteria": """
        Evaluate the output based on:
        1. Relevance to input
        2. Accuracy of information
        3. Clarity and coherence
        4. Completeness of response
        5. Appropriate tone and style
        """
    }
)

# Create a test case
test_case = LLMTestCase(
    input="What is the capital of France?",
    output="Paris is the capital city of France."
)

# Initialize the evaluation client
evaluator = EvalClient(
    fi_api_key="your_api_key",
    fi_secret_key="your_secret_key"
)

# Run the evaluation
result = evaluator.evaluate(output_eval, test_case)
print(result)  # The evaluation returns a score between 0 and 1

The evaluation will return a score between 0 and 1, where:

  • Scores closer to 1 indicate better alignment with the evaluation criteria
  • Scores closer to 0 indicate poor alignment with the evaluation criteria
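
If you gate downstream behavior on this score, an explicit threshold check keeps the decision easy to audit. A minimal sketch; how you extract the numeric score from the returned result object depends on the SDK, so a literal value stands in for it here:

def meets_quality_bar(score: float, threshold: float = 0.8) -> bool:
    # Treat anything at or above the threshold as acceptable
    return score >= threshold

# 0.92 stands in for the numeric score extracted from the evaluation result
print(meets_quality_bar(0.92))  # True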

This evaluation is particularly useful for:

  • Validating model outputs against specific requirements
  • Ensuring consistent output quality
  • Measuring output effectiveness
  • Identifying areas for improvement in model responses

Best Practices

  1. Define clear and specific evaluation criteria
  2. Include multiple aspects in your criteria
  3. Consider both qualitative and quantitative measures
  4. Align criteria with your use case requirements
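
Putting these practices together, criteria for a customer-support assistant might look like the sketch below (the wording, word limit, and aspects are illustrative assumptions, not requirements of the SDK):

from fi.evals import Output

# Criteria that are specific, multi-aspect, and aligned with the use case
support_eval = Output(
    config = {
        "criteria": """
        Evaluate the support reply on:
        1. Does it directly answer the customer's question? (relevance)
        2. Are all stated facts correct? (accuracy)
        3. Is it under 150 words and free of unexplained jargon? (a quantitative clarity check)
        4. Does it keep a polite, professional tone? (style)
        """
    }
)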