Evaluation Using Interface

Input:

  • Configuration Parameters:
    • code: A string containing the custom Python code to execute. The code must define a function main(**kwargs); kwargs is populated with the values of the current dataset row, keyed by column name. The function should return the evaluation result (for example, a numeric score or a boolean).

    • Example Code Structure:

      def main(**kwargs):
          # Access column 'input_col' via kwargs['input_col']
          # Access column 'output_col' via kwargs['output_col']
          # Fall back to '' when a column is missing or its value is None
          input_val = kwargs.get('input_col') or ''
          output_val = kwargs.get('output_col') or ''
      
          # Implement custom logic
          if 'expected pattern' in output_val and len(input_val) > 10:
              return 1.0 # Represents Pass or high score
          else:
              return 0.0 # Represents Fail or low score
      
      

Output: The value returned by the custom main function.
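To see how the returned value depends on the row contents, the example function can be run locally against a few sample rows. This is only a sketch: the rows below are made up, and unpacking a dict with main(**row) is an assumption about how the column values reach kwargs, not a description of the platform's internals.

```python
def main(**kwargs):
    # Same logic as the example structure above.
    input_val = kwargs.get("input_col") or ""
    output_val = kwargs.get("output_col") or ""
    if "expected pattern" in output_val and len(input_val) > 10:
        return 1.0  # pass
    return 0.0  # fail


# Hypothetical dataset rows, one dict per row, keyed by column name.
rows = [
    {"input_col": "a sufficiently long input", "output_col": "contains the expected pattern"},
    {"input_col": "short", "output_col": "contains the expected pattern"},
]

# Each row's columns become keyword arguments of main.
scores = [main(**row) for row in rows]
print(scores)  # [1.0, 0.0]
```

The second row fails only because its input is 10 characters or shorter, which shows that both conditions must hold for a pass.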


What to do when Custom Code Eval Fails

Start with a code review: check for syntax errors, verify that the main function is implemented correctly, and confirm that all required dependencies are available. Then validate the inputs: make sure every required argument is read from kwargs correctly and that the input data types and formats match what the code expects.
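Some of these checks can be automated before the code is submitted. The sketch below uses Python's standard ast module to catch syntax errors and a missing or malformed main function, plus a small helper for spotting absent columns. The helper names check_eval_code and missing_columns are hypothetical and not part of any platform API.

```python
import ast


def check_eval_code(code):
    """Return a list of problems found in a custom eval code string."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    problems = []
    # Look for a top-level function named main that accepts **kwargs.
    mains = [node for node in tree.body
             if isinstance(node, ast.FunctionDef) and node.name == "main"]
    if not mains:
        problems.append("no top-level function named main")
    elif mains[0].args.kwarg is None:
        problems.append("main does not accept **kwargs")
    return problems


def missing_columns(row, required):
    """Return the required column names that are absent or None in a row."""
    return [name for name in required if row.get(name) is None]


good = "def main(**kwargs):\n    return 1.0\n"
bad = "def main(**kwargs)\n    return 1.0\n"  # missing colon

print(check_eval_code(good))  # []
print(check_eval_code(bad))   # one syntax-error message
print(missing_columns({"input_col": "hi"}, ["input_col", "output_col"]))  # ['output_col']
```

These checks are static only. They do not run the code, so runtime failures such as type mismatches or missing dependencies still need to be caught by testing main on representative rows.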


Differentiating Custom Code Eval from Deterministic Eval

Deterministic Evals and Custom Code Eval are both flexible and customisable: each supports tailored evaluation logic and can be configured for different types of outputs. Deterministic Evals use rule prompts to guide the evaluation.

However, Custom Code Eval executes actual Python code, enabling dynamic computations and logic, while Deterministic Evals rely on structured, rule-based evaluation methods.
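As an illustration of the kind of dynamic computation that Python code allows, the following sketch returns a graded score instead of a fixed pass or fail value. The column names output_col and expected_col are hypothetical, and the word-overlap metric is chosen only for illustration.

```python
def main(**kwargs):
    # 'output_col' and 'expected_col' are hypothetical column names.
    output_words = set(str(kwargs.get("output_col") or "").lower().split())
    expected_words = set(str(kwargs.get("expected_col") or "").lower().split())

    if not expected_words:
        return 0.0  # nothing to compare against

    # Fraction of the expected words that also appear in the output.
    return len(output_words & expected_words) / len(expected_words)


score = main(output_col="The cat sat", expected_col="cat sat down")
print(round(score, 2))  # 0.67
```

Here two of the three expected words appear in the output, so the score is 2/3.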