Evaluation Using Interface

Input:

  • Required Inputs:
    • output: The output column generated by running a prompt.

Click here to learn how to create a prompt column.

Output:

  • Score: A percentage score between 0 and 100

Interpretation:

  • Higher scores: Indicate that the output closely follows the given instructions.
  • Lower scores: Suggest that the output deviates from the instructions.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.

Input:

  • Required Inputs:
    • output: string - The output column generated by the model.

Output:

  • Score: float - A score between 0 and 1

Interpretation:

  • Higher scores: Indicate that the output closely follows the given instructions.
  • Lower scores: Suggest that the output deviates from the instructions.

from fi.testcase import TestCase
from fi.evals import InstructionAdherence

instruction_eval = InstructionAdherence()

test_case = TestCase(
    output="Paris is the capital of France and is known for the Eiffel Tower.",
)

# `evaluator` is the evaluation client initialized as described in the
# Python SDK setup guide linked above.
result = evaluator.evaluate(eval_templates=[instruction_eval], inputs=[test_case])
instruction_score = result.eval_results[0].metrics[0].value
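
Note that the SDK reports the score on a 0 to 1 scale, while the interface reports a 0 to 100 percentage. A minimal sketch of converting between the two (the score value below is an assumed example, not actual SDK output):

```python
# Assumed example value, standing in for
# result.eval_results[0].metrics[0].value returned by the SDK.
instruction_score = 0.85

# The SDK uses a 0-1 scale; the interface uses 0-100, so scale by 100.
percentage_score = round(instruction_score * 100)
print(percentage_score)  # prints 85
```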


What to Do if Prompt Adherence is Low

Identify specific areas where the output deviates from the given instructions. Providing targeted feedback helps refine the content to better align with the prompt.

Reviewing the prompt for clarity and completeness is essential, as ambiguous or vague instructions may contribute to poor adherence. If necessary, adjusting the prompt to offer clearer guidance can improve response accuracy.

Enhancing the model’s ability to interpret and follow instructions through fine-tuning or prompt engineering can further strengthen adherence.
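
As one concrete way to act on the steps above, low-scoring outputs can be flagged for review before the prompt is revised. A minimal sketch, assuming the SDK's 0 to 1 score scale and a hypothetical threshold of 0.7 (both the helper name and the threshold are illustrative, not part of the SDK):

```python
def flag_low_adherence(scores, threshold=0.7):
    """Return indices of outputs whose adherence score falls below the
    threshold, so those outputs can be reviewed against the prompt."""
    return [i for i, score in enumerate(scores) if score < threshold]

# Example scores for four outputs; indices 1 and 3 fall below 0.7.
flagged = flag_low_adherence([0.95, 0.40, 0.80, 0.65])
print(flagged)  # prints [1, 3]
```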


Differentiating Prompt/Instruction Adherence with Context Adherence

Context Adherence focuses on maintaining information boundaries and verifying sources, ensuring that responses are strictly derived from the given context, whereas Prompt Adherence evaluates whether the output correctly follows instructions, completes tasks, and adheres to specified formats.

Their evaluation criteria differ: Context Adherence checks whether information originates from the provided context, while Prompt Adherence ensures that all instructions are followed accurately.