Evaluation Using Interface

Input

  • Required:
    • input: The prompt text to be evaluated.
  • Configuration Parameters:
    • model: The language model (e.g., “gpt-4o-mini”).

Output

  • Score: Percentage score between 0 and 100.

Interpretation:

  • Lower scores: Indicate the prompt is clearer, less surprising, and more predictable for the model, suggesting better comprehension.
  • Higher scores: Suggest the prompt might be ambiguous, overly complex, or contain unfamiliar concepts, making it harder for the model to process confidently.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.


Input

  • Required Inputs:
    • input (string): The prompt text to be evaluated.
  • Config Parameters:
    • model (string): The language model (e.g., “gpt-4o-mini”).

Output

  • Score (float): Score between 0 and 1.

from fi.testcases import TestCase
from fi.evals.templates import PromptPerplexity

test_case = TestCase(
    input="Can you provide a comprehensive summary of the given text? The summary should cover all the key points and main ideas presented in the original text, while also condensing the information into a concise and easy-to-understand format. Please ensure that the summary includes relevant details and examples that support the main ideas, while avoiding any unnecessary information or repetition.",
)

template = PromptPerplexity(config={"model": "gpt-4o-mini"})

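# NOTE (assumption): `evaluator` is an Evaluator client that has already been
# initialised as described in the Python SDK setup guide linked above.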
response = evaluator.evaluate(eval_templates=[template], inputs=[test_case])

print(f"Score: {response.eval_results[0].metrics[0].value[0]}")
print(f"Reason: {response.eval_results[0].reason}")


What to Do When Prompt Perplexity Gives a High Score (Lower Is Better)

  • Review the input prompt for clarity, specificity, and simplicity. Ensure it provides sufficient context without being overly complex or ambiguous.
  • Break down complex prompts into smaller, more manageable parts.
  • Experiment with different phrasings or formulations of the prompt to see whether they yield lower perplexity scores (see the sketch after this list).
  • Ensure the vocabulary and concepts used are likely within the model’s training data.
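
One practical workflow for a high-scoring prompt is to score several alternative phrasings of the same request and keep the one the model finds most predictable. The sketch below is illustrative only: the candidate prompts are made up, the `evaluator` client is assumed to be initialised as in the example above, and it assumes `evaluate` returns one result per test case in the same order the inputs were passed.

from fi.testcases import TestCase
from fi.evals.templates import PromptPerplexity

# Illustrative rewrites of the same summarisation request.
candidate_prompts = [
    "Summarize the given text, covering every key point and main idea concisely.",
    "Write a concise, easy-to-understand summary of the text, keeping relevant details and avoiding repetition.",
]

template = PromptPerplexity(config={"model": "gpt-4o-mini"})
test_cases = [TestCase(input=prompt) for prompt in candidate_prompts]

# Assumes `evaluator` was initialised as in the example above and that
# eval_results come back in the same order as the inputs.
response = evaluator.evaluate(eval_templates=[template], inputs=test_cases)
scores = [result.metrics[0].value[0] for result in response.eval_results]

# Lower perplexity is better, so keep the prompt with the smallest score.
best_prompt, best_score = min(zip(candidate_prompts, scores), key=lambda pair: pair[1])
print(f"Lowest-perplexity prompt ({best_score}): {best_prompt}")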

Differentiating Prompt Perplexity from Prompt Adherence

While Prompt Perplexity examines the model’s statistical understanding and confidence in processing the input prompt itself, Prompt Adherence focuses on whether the output generated by the model complies with the instructions given in the prompt. Perplexity assesses the clarity of the input, whereas Adherence assesses the compliance of the output.
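
The difference is also visible in what each evaluation needs. A perplexity check requires only the prompt itself, whereas an adherence check also needs the model's generated output to compare against the instructions. The fragment below is a rough sketch of that contrast; the adherence-related names mentioned in the comments are assumptions for illustration, not confirmed SDK API, and `evaluator` is assumed to be set up as above.

from fi.testcases import TestCase
from fi.evals.templates import PromptPerplexity

prompt = "List three key takeaways from the text as short bullet points."

# Prompt Perplexity only needs the input prompt.
perplexity_case = TestCase(input=prompt)
perplexity = PromptPerplexity(config={"model": "gpt-4o-mini"})
response = evaluator.evaluate(eval_templates=[perplexity], inputs=[perplexity_case])
print(f"Input clarity (perplexity): {response.eval_results[0].metrics[0].value[0]}")

# A Prompt Adherence-style check would additionally need the generated output,
# e.g. a test case carrying both the prompt and the model's response, so it can
# judge whether that response complies with the prompt's instructions.
# (Template and field names for adherence are illustrative assumptions.)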