Effective prompt design is essential for ensuring clear, precise, and contextually relevant AI-generated responses. Ambiguous or overly complex prompts can lead to inconsistent model behaviour, reduced response quality, and difficulty in refining AI interactions.

Challenges associated with ambiguous prompts include:

  • Inconsistent Model Responses – The same prompt may yield unpredictable or varying outputs.
  • Reduced Comprehensibility – A vague or complex prompt can lead to off-topic, incomplete, or irrelevant responses.
  • Challenges in Optimization – Poorly structured prompts hinder effective prompt engineering and fine-tuning efforts.

To address these issues, Prompt Perplexity Evaluation measures the clarity and interpretability of a given prompt by assessing the model’s ability to generate a coherent and confident response.

This eval quantifies how well a language model understands and processes an input prompt by computing its perplexity score (a minimal illustration of the computation follows the list below).

  • Lower Perplexity → The prompt is clear and easy for the model to interpret, leading to consistent and accurate responses.
  • Higher Perplexity → The prompt is ambiguous, overly complex, or lacks sufficient context, making it difficult for the model to generate coherent responses.
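
Intuitively, perplexity is the exponential of the average negative log-likelihood the model assigns to the prompt's tokens. The sketch below illustrates that computation using an open-source GPT-2 model via Hugging Face transformers; it is only an illustration of the metric itself, not the eval's internal implementation, and the model choice and threshold-free comparison are assumptions for demonstration purposes.

# Illustrative only: perplexity = exp(mean negative log-likelihood of the prompt tokens).
# GPT-2 is used here as a stand-in model; the eval's backend model and tokenization may differ.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

def perplexity(prompt: str) -> float:
    enc = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the average cross-entropy loss
        # over the prompt tokens; exponentiating it gives perplexity.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

# A clear, specific prompt should score lower than a vague one.
print(perplexity("Summarize the given text in three concise bullet points."))
print(perplexity("Do the thing with the stuff like before but better somehow."))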

Click here to read the eval definition of Prompt Perplexity


a. Using Interface

Required Inputs

  • input: The prompt to be evaluated.

Configuration Parameters

  • model: The language model used to compute the perplexity score.

Output

  • Numeric perplexity score, where lower values indicate higher prompt clarity and interpretability.

b. Using SDK

from fi.testcases import TestCase
from fi.evals.templates import PromptPerplexity

# The prompt whose clarity and interpretability we want to score.
test_case = TestCase(
    input='''
    Can you provide a comprehensive summary of the given text? 
    The summary should cover all the key points and main ideas 
    presented in the original text, while also condensing the 
    information into a concise and easy-to-understand format. 
    Please ensure that the summary includes relevant details and 
    examples that support the main ideas, while avoiding any unnecessary 
    information or repetition.
    '''
)

# Configure the eval with the model used to compute the perplexity score.
template = PromptPerplexity(config={"model": "gpt-4o-mini"})

# `evaluator` is assumed to be an evaluation client initialized earlier in your
# setup; see the SDK installation/quickstart docs for how to create it.
response = evaluator.evaluate(eval_templates=[template], inputs=[test_case])

print(f"Score: {response.eval_results[0].metrics[0].value[0]}")
print(f"Reason: {response.eval_results[0].reason}")