Prompt Perplexity
Prompt Perplexity measures how well a language model predicts the tokens in a given input prompt. It’s calculated based on the likelihood the model assigns to each token in the prompt.
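As a rough illustration of "likelihood the model assigns to each token", the sketch below uses the standard textbook definition of perplexity: the exponential of the average negative log-likelihood per token. The function name `perplexity` and the example log-probabilities are hypothetical. This page does not document how the evaluator turns perplexity into its bounded score, so treat this as background rather than the evaluator's actual formula.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Standard perplexity from natural-log token probabilities.

    Assumption: one log-probability per prompt token, as many model APIs
    can return. This is not the evaluator's documented implementation.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)


# Hypothetical 4-token prompts: the model assigns each token
# probability 0.5 in the first case and 0.1 in the second.
confident = [math.log(0.5)] * 4
uncertain = [math.log(0.1)] * 4

print(perplexity(confident))  # approximately 2
print(perplexity(uncertain))  # approximately 10
```

A prompt whose tokens each receive higher probability yields lower perplexity, which matches the "lower is better" reading used on this page.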
Evaluation Using Interface
Input
- Required:
  - input: The prompt text to be evaluated.
- Configuration Parameters:
  - model: The language model (e.g., “gpt-4o-mini”).
Output
- Score: Percentage score between 0 and 100.
Interpretation:
- Lower scores: Indicate the prompt is clearer, less surprising, and more predictable for the model, suggesting better comprehension.
- Higher scores: Suggest the prompt might be ambiguous, overly complex, or contain unfamiliar concepts, making it harder for the model to process confidently.
Evaluation Using Python SDK
Refer to the Python SDK documentation to learn how to set up evaluation using the Python SDK.
Input | Parameter | Type | Description |
---|---|---|---|
Required Inputs | input | string | The prompt text to be evaluated. |
Config Parameters | model | string | The language model (e.g., “gpt-4o-mini”). |
Output | Type | Description |
---|---|---|
Score | float | Score between 0 and 1 (the interface reports the score as a percentage between 0 and 100). |
What to Do When Prompt Perplexity Gives a High Score (Lower Is Better)
- Review the input prompt for clarity, specificity, and simplicity. Ensure it provides sufficient context without being overly complex or ambiguous.
- Break down complex prompts into smaller, more manageable parts.
- Experiment with different phrasing or formulations of the prompt to see if they yield lower perplexity scores.
- Ensure the vocabulary and concepts used are likely within the model’s training data.
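The "experiment with different phrasing" step can be sketched as a small comparison loop. Everything here is illustrative: `rank_prompts` and `make_unigram_scorer` are hypothetical helpers, and the unigram word model is only a toy stand-in so the example runs on its own. In practice, `score_fn` would wrap a call to the Prompt Perplexity evaluation with your chosen model.

```python
import math
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def make_unigram_scorer(corpus: str) -> Callable[[str], float]:
    """Toy scorer: add-one-smoothed unigram perplexity over words.

    Assumption: this replaces a real language model purely for illustration.
    """
    counts = Counter(corpus.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra bucket for unseen words

    def score(prompt: str) -> float:
        words = prompt.lower().split()
        if not words:
            raise ValueError("prompt must contain at least one word")
        log_prob = sum(
            math.log((counts[w] + 1) / (total + vocab)) for w in words
        )
        return math.exp(-log_prob / len(words))

    return score


def rank_prompts(
    prompts: Iterable[str], score_fn: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Return (prompt, score) pairs sorted from lowest to highest score."""
    return sorted(((p, score_fn(p)) for p in prompts), key=lambda pair: pair[1])


score_fn = make_unigram_scorer(
    "summarize the report in three bullet points "
    "summarize the email in one sentence"
)
candidates = [
    "elucidate quintessential ramifications herein",
    "summarize the report in one sentence",
]
ranked = rank_prompts(candidates, score_fn)
for prompt, score in ranked:
    print(f"{score:8.2f}  {prompt}")
```

The variant built from familiar words ranks first because it scores lower. With a real evaluator the absolute numbers will differ, but the same pattern applies: score each variant with the same model and keep the lowest-scoring one that still conveys the full intent.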
Differentiating Prompt Perplexity from Prompt Adherence
While Prompt Perplexity examines the model’s statistical understanding and confidence in processing the input prompt itself, Prompt Adherence focuses on whether the output generated by the model complies with the instructions given in the prompt. Perplexity assesses the clarity of the input, whereas Adherence assesses the compliance of the output.