Definition

Prompt Perplexity measures how well a language model understands and processes an input prompt by calculating the prompt’s perplexity score. This metric assesses the model’s ability to generate responses that are coherent and aligned with the given input.

Lower perplexity indicates greater confidence and better comprehension, while higher perplexity suggests ambiguity or complexity in the prompt, making it harder for the model to produce a consistent response.
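
Concretely, perplexity is conventionally defined as the exponentiated average negative log-likelihood the model assigns to a token sequence; the exact formulation a given evaluation system uses may differ slightly, but the standard definition is:

    \mathrm{PPL}(x_{1..N}) = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta\left(x_i \mid x_{<i}\right) \right)

where x_1, …, x_N are the tokens of the prompt and p_θ(x_i | x_{<i}) is the probability the model assigns to token x_i given the preceding tokens. Because the log-probabilities are averaged and then exponentiated, a perplexity of 1 would mean the model predicted every token with certainty, and larger values indicate greater uncertainty.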

This evaluation is particularly useful for refining prompts to improve model performance and response quality.


Calculation

The evaluation process begins by configuring the input prompt to be assessed. The system then calculates a perplexity score from the model’s output for that prompt; this score reflects the model’s confidence in generating a coherent and contextually appropriate response.
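
One common way to obtain such a score is to measure the perplexity the model assigns to the prompt tokens themselves; the exact procedure a given evaluation system follows may differ. Below is a minimal sketch, assuming the Hugging Face transformers library and using "gpt2" purely as a stand-in for whatever model the evaluation targets; the prompt_perplexity() helper is illustrative, not part of any particular product’s API.

    # A minimal sketch, assuming a Hugging Face causal LM ("gpt2" is used
    # here purely as a stand-in for the model being evaluated).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    def prompt_perplexity(prompt: str) -> float:
        """Perplexity the model assigns to the prompt's token sequence."""
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            # With labels supplied, the model returns the mean cross-entropy
            # (average negative log-likelihood per token) over the sequence.
            outputs = model(**inputs, labels=inputs["input_ids"])
        return torch.exp(outputs.loss).item()

    print(prompt_perplexity("Summarize the report in three bullet points."))

Exponentiating the mean per-token loss is what turns the model’s raw log-probabilities into the perplexity value reported by the evaluation.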


What to Do When Prompt Perplexity Returns a High Score (Lower Is Better)

If the evaluation yields a high perplexity score, it may suggest that the model struggled to generate a coherent response. In such cases, consider the following actions:

  • Review the input prompt for clarity and specificity. Ensure that it provides sufficient context for the model to generate an appropriate response.
  • Adjust the model parameters or settings to improve response quality.
  • Experiment with different prompt formulations to see whether they yield lower perplexity scores (see the sketch at the end of this section).

By addressing these factors, developers can enhance the model’s performance and ensure that it generates more coherent and contextually relevant responses.
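
The last point can be scripted: score several candidate formulations of the same request and keep the one with the lowest perplexity. The sketch below reuses the hypothetical prompt_perplexity() helper and stand-in model from the Calculation section; the candidate prompts are purely illustrative, and a lower score should be weighed alongside whether the prompt still expresses the intended task.

    # Reusing the hypothetical prompt_perplexity() helper from the
    # Calculation section sketch to rank candidate formulations.
    candidates = [
        "Explain it.",
        "Explain the concept of perplexity in language models.",
        "In two short paragraphs, explain what perplexity measures for a "
        "language model and why a lower value is preferable.",
    ]

    # Sort candidates from lowest (best) to highest perplexity.
    scored = sorted((prompt_perplexity(p), p) for p in candidates)
    for ppl, prompt in scored:
        print(f"{ppl:8.2f}  {prompt}")

    best_ppl, best_prompt = scored[0]
    print(f"Lowest perplexity ({best_ppl:.2f}): {best_prompt}")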


Differentiating Prompt Perplexity from Prompt Adherence

While Prompt Perplexity examines the model’s understanding of the prompt and the coherence of its output, Prompt Adherence focuses on the output’s compliance with the provided instructions. The former is concerned with the quality and coherence of the generated content, whereas the latter emphasizes following the guidelines and requirements set forth in the prompt.