Conversation Coherence

Evaluate logical flow and context maintenance in dialogues

Conversation Resolution

Assess if conversations reach satisfactory conclusions

Deterministic Eval

Perform rule-based evaluations with predictable outcomes

Content Moderation

Screen content for safety and appropriateness

Context Adherence

Verify responses stay within provided context

Context Relevance

Evaluate if context is sufficient for queries

Completeness

Verify that the response fully addresses the query

Context Similarity

Compare similarity between contexts

PII Detection

Identify personal information in content

Toxicity

Detect harmful or toxic content

Tone

Analyze emotional tone of content

Sexist Content

Detect gender bias and discrimination

Prompt Injection

Identify malicious prompt manipulation

Not Gibberish

Verify text coherence and meaning

Safe for Work

Ensure content is workplace-appropriate

Instruction Adherence

Verify compliance with given instructions

Data Privacy

Ensure compliance with privacy standards

JSON Validation

Verify valid JSON format

Regex

Validate text against a regular-expression pattern

API Call

Validate API response handling

Custom Code

Execute custom evaluation logic

LLM Judge

Evaluate outputs using a language model as the judge

Agent Judge

Assess outputs using an AI agent as the judge

JSON Schema

Verify JSON structure compliance

Context Sufficiency

Assess if context meets requirements

Grading Criteria

Define and apply evaluation standards

Groundedness

Verify responses are based on context

Summarization Accuracy

Assess summary quality and precision

Answer Similarity

Compare the similarity between responses

Eval Output

Assess evaluation result quality

Context Retrieval

Evaluate context fetching quality

Eval Ranking

Assess response ranking accuracy

Image Instruction

Evaluate image generation compliance

Score Eval

Assess numerical scoring accuracy

Summary Quality

Evaluate summarization effectiveness

Factual Accuracy

Verify factual correctness

Translation Accuracy

Assess translation quality

Cultural Sensitivity

Evaluate cultural appropriateness

Bias Detection

Identify various forms of bias

LLM Function Calling

Assess function call handling

Length Evals

Validate text length requirements

Contain Evals

Check for specific content patterns

Prompt Perplexity

Measure prompt perplexity as a proxy for prompt complexity

Chunk Attribution

Track content source attribution

Chunk Utilization

Assess context chunk usage

Valid Links

Verify URL validity

Email Validation

Validate email format
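
Several of the rule-based checks listed above (JSON Validation, Regex, Length Evals, Contain Evals, Valid Links, Email Validation) can be approximated with a few lines of standard-library Python. The sketch below is illustrative only: the function names, the email pattern, and the case-insensitive keyword matching are assumptions made for this example, not the platform's actual implementation or API.

```python
import json
import re
from urllib.parse import urlparse


def is_valid_json(text: str) -> bool:
    """JSON Validation: True if the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except (json.JSONDecodeError, TypeError):
        return False


def matches_regex(text: str, pattern: str) -> bool:
    """Regex: True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


def length_between(text: str, min_len: int, max_len: int) -> bool:
    """Length Evals: True if the character count is within [min_len, max_len]."""
    return min_len <= len(text) <= max_len


def contains_all(text: str, keywords: list[str]) -> bool:
    """Contain Evals: True if every keyword appears (case-insensitive, an assumption)."""
    lowered = text.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


# Deliberately loose pattern (an assumption): something@something.something
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_valid_email(text: str) -> bool:
    """Email Validation: format check only; does not confirm the mailbox exists."""
    return EMAIL_PATTERN.fullmatch(text) is not None


def is_wellformed_url(text: str) -> bool:
    """Valid Links: syntax check only; does not confirm the link resolves."""
    parsed = urlparse(text)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


if __name__ == "__main__":
    print(is_valid_json('{"answer": 42}'))
    print(matches_regex("order #123", r"#\d+"))
    print(length_between("hello", 1, 5))
    print(contains_all("Hello World", ["hello", "world"]))
    print(is_valid_email("user@example.com"))
    print(is_wellformed_url("https://example.com/docs"))
```

Each function returns a plain boolean, which matches the predictable pass/fail outcome described for deterministic evals. The URL and email checks validate format only; confirming that a link actually resolves would require a network request, which is left out of this sketch.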