Overview
Conversation Coherence
Evaluate logical flow and context maintenance in dialogues
Conversation Resolution
Assess if conversations reach satisfactory conclusions
Deterministic Eval
Perform rule-based evaluations with predictable outcomes
Content Moderation
Screen content for safety and appropriateness
Context Adherence
Verify responses stay within provided context
Context Relevance
Evaluate if context is sufficient for queries
Completeness
Verify that the response fully addresses the query
Context Similarity
Compare similarity between contexts
PII Detection
Identify personal information in content
Toxicity
Detect harmful or toxic content
Tone
Analyze emotional tone of content
Sexist Content
Detect gender bias and discrimination
Prompt Injection
Identify malicious prompt manipulation
Not Gibberish
Verify text coherence and meaning
Safe for Work
Ensure content is workplace-appropriate
Instruction Adherence
Verify compliance with given instructions
Data Privacy
Ensure compliance with privacy standards
JSON Validation
Verify valid JSON format
Regex
Validate text against a regular-expression pattern
API Call
Validate API response handling
Custom Code
Execute custom evaluation logic
LLM Judge
Evaluate outputs using a language model as the judge
Agent Judge
Assess outputs using an AI agent as the judge
JSON Schema
Verify JSON structure compliance
Context Sufficiency
Assess if context meets requirements
Grading Criteria
Define and apply evaluation standards
Groundedness
Verify responses are based on context
Summarization Accuracy
Assess summary quality and precision
Answer Similarity
Measure similarity between responses
Eval Output
Assess evaluation result quality
Context Retrieval
Evaluate context fetching quality
Eval Ranking
Assess response ranking accuracy
Image Instruction
Evaluate image generation compliance
Score Eval
Assess numerical scoring accuracy
Summary Quality
Evaluate summarization effectiveness
Factual Accuracy
Verify factual correctness
Translation Accuracy
Assess translation quality
Cultural Sensitivity
Evaluate cultural appropriateness
Bias Detection
Identify various forms of bias
LLM Function Calling
Assess how the model handles function calls
Length Evals
Validate text length requirements
Contain Evals
Check for specific content patterns
Prompt Perplexity
Measure prompt complexity
Chunk Attribution
Track content source attribution
Chunk Utilization
Assess context chunk usage
Valid Links
Verify URL validity
Email Validation
Validate email format
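
Several of the rule-based checks listed above (JSON Validation, Regex, Length Evals, Contain Evals, Email Validation, Valid Links) have predictable, deterministic outcomes, so their general idea can be sketched with the Python standard library alone. The function names, parameters, and the simplified email pattern below are hypothetical and chosen only for illustration. They are not this platform's API, and the actual evals may apply different or stricter rules.

```python
import json
import re
from urllib.parse import urlparse


def is_valid_json(text):
    """Return True if text parses as JSON (sketch of a JSON validation check)."""
    try:
        json.loads(text)
        return True
    except (ValueError, TypeError):
        return False


def matches_regex(text, pattern):
    """Return True if the pattern is found anywhere in text."""
    return re.search(pattern, text) is not None


def within_length(text, min_len=0, max_len=None):
    """Return True if the character count falls inside the given bounds."""
    n = len(text)
    return n >= min_len and (max_len is None or n <= max_len)


def contains_all(text, keywords):
    """Return True if every keyword appears in text (case-insensitive)."""
    lowered = text.lower()
    return all(k.lower() in lowered for k in keywords)


# Assumption: a deliberately loose pattern (something@something.tld).
# It checks format only and does not prove the address exists.
_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(text):
    """Return True if text has a plausible email shape."""
    return _EMAIL_RE.fullmatch(text) is not None


def looks_like_url(text):
    """Return True if text is an http(s) URL with a host; no network request is made."""
    parts = urlparse(text)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

For example, `is_valid_json('{"a": 1}')` passes while `is_valid_json('{a: 1}')` fails, and `looks_like_url("example.com")` fails because no scheme is present. Checks such as Toxicity, Groundedness, or Tone depend on model judgment and cannot be reduced to rules like these.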