Overview
Audio Quality
Evaluates the quality of the given audio
Eval Audio Description
Evaluates whether the given audio matches its description
Audio Transcription
Assesses how accurately the transcription reflects the given audio
Chunk Utilization
Assess context chunk usage
Chunk Attribution
Track content source attribution
LLM Function Calling
Assess function call handling
Context Retrieval
Evaluate context fetching quality
Instruction Adherence
Verify compliance with given instructions
Sexist Content
Detect gender bias and discrimination
Tone
Analyze emotional tone of content
Toxicity
Detect harmful or toxic content
Context Relevance
Evaluate if context is sufficient for queries
Context Adherence
Verify responses stay within provided context
Conversation Coherence
Evaluate logical flow and context maintenance in dialogues
Deterministic Eval
Perform rule-based evaluations with predictable outcomes
Data Privacy
Ensure compliance with privacy standards
Bias Detection
Identify various forms of bias
Cultural Sensitivity
Evaluate cultural appropriateness
Translation Accuracy
Assess translation quality
Factual Accuracy
Verify factual correctness
Summary Quality
Evaluate summarization effectiveness
Score Eval
Assess numerical scoring accuracy
Image Instruction
Evaluate image generation compliance
Eval Ranking
Assess response ranking accuracy
Eval Output
Assess evaluation result quality
Groundedness
Verify responses are based on context
Completeness
Verify that the response fully addresses the query
Prompt Injection
Identify malicious prompt manipulation
Chunk Utilization
Assess context chunk usage
Chunk Attribution
Track content source attribution
Prompt Perplexity
Measure prompt complexity
LLM Function Calling
Assess function call handling
Deterministic Eval
Perform rule-based evaluations with predictable outcomes
Bias Detection
Identify various forms of bias
Cultural Sensitivity
Evaluate cultural appropriateness
Translation Accuracy
Assess translation quality
Factual Accuracy
Verify factual correctness
Score Eval
Assess numerical scoring accuracy
Groundedness
Verify responses are based on context
Completeness
Verify that the response fully addresses the query
Context Retrieval
Evaluate context fetching quality
Agent as a Judge
Use AI agents to evaluate content
Deterministic Eval
Perform rule-based evaluations with predictable outcomes
Score Eval
Assess numerical scoring accuracy
Image Instruction
Evaluate image generation compliance
Eval Ranking
Assess response ranking accuracy
Eval Output
Assess evaluation result quality
Answer Similarity
Compare response similarities
Contain Evals
Check if content contains specific elements
Valid Links
Verify URL validity
Length Evals
Validate text length requirements
Custom Code
Execute custom evaluation logic
JSON Schema Validation
Verify that JSON conforms to a given schema
API Call
Validate API response handling
Regex
Validate text against regular expressions
Is JSON
Validate if text is valid JSON
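To make the rule-based checks in this group concrete, here is a minimal sketch of what Is JSON, Regex, Contain, and Length style evals do conceptually. It uses only the Python standard library. The function names and parameters (`is_json`, `matches_regex`, `contains_all`, `length_between`) are hypothetical and chosen for illustration; they are not the platform's API, and the built-in evals may behave differently.

```python
import json
import re
from typing import Optional, Sequence


def is_json(text: str) -> bool:
    """Return True if the text parses as JSON (hypothetical helper)."""
    try:
        json.loads(text)
    except (TypeError, ValueError):  # json.JSONDecodeError subclasses ValueError
        return False
    return True


def matches_regex(text: str, pattern: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


def contains_all(text: str, keywords: Sequence[str],
                 case_sensitive: bool = False) -> bool:
    """Return True if every keyword occurs in the text."""
    haystack = text if case_sensitive else text.lower()
    needles = keywords if case_sensitive else [k.lower() for k in keywords]
    return all(n in haystack for n in needles)


def length_between(text: str, min_len: int = 0,
                   max_len: Optional[int] = None) -> bool:
    """Return True if the character count lies within [min_len, max_len]."""
    n = len(text)
    return n >= min_len and (max_len is None or n <= max_len)


if __name__ == "__main__":
    output = '{"status": "ok", "items": [1, 2, 3]}'
    print(is_json(output))                             # True
    print(matches_regex(output, r'"status":\s*"ok"'))  # True
    print(contains_all(output, ["STATUS", "items"]))   # True
    print(length_between(output, max_len=20))          # False
```

Each check returns a plain pass/fail boolean, so the same input always yields the same result, which is what distinguishes these evals from the model-judged ones listed elsewhere on this page.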
Chunk Utilization
Assess context chunk usage
Chunk Attribution
Track content source attribution
Context Retrieval
Evaluate context fetching quality
Context Relevance
Evaluate if context is sufficient for queries
Context Adherence
Verify responses stay within provided context
Context Similarity
Compare similarity between contexts
Eval Ranking
Assess response ranking accuracy
Groundedness
Verify responses are based on context
Completeness
Verify that the response fully addresses the query
Sexist
Detect gender bias and discrimination
Tone
Analyze emotional tone of content
Toxicity
Detect harmful or toxic content
Data Privacy
Ensure compliance with privacy standards
Safe for Work
Ensure workplace appropriate content
Not Gibberish
Verify text coherence and meaning
PII
Identify personal information in content
Content Moderation
Screen content for safety and appropriateness
Prompt Injection
Identify malicious prompt manipulation
Prompt Perplexity
Measure prompt complexity
Instruction Adherence
Verify compliance with given instructions
Data Privacy Compliance
Ensure compliance with data privacy standards
Bias Detection
Identify various forms of bias
Score Eval
Assess numerical scoring accuracy
Eval Image Instruction
Evaluate image generation compliance
Deterministic Eval
Perform rule-based evaluations with predictable outcomes
Data Privacy Compliance
Ensure compliance with data privacy standards
Bias Detection
Identify various forms of bias
Translation Accuracy
Assess translation quality
Factual Accuracy
Verify factual correctness
Summary Quality
Evaluate summarization effectiveness
Score Eval
Assess numerical scoring accuracy
Content Moderation
Screen content for safety and appropriateness
Audio Quality
Evaluates the quality of the given audio
Eval Audio Description
Evaluates whether the given audio matches its description
Audio Transcription
Assesses how accurately the transcription reflects the given audio
Sexist
Detects sexist content and gender bias
Tone
Analyzes the tone and sentiment of content
Toxicity
Evaluates content for toxic or harmful language
Conversation Resolution
Checks if the conversation reaches a satisfactory conclusion or resolution
Deterministic Eval
Performs rule-based evaluations with predictable outcomes
Data Privacy
Checks output for compliance with data privacy regulations
Bias Detection
Identifies various forms of bias in the output
Cultural Sensitivity
Analyzes output for cultural appropriateness and inclusive language
Groundedness
Evaluates if the response is grounded in the provided context
Completeness
Evaluates if the response completely answers the query
Prompt Injection
Evaluates text for potential prompt injection attempts