Eval Definition

Overview

Conversation Coherence

Evaluate logical flow and context maintenance in dialogues

Conversation Resolution

Assess if conversations reach satisfactory conclusions

Deterministic Eval

Perform rule-based evaluations with predictable outcomes

Score Eval

Assess numerical scoring accuracy

Image Instruction

Evaluate image generation compliance

Audio Quality

Evaluates the quality of the given audio

Eval Audio Description

Evaluates the given audio against its description

Audio Transcription

Assesses how accurately a transcription matches the given audio

Chunk Utilization

Assess context chunk usage

Chunk Attribution

Track content source attribution

LLM Function Calling

Assess function call handling

Context Retrieval

Evaluate context fetching quality

Instruction Adherence

Verify compliance with given instructions

Sexist Content

Detect gender bias and discrimination

Tone

Analyze emotional tone of content

Toxicity

Detect harmful or toxic content

Context Relevance

Evaluate if context is sufficient for queries

Context Adherence

Verify responses stay within provided context

Data Privacy

Ensure compliance with privacy standards

Bias Detection

Identify various forms of bias

Cultural Sensitivity

Evaluate cultural appropriateness

Translation Accuracy

Assess translation quality

Factual Accuracy

Verify factual correctness

Summary Quality

Evaluate summarization effectiveness

Eval Ranking

Assess response ranking accuracy

Eval Output

Assess evaluation result quality

Groundedness

Verify responses are based on context

Completeness

Verify that the response fully addresses the query

Prompt Injection

Identify malicious prompt manipulation

Prompt Perplexity

Measure prompt complexity

Agent as a Judge

Use AI agents to evaluate content

Answer Similarity

Compare response similarities

Contain Evals

Check if content contains specific elements

Valid Links

Verify URL validity

Length Evals

Validate text length requirements

Custom Code

Execute custom evaluation logic

JSON Schema Validation

Validate JSON against a specified schema

API Call

Validate API response handling

Regex

Validate text against regular expressions

Is JSON

Validate if text is valid JSON

Context Similarity

Compare similarity between contexts

Safe for Work

Ensure workplace appropriate content

Not Gibberish

Verify text coherence and meaning

PII

Identify personal information in content

Content Moderation

Screen content for safety and appropriateness

Data Privacy Compliance

Ensure compliance with data privacy standards

Sexist

Detects sexist content and gender bias

Tone

Analyzes the tone and sentiment of content

Toxicity

Evaluates content for toxic or harmful language

Conversation Resolution

Checks if the conversation reaches a satisfactory conclusion or resolution

Deterministic Eval

Evaluates if the output is deterministic or not

Data Privacy

Checks output for compliance with data privacy regulations

Bias Detection

Identifies various forms of bias in the output

Cultural Sensitivity

Analyzes output for cultural appropriateness and inclusive language

Groundedness

Evaluates if the response is grounded in the provided context

Completeness

Evaluates if the response completely answers the query

Prompt Injection

Evaluates text for potential prompt injection attempts
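
Several of the evals listed above (Is JSON, Regex, Contain Evals, Length Evals) are rule-based checks, so their general idea can be sketched with the Python standard library alone. The sketch below is only an illustration of that idea: the function names and parameters are hypothetical and are not taken from the Future AGI SDK, whose actual interface may differ.

```python
import json
import re
from typing import Iterable


def is_json(text: str) -> bool:
    """Return True if the text parses as JSON (the idea behind an "Is JSON" check)."""
    try:
        json.loads(text)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def matches_regex(text: str, pattern: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


def contains_all(text: str, keywords: Iterable[str]) -> bool:
    """Case-insensitive check that every keyword appears in the text."""
    lowered = text.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


def length_between(text: str, min_len: int, max_len: int) -> bool:
    """Return True if the character count lies within [min_len, max_len]."""
    return min_len <= len(text) <= max_len


if __name__ == "__main__":
    # Hypothetical model output used only for this demonstration.
    output = '{"status": "ok", "order_id": "A-1042"}'
    print("is_json:", is_json(output))
    print("regex:", matches_regex(output, r"[A-Z]-\d{4}"))
    print("contains:", contains_all(output, ["status", "order_id"]))
    print("length:", length_between(output, 1, 200))
```

Each helper returns a plain boolean, which mirrors the pass/fail nature of a rule-based evaluation. Model-graded evals such as Tone or Groundedness cannot be reduced to checks like these.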
