Quickstart for Evaluations
Installation
First, install the Future AGI Python client:
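The package name below is an assumption (the client is commonly published on PyPI as `futureagi`); confirm the current name against the official documentation:

```shell
pip install futureagi
```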
Initialize the Client (using API keys)
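A minimal sketch of explicit-key initialization. The import path (`fi.evals.EvalClient`) and the `fi_api_key` / `fi_secret_key` parameter names are assumptions; verify them against the installed package:

```python
# Sketch only: EvalClient location and keyword names are assumed, not
# confirmed API. Keys come from the Future AGI dashboard.
def make_client(api_key: str, secret_key: str):
    # Deferred import so this sketch only needs the SDK when called.
    from fi.evals import EvalClient  # assumed import path
    return EvalClient(fi_api_key=api_key, fi_secret_key=secret_key)
```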
Initialize the Client (using environment variables)
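Alternatively, the client is presumed to pick up credentials from the environment when no explicit keys are passed. The variable names `FI_API_KEY` and `FI_SECRET_KEY` are assumptions; check the SDK docs for the exact names it reads:

```python
import os

# Assumed environment variable names; set these before creating the client.
os.environ["FI_API_KEY"] = "your-api-key"
os.environ["FI_SECRET_KEY"] = "your-secret-key"

def make_client():
    from fi.evals import EvalClient  # assumed import path
    return EvalClient()  # credentials presumed to be read from the environment
```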
Running Your First Evaluation
Here’s a simple example of how to evaluate the safety of a model’s response:
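The sketch below shows the assumed shape of an evaluation call: pick a template, wrap the model output in a test case, and pass both to the client. The names `Toxicity`, `TestCase`, and `evaluate` are assumptions, not confirmed API:

```python
# Hypothetical end-to-end sketch; template and test-case class names
# are assumed. Requires credentials (see initialization above).
def evaluate_safety(response_text: str):
    from fi.evals import EvalClient
    from fi.evals.templates import Toxicity  # assumed safety template
    from fi.testcases import TestCase        # assumed test-case type
    client = EvalClient()  # expects FI_API_KEY / FI_SECRET_KEY in the environment
    return client.evaluate(
        eval_templates=[Toxicity()],
        inputs=[TestCase(response=response_text)],
    )
```

Inspect the returned object for the per-template score or verdict; the exact result shape depends on the SDK version.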
Available Evaluation Types
Future AGI provides several categories of evaluations:
Conversation Evaluations
- Conversation Coherence
- Conversation Resolution
Function-based Evaluations
- Safety Checks
- Text Validation
- Custom Evaluations
LLM-based Evaluations
- Response Faithfulness
- Does Response Answer the Question?
- Context Relevancy
RAGAS Evaluations
- RAGAS Answer Correctness
- RAGAS Context Precision
- RAGAS Faithfulness
FutureAGI Evaluations
- Bias Detection
- Toxicity Analysis
- Factual Accuracy
Grounded Evaluations
- Answer Similarity
- Context Similarity
Example: Evaluating Response Faithfulness
Here’s a more complex example that evaluates whether a response is faithful to the provided context:
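A faithfulness check compares the response against the context it was supposed to be grounded in, so the test case carries both fields. As above, `Faithfulness` and the `TestCase` field names are assumptions to be verified against the SDK:

```python
# Sketch only: Faithfulness template name and TestCase fields are assumed.
def evaluate_faithfulness(context: str, response: str):
    from fi.evals import EvalClient
    from fi.evals.templates import Faithfulness  # assumed template name
    from fi.testcases import TestCase            # assumed test-case type
    client = EvalClient()
    return client.evaluate(
        eval_templates=[Faithfulness()],
        inputs=[TestCase(context=context, response=response)],
    )

# Example call: a response that adds claims absent from the context
# should score low on faithfulness.
# evaluate_faithfulness(
#     context="Refunds are issued within 14 days of purchase.",
#     response="Refunds are issued within 14 days, and shipping is free.",
# )
```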
Example: Image Content Evaluation
Here’s an example that evaluates whether an image contains specific content:
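A speculative sketch of a multimodal evaluation: the template name (`ImageInstruction`), its `criteria` config key, and the `image_url` field on the test case are all assumptions, so treat this purely as the shape of the call and substitute the image-capable template your SDK version actually provides:

```python
# Highly speculative sketch; every name below is an assumption.
def evaluate_image_content(image_url: str, criteria: str):
    from fi.evals import EvalClient
    from fi.evals.templates import ImageInstruction  # assumed template name
    from fi.testcases import TestCase                # assumed to accept image_url
    client = EvalClient()
    return client.evaluate(
        eval_templates=[ImageInstruction(config={"criteria": criteria})],
        inputs=[TestCase(image_url=image_url)],
    )
```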