Why Create Custom Evaluations?
While Future AGI offers comprehensive evaluation templates, custom evaluations are essential when you need:
- Domain-Specific Validation: Assess content against industry-specific standards or regulations
- Business Rule Compliance: Ensure outputs meet your organization’s unique guidelines
- Complex Scoring Logic: Implement multi-criteria assessments with weighted scoring
- Custom Output Formats: Validate specific response structures or formats unique to your application
Creating Custom Evaluations
Using Web Interface
Step 1: Access Evaluation Creation
- Navigate to your dataset in the Future AGI platform
- Click on the Evaluate button in the top-right menu
- Click on Add Evaluation button
- Select Create your own eval
Step 2: Configure Evaluation Settings
- Name: Enter a unique evaluation name (lowercase letters, numbers, and underscores only)
- Model Selection: Choose the appropriate model for your evaluation complexity:
  - Future AGI Models: Proprietary models optimized for evaluations
    - TURING_LARGE (turing_large): Flagship evaluation model that delivers best-in-class accuracy across multimodal inputs (text, images, audio). Recommended when maximal precision outweighs latency constraints.
    - TURING_SMALL (turing_small): Compact variant that preserves high evaluation fidelity while lowering computational cost. Supports text and image evaluations.
    - TURING_FLASH (turing_flash): Latency-optimized version of TURING, providing high-accuracy assessments for text and image inputs with fast response times.
    - PROTECT (protect): Real-time guardrailing model for safety, policy compliance, and content-risk detection. Offers very low latency on text and audio streams and permits user-defined rule sets.
    - PROTECT_FLASH (protect_flash): Ultra-fast binary guardrail for text content. Designed for first-pass filtering where millisecond-level turnaround is critical.
  - Other LLMs: Use external language models from providers like OpenAI, Anthropic, or your own custom models. Click here to learn how to add custom models.
- Rule Prompt: Write the evaluation criteria and instructions
  - Use {{variable_name}} syntax to create dynamic variables that will be mapped to dataset columns
  - Be specific about what constitutes a pass, a fail, or each score level (an illustrative prompt appears after this list)
- Output Type: Choose how results are reported:
  - Pass/Fail: Binary evaluation (1.0 for pass, 0.0 for fail)
  - Percentage: Numerical score between 0 and 100
  - Deterministic Choices: Select from predefined categorical options
- Tags: Add relevant tags for organization and filtering
- Description: Provide a clear description of the evaluation’s purpose
- Check Internet: Enable web access for real-time information validation
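For instance, a rule prompt for a percentage-scored evaluation might look like the sketch below. The variable names and criteria are purely illustrative, not a template prescribed by the platform:

```
Evaluate the product description against the provided style guide.

Product description: {{generated_description}}
Style guide: {{style_guide}}

Score from 0 to 100:
- Up to 40 points if every factual claim is supported by the style guide
- Up to 30 points if the tone matches the brand voice described in the style guide
- Up to 30 points if the description avoids the banned terms listed in the style guide

Return only the total score.
```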
Example: Creating a Chatbot Evaluation
Let’s walk through creating a custom evaluation for a customer service chatbot. This example shows how to verify that the chatbot’s responses are polite and that they effectively address the user’s query.
Step 1: Basic Configuration
- Name: chatbot_politeness_and_relevance
- Model Selection: TURING_SMALL (ideal for straightforward evaluations like this)
- Description: “Evaluates if the chatbot’s response is polite and relevant to the user’s query.”
Step 2: Define Evaluation Rules
Create a rule prompt that clearly specifies the evaluation criteria, for example:
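A minimal pass/fail prompt along these lines would work; the exact wording is illustrative:

```
You are evaluating a customer service chatbot.

User query: {{user_query}}
Chatbot response: {{chatbot_response}}

The response passes only if BOTH conditions hold:
1. Politeness: the response uses a courteous, professional tone with no dismissive or rude language.
2. Relevance: the response directly addresses the user's query rather than giving a generic or off-topic answer.

Return 1.0 if both conditions are met, otherwise return 0.0.
```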
Step 3: Configure Output
- Output Type: Pass/Fail (1.0 for pass, 0.0 for fail)
- Tags: customer-service, politeness, relevance
Step 4: Map Variables
In your dataset, map the variables to their corresponding columns:
- {{user_query}} → Column containing user questions
- {{chatbot_response}} → Column containing chatbot responses
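For illustration, a dataset with the following columns (hypothetical rows) maps directly onto these variables; the first row should pass, while the second should fail on politeness:

| user_query | chatbot_response |
| --- | --- |
| Where is my order #1042? | I'm sorry for the wait! Your order shipped yesterday and should arrive by Friday. |
| Cancel my subscription. | That's not my problem. |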
Running the Evaluation
You can run the evaluation either through the web interface or using the SDK.
Using Web Interface
- Navigate to your dataset in the Future AGI platform
- Click on the Evaluate button in the top-right menu
- Click on the evaluation you just created
- Configure the columns that you want to use for the evaluation
- Click on the Add & Run button
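To run the evaluation programmatically instead, a minimal Python sketch might look like the following. The import path, class, and method names (fi.evals, Evaluator, evaluate) and the credential parameters are assumptions for illustration only; consult the SDK reference for the exact API.

```python
# Minimal sketch of running the custom eval via the SDK.
# NOTE: the import path, class, and method names below are illustrative
# assumptions -- check the Future AGI SDK reference for the actual API.
from fi.evals import Evaluator  # hypothetical import path

# Authenticate with your Future AGI credentials (placeholders shown).
evaluator = Evaluator(
    fi_api_key="YOUR_API_KEY",
    fi_secret_key="YOUR_SECRET_KEY",
)

# Map the rule-prompt variables to the values from one dataset row.
inputs = {
    "user_query": "Where is my order #1042?",
    "chatbot_response": "I'm sorry for the wait! Your order shipped yesterday.",
}

# Run the custom eval by the name given when it was created; a Pass/Fail
# eval is expected to return 1.0 (pass) or 0.0 (fail).
result = evaluator.evaluate(
    eval_templates="chatbot_politeness_and_relevance",
    inputs=inputs,
)
print(result)
```

In practice you would iterate over (or batch) the dataset rows rather than scoring a single query/response pair.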