Why Create Custom Evaluations?
While Future AGI offers comprehensive evaluation templates, custom evaluations are essential when you need:
- Domain-Specific Validation: Assess content against industry-specific standards or regulations
- Business Rule Compliance: Ensure outputs meet your organization’s unique guidelines
- Complex Scoring Logic: Implement multi-criteria assessments with weighted scoring
- Custom Output Formats: Validate specific response structures or formats unique to your application
Creating Custom Evaluations
Using the Web Interface
Step 1: Access Evaluation Creation
- Navigate to your dataset in the Future AGI platform
- Click on the Evaluate button in the top-right menu
- Click on Add Evaluation button
- Select Create your own eval
Step 2: Configure the Evaluation
- Name: Enter a unique evaluation name (lowercase letters, numbers, and underscores only)
- Model Selection: Choose the appropriate model for your evaluation complexity:
  - Future AGI Models: Proprietary models optimized for evaluations
  - Other LLMs: Use external language models from providers such as OpenAI or Anthropic, or bring your own custom models
- Rule Prompt: Write the evaluation criteria and instructions
  - Use `{{variable_name}}` syntax to create dynamic variables that will be mapped to dataset columns (see the example after this list)
  - Be specific about what constitutes a pass/fail result or how the score should be assigned
- Output Type: Choose how evaluation results are reported:
  - Pass/Fail: Binary evaluation (1.0 for pass, 0.0 for fail)
  - Percentage: Numerical score between 0 and 100
  - Deterministic Choices: Select from predefined categorical options
- Tags: Add relevant tags for organization and filtering
- Description: Provide a clear description of the evaluation’s purpose
- Check Internet: Enable web access for real-time information validation
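For instance, a rule prompt for a hypothetical summary-accuracy evaluation using the Percentage output type might look like the following. This is illustrative only; `{{source_text}}` and `{{summary}}` are placeholder variables you would map to your own dataset columns.

```
You are evaluating whether a summary accurately reflects its source document.

Source document:
{{source_text}}

Summary:
{{summary}}

Score the summary from 0 to 100, where 100 means every claim in the summary
is supported by the source document and 0 means the summary contradicts or
fabricates information. Return only the numerical score.
```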
Example: Creating a Chatbot Evaluation
Let’s walk through creating a custom evaluation for a customer service chatbot. This example shows how to ensure the chatbot’s responses are both polite and effective at addressing user queries.
Step 1: Basic Configuration
- Name: `chatbot_politeness_and_relevance`
- Model Selection: `TURING_SMALL` (ideal for straightforward evaluations like this)
- Description: “Evaluates if the chatbot’s response is polite and relevant to the user’s query.”
Step 2: Define Evaluation Rules
Create a rule prompt that clearly specifies the evaluation criteria.
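The prompt below is an illustrative sketch of what such a rule prompt could look like; adjust the wording and criteria to your own requirements. The `{{user_query}}` and `{{chatbot_response}}` variables are the ones mapped to dataset columns in Step 4.

```
You are evaluating a customer service chatbot.

User query:
{{user_query}}

Chatbot response:
{{chatbot_response}}

The response passes only if BOTH conditions hold:
1. Politeness: the response uses a courteous, professional tone with no rude
   or dismissive language.
2. Relevance: the response directly addresses the user's query instead of
   giving a generic or off-topic answer.

Return 1.0 if the response passes both conditions, otherwise return 0.0.
```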
Step 3: Configure Output
- Output Type: Pass/Fail (1.0 for pass, 0.0 for fail)
- Tags: `customer-service`, `politeness`, `relevance`
Step 4: Map Variables
In your dataset, map the variables to their corresponding columns:
- `{{user_query}}` → Column containing user questions
- `{{chatbot_response}}` → Column containing chatbot responses
Running the Evaluation
You can run the evaluation either through the web interface or through the SDK.
Using the Web Interface
- Navigate to your dataset in the Future AGI platform
- Click on the Evaluate button in the top-right menu
- Click on the evaluation you just created
- Configure the columns that you want to use for the evaluation
- Click on the Add & Run button
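Alternatively, the evaluation can be triggered from code via the SDK. The snippet below is only a rough sketch of what such a call could look like: the client class, import path, method names, and parameters (`FutureAGIClient`, `run_evaluation`, `FUTURE_AGI_API_KEY`, the dataset and column names) are assumptions made for illustration, not the actual Future AGI SDK interface, so consult the SDK reference for the real names and signatures.

```python
import os

# Hypothetical client and method names -- placeholders for illustration only.
# Consult the Future AGI SDK documentation for the actual interface.
from futureagi import FutureAGIClient  # assumed import path

# Authenticate with an API key stored in the environment (assumed variable name).
client = FutureAGIClient(api_key=os.environ["FUTURE_AGI_API_KEY"])

# Run the custom evaluation created above against a dataset,
# mapping the rule-prompt variables to dataset columns.
result = client.run_evaluation(
    eval_name="chatbot_politeness_and_relevance",
    dataset="customer_support_conversations",  # assumed dataset name
    column_mapping={
        "user_query": "user_question",         # dataset column with user questions
        "chatbot_response": "bot_reply",       # dataset column with chatbot responses
    },
)

# Each row receives a Pass/Fail score: 1.0 for pass, 0.0 for fail.
for row in result.rows:
    print(row.id, row.score)
```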