Creating custom evaluations allows you to tailor assessment criteria to your specific use case and business requirements. Future AGI provides flexible tools to build evaluations that go beyond standard templates, enabling you to define custom rules, scoring mechanisms, and validation logic.
turing_large
: Flagship evaluation model that delivers best-in-class accuracy across multimodal inputs (text, images, audio). Recommended when maximal precision outweighs latency constraints.
turing_small
: Compact variant that preserves high evaluation fidelity while lowering computational cost. Supports text and image evaluations.
turing_flash
: Latency-optimised version of TURING, providing high-accuracy assessments for text and image inputs with fast response times.
protect
: Real-time guardrailing model for safety, policy compliance, and content-risk detection. Offers very low latency on text and audio streams and permits user-defined rule sets.
protect_flash
: Ultra-fast binary guardrail for text content. Designed for first-pass filtering where millisecond-level turnaround is critical.
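As a rough aid for choosing between these models, the supported input types can be written as a lookup table. This is a minimal sketch in plain Python, not part of any Future AGI SDK: the `MODEL_MODALITIES` table and the `models_supporting` helper are hypothetical names, the modalities are transcribed from the descriptions above, and the key `turing_large` for the flagship model is an assumption.

```python
# Input types per model, transcribed from the model descriptions.
# Hypothetical table for illustration; "turing_large" as the name of the
# flagship model is an assumption.
MODEL_MODALITIES = {
    "turing_large": {"text", "image", "audio"},
    "turing_small": {"text", "image"},
    "turing_flash": {"text", "image"},
    "protect": {"text", "audio"},
    "protect_flash": {"text"},
}


def models_supporting(*modalities):
    """Return the sorted names of models that cover every requested modality."""
    required = set(modalities)
    return sorted(
        name
        for name, supported in MODEL_MODALITIES.items()
        if required <= supported  # subset test: all requested types are supported
    )


audio_models = models_supporting("audio")
image_and_audio_models = models_supporting("image", "audio")
```

Filtering by modality only narrows the candidates; the trade-off between accuracy, cost, and latency described above still decides the final pick.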
Use the {{variable_name}} syntax to create dynamic variables that will be mapped to dataset columns.
Name: chatbot_politeness_and_relevance
Model: TURING_SMALL (ideal for straightforward evaluations like this)
Output type: Pass/Fail (1.0 for pass, 0.0 for fail)
Tags: customer-service, politeness, relevance
{{user_query}} → Column containing user questions
{{chatbot_response}} → Column containing chatbot responses
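To make the variable mapping concrete, the sketch below shows one way `{{variable_name}}` placeholders could be filled from a dataset row, and how a Pass/Fail result maps to 1.0 or 0.0. It is plain Python written for illustration and does not reflect how Future AGI implements this internally. The helpers `render_prompt` and `to_score`, the prompt wording, and the column names `question` and `answer` are all hypothetical.

```python
import re

# Matches {{name}}, tolerating spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template, row, mapping):
    """Replace each {{variable}} with the value of its mapped column in row."""
    def substitute(match):
        variable = match.group(1)
        if variable not in mapping:
            raise KeyError(f"no column mapped for variable {variable!r}")
        return str(row[mapping[variable]])

    return PLACEHOLDER.sub(substitute, template)


def to_score(passed):
    """Express a Pass/Fail result as a number: 1.0 for pass, 0.0 for fail."""
    return 1.0 if passed else 0.0


# Hypothetical prompt, column names, and dataset row.
template = (
    "Is the reply polite and relevant?\n"
    "Query: {{user_query}}\n"
    "Reply: {{chatbot_response}}"
)
mapping = {"user_query": "question", "chatbot_response": "answer"}
row = {
    "question": "Where is my order?",
    "answer": "It ships tomorrow. Thanks for your patience!",
}

prompt = render_prompt(template, row, mapping)
```

Raising an error for an unmapped variable, rather than leaving the placeholder in the text, surfaces a missing column mapping before any evaluation runs.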