Using the Future AGI Python SDK for running evaluations, listing available evaluators, and configuring the Evaluator client.
Use the `Evaluator` class to programmatically run evaluations on your data and language model outputs. This document details its usage based on the provided SDK snippets.
## `Evaluator`

Initializes the `Evaluator` client. API keys and the base URL can be provided directly, or they will be read from environment variables (`FI_API_KEY`, `FI_SECRET_KEY`, `FI_BASE_URL`) if not specified.
**Parameters:**

- `fi_api_key` (Optional[str], optional): API key. Defaults to `None`.
- `fi_secret_key` (Optional[str], optional): Secret key. Defaults to `None`.
- `fi_base_url` (Optional[str], optional): Base URL. Defaults to `None`.
- `**kwargs`:
  - `timeout` (Optional[int]): Timeout value in seconds. Default: `200`.
  - `max_queue_bound` (Optional[int]): Maximum queue size. Default: `5000`.
  - `max_workers` (Optional[int]): Maximum number of workers. Default: `8`.
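The documented fallback from explicit arguments to environment variables can be sketched as follows. This is a minimal illustration of the described behavior, not the SDK's actual code; `resolve_credentials` is a hypothetical helper.

```python
import os

# Hypothetical sketch of the documented credential resolution:
# explicit constructor arguments take precedence; otherwise the
# client falls back to FI_API_KEY / FI_SECRET_KEY / FI_BASE_URL.
def resolve_credentials(fi_api_key=None, fi_secret_key=None, fi_base_url=None):
    return {
        "api_key": fi_api_key or os.environ.get("FI_API_KEY"),
        "secret_key": fi_secret_key or os.environ.get("FI_SECRET_KEY"),
        "base_url": fi_base_url or os.environ.get("FI_BASE_URL"),
    }
```

With this precedence, setting `FI_API_KEY` in the environment lets you construct the client with no arguments, while any explicitly passed key overrides the environment.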
### `evaluate`

Runs one or more evaluations on the provided inputs.

**Parameters:**

- `eval_templates` (Union[str, EvalTemplate, List[EvalTemplate]]): A single evaluation template or a list of evaluation templates.
- `inputs` (Union[TestCase, List[TestCase], Dict[str, Any], List[Dict[str, Any]]]): A single test case or a list of test cases. Supports various `TestCase` types.
- `timeout` (Optional[int], optional): Timeout value in seconds for the evaluation. Defaults to `None` (uses the client's default timeout).
- `model_name` (Optional[str], optional): Model name to use for the evaluation when using Future AGI Built Evals. Defaults to `None`.

**Returns:**

- `BatchRunResult`: An object containing the results of the evaluation(s).

**Raises:**

- `ValidationError`: If the inputs do not match the evaluation templates.
- `Exception`: If the API request fails or other errors occur during evaluation.
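As the `inputs` signature above suggests, `evaluate` accepts either a single test case or a list of them. The normalization this implies can be sketched with a hypothetical helper (not the SDK's actual implementation), here shown for the plain-dict form of a test case:

```python
from typing import Any, Dict, List, Union

# Hypothetical sketch: coerce a single test case (dict) or a list of
# test cases into the list form a batch evaluation operates on.
def normalize_inputs(
    inputs: Union[Dict[str, Any], List[Dict[str, Any]]]
) -> List[Dict[str, Any]]:
    return inputs if isinstance(inputs, list) else [inputs]
```

Either way, the result is a list of test cases, which is consistent with the batched `BatchRunResult` return type.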
### `list_evaluations`

Lists the evaluation templates available to the client.

**Returns:**

- `List[Dict[str, Any]]`: A list of dictionaries, where each dictionary contains information about an available evaluation template. This typically includes details like the template's `id`, `name`, `description`, and expected parameters.

### `eval_templates`