Installation

First, install the Future AGI Python client:

pip install futureagi

Initialize the Client (using API keys)

from fi.evals import EvalClient
evaluator = EvalClient(
    fi_api_key="your_api_key",  # Optional; defaults to the FI_API_KEY environment variable
    fi_secret_key="your_secret_key",  # Optional; defaults to the FI_SECRET_KEY environment variable
    fi_api_url="https://api.futureagi.com"  # Optional; defaults to the FI_API_URL environment variable
)

Initialize the Client (using environment variables)

Set the credentials in your shell:

export FI_API_KEY="your_api_key"
export FI_SECRET_KEY="your_secret_key"
export FI_API_URL="https://api.futureagi.com"

Then the client picks them up automatically:

from fi.evals import EvalClient
evaluator = EvalClient()
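To make the fallback behavior concrete, here is a small illustrative sketch (pure Python, not the SDK's actual code; `resolve_setting` is a hypothetical helper) of how an explicit argument can defer to an environment variable, mirroring what EvalClient does with its optional parameters:

```python
import os

# Illustrative sketch only: explicit arguments win, otherwise the value
# is looked up in the environment (as EvalClient does for its credentials).
def resolve_setting(explicit, env_var):
    """Return the explicit value if given, else read it from the environment."""
    if explicit is not None:
        return explicit
    return os.environ.get(env_var)

os.environ["FI_API_URL"] = "https://api.futureagi.com"
assert resolve_setting("override", "FI_API_URL") == "override"
assert resolve_setting(None, "FI_API_URL") == "https://api.futureagi.com"
```

This is why both initialization styles above are equivalent: passing a key explicitly simply short-circuits the environment lookup.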

Running Your First Evaluation

Here’s a simple example of how to evaluate the safety of a model’s response:

from fi.evals import SafeForWorkText
from fi.testcases import TestCase


# Create a test case
test_case = TestCase(
    text="This is a sample response to evaluate."
)

# Initialize the safety evaluator
safety_eval = SafeForWorkText()

# Run the evaluation (using the evaluator client initialized earlier)
result = evaluator.evaluate(safety_eval, test_case)
print(result)  # Prints Pass or Fail
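The SDK is not needed to work with the outcomes themselves. As a small sketch (pure Python; the "Pass"/"Fail" strings are assumed from the printed output above), Pass/Fail results collected over several test cases can be aggregated into a pass rate:

```python
# Outcomes as they might be collected from repeated evaluator.evaluate() calls;
# the "Pass"/"Fail" strings are assumed from the printed output shown above.
results = ["Pass", "Pass", "Fail", "Pass"]

# Fraction of test cases that passed the safety evaluation.
pass_rate = results.count("Pass") / len(results)
assert pass_rate == 0.75
```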

Available Evaluation Types

Future AGI provides several categories of evaluations, including safety checks (such as SafeForWorkText above), faithfulness checks (ResponseFaithfulness), and rule-based deterministic checks (Deterministic). The examples below walk through the latter two.

Example: Evaluating Response Faithfulness

Here’s a more complex example that evaluates whether a response is faithful to the provided context:

from fi.evals import ResponseFaithfulness
from fi.testcases import LLMTestCase

# Create a test case
test_case = LLMTestCase(
    response="The capital of France is Paris, which is known as the City of Light.",
    context="Paris is the capital city of France. It is often called 'La Ville Lumière' (the City of Light)."
)

# Initialize the faithfulness evaluator
faithfulness = ResponseFaithfulness(config={
    "model": "gpt-4o-mini"
})

# Run the evaluation
result = evaluator.evaluate(faithfulness, test_case)
print(result)  # Prints Pass if the response is faithful to the context

Example: Image Content Evaluation

Here’s an example that evaluates whether an image contains specific content:

from fi.evals import Deterministic, EvalClient
from fi.testcases import TestCase

# Define a test case class with an image URL and an expected label
class ImageDeterministicTestCase(TestCase):
    image_url: str
    expected_label: str

# Initialize the deterministic evaluator
deterministic_eval = Deterministic(config={
    "multi_choice": False,
    "choices": ["Yes", "No"],
    "rule_prompt": "Does the image at {{input_key1}} contain a person? Compare with expected answer {{input_key2}}",
    "input": {
        "input_key1": "image_url",
        "input_key2": "expected_label"
    }
})

# Create a test case
test_case = ImageDeterministicTestCase(
    image_url="https://example.com/person.jpg",
    expected_label="Yes"
)

# Initialize the evaluation client
evaluator = EvalClient(
    fi_api_key="your_api_key",
    fi_secret_key="your_secret_key"
)

# Run the evaluation
result = evaluator.evaluate(deterministic_eval, test_case)
print(result)  # Prints Yes or No
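The `{{input_key1}}`-style placeholders in the rule_prompt are mapped through the "input" dictionary onto fields of the test case. As an illustrative sketch only (not the SDK's actual implementation; `fill_rule_prompt` is a hypothetical helper), the substitution works roughly like this:

```python
import re

# Illustrative only: resolve {{placeholder}} tokens in a rule prompt by
# mapping each placeholder to a test-case field via the input dictionary.
def fill_rule_prompt(rule_prompt, input_map, test_case_fields):
    def repl(match):
        field_name = input_map[match.group(1)]  # e.g. "input_key1" -> "image_url"
        return str(test_case_fields[field_name])
    return re.sub(r"\{\{(\w+)\}\}", repl, rule_prompt)

prompt = fill_rule_prompt(
    "Does the image at {{input_key1}} contain a person? Compare with expected answer {{input_key2}}",
    {"input_key1": "image_url", "input_key2": "expected_label"},
    {"image_url": "https://example.com/person.jpg", "expected_label": "Yes"},
)
# prompt now references the concrete image URL and expected label
```

This is why the keys in the "input" dictionary must match the placeholder names in rule_prompt, while the values must match field names on the test case class.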