RAGAS provides a comprehensive suite of evaluation metrics specifically designed for RAG (Retrieval-Augmented Generation) systems. These metrics help assess various aspects of your RAG pipeline’s performance.

Available Metrics

| Metric | Description | Required Parameters |
| --- | --- | --- |
| RagasAnswerCorrectness | Evaluates if the answer is factually correct | expected_response, response, query |
| RagasAnswerRelevancy | Measures how relevant the answer is to the query | response, context, query |
| RagasCoherence | Evaluates response coherence and readability | response |
| RagasConciseness | Measures how concise and focused the response is | response |
| RagasContextPrecision | Evaluates precision of retrieved context | expected_response, context, query |
| RagasContextRecall | Measures completeness of retrieved context | expected_response, context, query |
| RagasContextRelevancy | Assesses relevance of retrieved context to the query | context, query |
| RagasFaithfulness | Measures the response’s faithfulness to the provided context | response, context, query |
| RagasHarmfulness | Detects harmful content in responses | response |
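
The Required Parameters column determines which fields a test case must supply for a given metric. As a minimal sketch (assuming LLMTestCase lets you omit fields a metric does not read; the variable names are illustrative):

from fi.testcases import LLMTestCase

# A response-only metric such as RagasCoherence just needs the model's answer
coherence_case = LLMTestCase(
    response="The capital of France is Paris."
)

# A retrieval metric such as RagasContextRecall also needs the query,
# the retrieved context, and the expected (ground-truth) response
recall_case = LLMTestCase(
    query="What is the capital of France?",
    expected_response="Paris is the capital city of France.",
    context=["Paris is the capital and largest city of France."]
)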

Configuration

All RAGAS metrics require the following configuration parameter:

| Parameter | Description | Required |
| --- | --- | --- |
| model | The LLM model to use for evaluation | Yes |
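
The config dict has the same shape for every metric listed above; only the evaluation model needs to be set. A brief sketch (assuming the other metric classes are importable from fi.evals in the same way as RagasAnswerCorrectness in the example below):

from fi.evals import RagasContextPrecision

# Every RAGAS metric accepts the same configuration dict
context_precision_eval = RagasContextPrecision(config={
    "model": "gpt-4o-mini"  # LLM used as the evaluation judge
})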

Example Usage

from fi.evals import RagasAnswerCorrectness, EvalClient
from fi.testcases import LLMTestCase

# Initialize the evaluation client
evaluator = EvalClient(
    fi_api_key="your_api_key",
    fi_secret_key="your_secret_key"
)

# Create a test case
test_case = LLMTestCase(
    query="What is the capital of France?",
    response="The capital of France is Paris.",
    expected_response="Paris is the capital city of France.",
    context=["Paris is the capital and largest city of France."]
)

# Initialize the RAGAS evaluator
ragas_eval = RagasAnswerCorrectness(config={
    "model": "gpt-4o-mini"
})

# Run the evaluation
result = evaluator.evaluate(ragas_eval, test_case)
print(result)  # The result contains a score between 0 and 1
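
The same client and test case can be reused with other metrics. For example, a faithfulness check on the answer above (a sketch, assuming RagasFaithfulness is exported from fi.evals in the same way as RagasAnswerCorrectness):

from fi.evals import RagasFaithfulness

# Faithfulness reads the response, context, and query fields of the test case
faithfulness_eval = RagasFaithfulness(config={
    "model": "gpt-4o-mini"
})

faithfulness_result = evaluator.evaluate(faithfulness_eval, test_case)
print(faithfulness_result)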