Cookbooks
How to evaluate RAG Applications
Retrieval-Augmented Generation (RAG) Evaluation using Future AGI
Step 1 - Install the necessary packages and make the required imports
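A minimal sketch of this step. The package name (`futureagi`) and import path are assumptions, not confirmed by this cookbook; check the SDK documentation for the exact names:

```python
# Install the SDK first (assumed package name):
#   pip install futureagi

import importlib.util

def sdk_available(module_name: str) -> bool:
    """Return True if the named module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None

# Typical follow-up imports once the SDK is installed (names are assumptions):
#   from fi.evals import EvalClient
print(sdk_available("fi"))
```

The availability check is just a convenience so the notebook fails early with a clear message instead of an `ImportError` deep in the run.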
Step 2 - Load the dataset and select an instance
question | correct_answer | incorrect_answer | question_id | generated_with_rag | context | generated_without_rag |
---|---|---|---|---|---|---|
HOW AFRICAN AMERICANS WERE IMMIGRATED TO THE US | As such, African immigrants are to be distinguished… | From the Immigration and Nationality Act of 19… | Q0 | African Americans were immigrated to the United… | [African immigration to the United States refers… | African Americans were immigrated to the US in… |
what are points on a mortgage | Points, sometimes also called a “discount point”… | Discount points may be different from originating… | Q1012 | Points on a mortgage are a form of pre-paid… | [Discount points, also called mortgage points… | A mortgage point is a fee equal to 1% of the l… |
how does interlibrary loan work | The user makes a request with their local library… | Although books and journal articles are the most… | Q102 | Interlibrary loan works by allowing patrons… | [Interlibrary loan (abbreviated ILL, and sometimes… | Interlibrary loan is a service that allows lib… |
WHAT IS A FY QUARTER | A fiscal year (or financial year, or sometimes… | Fiscal years vary between businesses and countries… | Q1027 | A FY quarter is a three-month period within… | [April.\n\n\n=== United States ===\n\n\n==== F… | A FY Quarter is a three-month period in the fi… |
who wrote a rose is a rose is a rose | The sentence “Rose is a rose is a rose is a rose”… | I know that in daily life we don’t go around saying… | Q1032 | Gertrude Stein wrote the sentence “A rose is… | [The sentence “Rose is a rose is a rose is a rose”… | Gertrude Stein wrote “A Rose is a Rose is a Rose…” |
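To make Step 2 concrete, here is a stdlib-only sketch that mirrors the table's schema with plain dicts (field values abridged for illustration; a real run would load the full dataset, e.g. from a CSV file or a dataset hub):

```python
# Rows follow the column layout shown in the table above (values abridged).
rows = [
    {
        "question_id": "Q1012",
        "question": "what are points on a mortgage",
        "context": ["Discount points, also called mortgage points (abridged)"],
        "generated_with_rag": "Points on a mortgage are a form of pre-paid (abridged)",
        "generated_without_rag": "A mortgage point is a fee equal to 1% (abridged)",
    },
]

def select_instance(rows, question_id):
    """Pick a single dataset instance by its question_id."""
    return next(r for r in rows if r["question_id"] == question_id)

instance = select_instance(rows, "Q1012")
print(instance["question"])
```

Selecting by `question_id` rather than by positional index keeps the lookup stable if the dataset is shuffled or filtered.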
Step 3 - Choose the evaluations you want to perform
Available RAG evaluations in Future AGI:
Context Adherence
- Description: Ensures that responses remain within the provided context, avoiding information not present in the retrieved data.
- Key Points: Focuses on detecting hallucinations and ensuring factual consistency.
Context Relevance
- Description: Assesses how well the retrieved context aligns with the query.
- Key Points: Determines whether the retrieved context is sufficient to address the input.
Completeness
- Description: Evaluates whether the response fully answers the query.
- Key Points: Focuses on providing comprehensive and accurate answers.
Chunk Attribution
- Description: Tracks which context chunks are used in generating responses.
- Key Points: Highlights which parts of the context contribute to the response.
Chunk Utilization
- Description: Measures the effective usage of context chunks in generating responses.
- Key Points: Indicates the level of relevance and reliance on the provided context.
Context Similarity
- Description: Compares the provided context with expected context using similarity metrics.
- Key Points: Uses techniques like cosine similarity and Jaccard index for comparison.
Groundedness
- Description: Ensures that the response is strictly grounded in the provided context.
- Key Points: Verifies factual reliance on retrieved information.
Summarization Accuracy
- Description: Evaluates the accuracy of a summary against the original document.
- Key Points: Ensures faithfulness to the source material.
Eval Context Retrieval Quality
- Description: Assesses the quality and adequacy of the retrieved context.
- Key Points: Measures sufficiency and relevance of the retrieved information.
Eval Ranking
- Description: Provides ranking scores for contexts based on relevance and criteria.
- Key Points: Prioritizes contexts that best align with the query.
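The Context Similarity evaluation above mentions cosine similarity and the Jaccard index. As a refresher, both metrics can be sketched in a few lines (token-level, for illustration only; the SDK's internal implementation may differ):

```python
import math

def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two term-frequency vectors (token -> count)."""
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard_index(a, b) -> float:
    """Jaccard index between two token collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0
```

Cosine similarity weights repeated tokens, while the Jaccard index only looks at set overlap; the two can disagree noticeably on short contexts.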
Step 5 - Instantiate the chosen evaluator(s)
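A hedged sketch of this step. The real evaluator classes live in the Future AGI SDK (the exact import paths and class names are assumptions); a stand-in dataclass shows the general shape:

```python
# Real SDK imports would look something like this (names are assumptions):
#   from fi.evals import ContextAdherence, Completeness, ChunkAttribution

from dataclasses import dataclass, field

@dataclass
class Evaluator:
    """Stand-in for an SDK evaluator: a name plus optional configuration."""
    name: str
    config: dict = field(default_factory=dict)

# One object per evaluation chosen in Step 3.
evaluators = [
    Evaluator("context_adherence"),
    Evaluator("completeness"),
]
```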
Step 6 - Initialize the EvalClient and run evaluations
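`EvalClient` is named by this cookbook, but its constructor and method signatures are not shown here, so the sketch below uses a mock with an assumed `evaluate` interface; consult the SDK reference for the real API:

```python
class MockEvalClient:
    """Stand-in for the SDK's EvalClient (real constructor args and
    method signatures are assumptions)."""

    def evaluate(self, evaluator_names, instance):
        # The real client would send the instance to the Future AGI API
        # and return per-evaluation scores; here we return dummy values.
        return {name: 1.0 for name in evaluator_names}

client = MockEvalClient()
results = client.evaluate(
    ["context_adherence", "completeness"],
    {"question": "what are points on a mortgage"},
)
```

With the real client, initialization would typically also take API credentials; the dummy scores above exist only so the aggregation step has something to consume.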
Step 7 - Aggregate the results
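Aggregation can be as simple as averaging each evaluation's score across all dataset instances; a stdlib-only sketch, assuming results arrive as one score dict per instance:

```python
from statistics import mean

def aggregate(per_instance_results):
    """Average each evaluation's score across all dataset instances."""
    names = per_instance_results[0].keys()
    return {n: mean(r[n] for r in per_instance_results) for n in names}

scores = aggregate([
    {"context_adherence": 1.0, "completeness": 0.5},
    {"context_adherence": 0.0, "completeness": 1.0},
])
print(scores)  # {'context_adherence': 0.5, 'completeness': 0.75}
```

Means are a reasonable default, but for evaluations with binary outcomes (e.g. Chunk Attribution) a pass rate or per-chunk breakdown may be more informative.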