These cookbooks walk you through the AI Evaluation SDK from zero to production. Each one solves a concrete problem — hallucination detection, RAG debugging, prompt injection defense, streaming safety, and more — with runnable code you can copy straight into your project.

Prerequisites

```
pip install ai-evaluation
```
Cookbooks that use an LLM judge (02, 08, 09) require a GOOGLE_API_KEY for Gemini. All other cookbooks run entirely locally with no API keys and no network calls.
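For the judge-based cookbooks, the key is read from the environment. A minimal setup, assuming a bash-like shell (the key value below is a placeholder, not a real credential):

```shell
# Required only for cookbooks 02, 08, and 09 (Gemini-backed LLM judge).
# Replace the placeholder with your own key; all other cookbooks need no key.
export GOOGLE_API_KEY="your-gemini-api-key"
```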

Progression

The cookbooks are designed to build on each other:
| Stage | Cookbook | What you add |
| --- | --- | --- |
| Start here | 01 Local Metrics | Fast, free, local checks |
| Improve accuracy | 02 LLM-as-Judge | LLM refinement for hard cases |
| Debug your RAG | 03 RAG Evaluation | Separate retrieval vs. generation failures |
| Secure inputs | 04 Guardrails | Block attacks before they reach the LLM |
| Protect in real time | 05 Streaming | Cut off toxic output mid-stream |
| Automate setup | 06 AutoEval | Generate pipelines from descriptions |
| Learn from mistakes | 07 Feedback Loop | Few-shot feedback for the judge |
| Go multimodal | 08 Multimodal Judge | Image and audio evaluation |
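The 01 → 02 progression (cheap local checks first, LLM judge only for hard cases) can be sketched in plain Python. This is an illustrative pattern, not the ai-evaluation API: the function names, the toy token-overlap metric, and the 0.5 threshold are all assumptions for the sake of the example.

```python
def local_unsupported_ratio(answer: str, context: str) -> float:
    """Toy local metric: fraction of answer tokens absent from the context.
    A high ratio hints at possible hallucination. Fast, free, no network."""
    context_tokens = set(context.lower().split())
    answer_tokens = answer.lower().split()
    if not answer_tokens:
        return 0.0
    unsupported = [t for t in answer_tokens if t not in context_tokens]
    return len(unsupported) / len(answer_tokens)


def evaluate(answer: str, context: str, judge=None, threshold: float = 0.5) -> str:
    """Run the cheap check first; escalate uncertain cases to an LLM judge."""
    score = local_unsupported_ratio(answer, context)
    if score < threshold:
        return "pass"                   # local check is confident enough
    if judge is not None:
        return judge(answer, context)   # expensive refinement for hard cases
    return "flag"                       # no judge configured: flag for review


print(evaluate("paris is the capital", "paris is the capital of france"))  # pass
print(evaluate("the moon is cheese", "paris is the capital of france"))    # flag
```

The design point mirrors the cookbook ordering: most outputs never touch the judge, so latency and API cost stay low, while the ambiguous tail still gets an accurate verdict.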