Prerequisites
A GOOGLE_API_KEY is required only for the cookbooks that call Gemini. All other cookbooks run entirely locally, with no API keys and no network calls.
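As a quick pre-flight check, the sketch below tests whether the key is visible to Python before you open a Gemini-backed cookbook. It assumes the key is supplied through the `GOOGLE_API_KEY` environment variable; `gemini_key_available` is a hypothetical helper written for this illustration, not part of any library used in the cookbooks.

```python
import os
from typing import Mapping, Optional


def gemini_key_available(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if a non-empty GOOGLE_API_KEY is present.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to test. (Hypothetical helper, for illustration.)
    """
    env = os.environ if env is None else env
    return bool(env.get("GOOGLE_API_KEY", "").strip())


if __name__ == "__main__":
    if gemini_key_available():
        print("GOOGLE_API_KEY found: Gemini-backed cookbooks can run.")
    else:
        print("GOOGLE_API_KEY not set: only the local cookbooks will run.")
```

If the check fails, export the variable in your shell (or set it in your notebook environment) before starting the cookbook.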
Cookbooks
01 - Local Metrics
Catch hallucinations, wrong dosages, and contradictions in a medical chatbot — all locally in under one second.
02 - LLM-as-Judge
When heuristics miss paraphrases, use Gemini to judge accuracy with augment=True and custom prompts.
03 - RAG Evaluation
Diagnose where your RAG pipeline fails — is retrieval pulling the wrong docs, or is the LLM hallucinating?
04 - Guardrails
Build a sub-10ms security middleware that blocks jailbreaks, code injection, PII leaks, and secret exposure.
05 - Streaming Safety
Monitor streaming LLM output token-by-token and cut the stream the moment safety thresholds are breached.
06 - AutoEval
Describe your app in plain English and get an auto-configured test pipeline you can export to CI/CD.
07 - Feedback Loop
Store developer corrections in ChromaDB and teach your LLM judge to stop repeating the same mistakes.
08 - Multimodal Judge
Pass images and audio to the LLM judge — verify product descriptions match photos, check transcription accuracy.
Progression
The cookbooks are designed to build on each other:

| Stage | Cookbook | What you add |
|---|---|---|
| Start here | 01 Local Metrics | Fast, free, local checks |
| Improve accuracy | 02 LLM-as-Judge | LLM refinement for hard cases |
| Debug your RAG | 03 RAG Evaluation | Separate retrieval vs. generation failures |
| Secure inputs | 04 Guardrails | Block attacks before they reach the LLM |
| Protect in real time | 05 Streaming Safety | Cut off toxic output mid-stream |
| Automate setup | 06 AutoEval | Generate pipelines from descriptions |
| Learn from mistakes | 07 Feedback Loop | Few-shot feedback for the judge |
| Go multimodal | 08 Multimodal Judge | Image and audio evaluation |