| Input | | |
|---|---|---|
| Required Input | Type | Description |
| `hypothesis` | string | JSON-serialized list of retrieved chunks in ranked order |
| `reference` | string | JSON-serialized list of ground-truth relevant chunks |
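Both input fields are JSON strings rather than native lists. A minimal sketch of constructing them (the chunk texts are hypothetical):

```python
import json

# Both inputs are JSON-serialized lists of chunk strings.
hypothesis = json.dumps(["Paris is the capital of France.", "The Louvre is in Paris."])
reference = json.dumps(["Paris is the capital of France."])
```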
| Output | |
|---|---|
| Field | Description |
| Result | Returns 1.0 if at least one relevant chunk was retrieved, 0.0 otherwise |
| Reason | Short summary string of the score, e.g. `Hit Rate: 1.0` |
Hit Rate does not take a `k` parameter. It checks the entire retrieved list for any match.

## Batch evaluation

To evaluate multiple queries in a single call, pass a list of JSON-serialized inputs. Each element represents one retrieval evaluation.
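A minimal sketch of how batch scoring could look, assuming a hypothetical `hit_rate` helper that mirrors the input/output contract above:

```python
import json

def hit_rate(hypothesis: str, reference: str) -> float:
    """Return 1.0 if any retrieved chunk appears in the ground-truth set."""
    retrieved = json.loads(hypothesis)
    relevant = set(json.loads(reference))
    return 1.0 if any(chunk in relevant for chunk in retrieved) else 0.0

# Each list element is one retrieval evaluation (example data is hypothetical).
hypotheses = [json.dumps(["chunk_a", "chunk_b"]), json.dumps(["chunk_x"])]
references = [json.dumps(["chunk_b"]), json.dumps(["chunk_y"])]

scores = [hit_rate(h, r) for h, r in zip(hypotheses, references)]
# First query hits ("chunk_b" was retrieved); second misses.
```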
## How it works

Hit Rate is the simplest retrieval metric: did the retriever find at least one relevant chunk? Formula:

$$\text{Hit Rate} = \begin{cases} 1 & \text{if at least one retrieved chunk is relevant} \\ 0 & \text{otherwise} \end{cases}$$

## What to do when Hit Rate is Low
If Hit Rate is low, the retriever is completely failing to find relevant content for some queries:

- Check if the failing queries use different vocabulary or phrasing than what appears in the indexed documents
- Verify that the relevant documents are actually indexed and not filtered out during preprocessing
- For domain-specific queries, consider fine-tuning the embedding model or adding synonyms to the index
- Ensure document chunking doesn’t split relevant information into fragments too small to match
- Try hybrid retrieval (dense + sparse) to catch queries where one method fails
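The last point can be sketched with reciprocal rank fusion (RRF), a common way to merge dense and sparse rankings; the document IDs and rankings below are hypothetical:

```python
def rrf_merge(dense_ranked, sparse_ranked, k=60):
    """Merge two rankings with reciprocal rank fusion: score = sum of 1/(k + rank)."""
    scores = {}
    for ranking in (dense_ranked, sparse_ranked):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d1", "d2", "d3"]    # dense retriever's ranking
sparse = ["d3", "d4", "d1"]   # sparse (e.g. BM25) ranking
merged = rrf_merge(dense, sparse)
# d1 and d3 appear in both rankings, so they rise to the top.
```

A query that one retriever misses entirely can still score a hit if the other retriever surfaces a relevant chunk, which is why hybrid retrieval tends to raise Hit Rate.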
## Differentiating Hit Rate from Similar Evals
- Recall@K: Hit Rate only checks if any relevant chunk was found, while Recall@K measures the fraction of all relevant chunks that were retrieved.
- MRR: Hit Rate is binary (hit or miss), while MRR additionally measures how high the first relevant chunk ranks.
- Precision@K: Hit Rate checks for the existence of any relevant result, while Precision@K measures what fraction of all retrieved results are relevant.
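The distinctions above can be made concrete on a single hypothetical query (chunk IDs and rankings are illustrative, not from any real dataset):

```python
retrieved = ["c3", "c7", "c1", "c9"]   # ranked retriever output
relevant = {"c1", "c2"}                # ground-truth relevant chunks
k = 3

# Hit Rate: binary -- was any relevant chunk retrieved anywhere in the list?
hit_rate = 1.0 if any(c in relevant for c in retrieved) else 0.0

# Recall@K: fraction of all relevant chunks found in the top K.
recall_at_k = sum(c in relevant for c in retrieved[:k]) / len(relevant)

# Precision@K: fraction of the top K that is relevant.
precision_at_k = sum(c in relevant for c in retrieved[:k]) / k

# MRR: reciprocal rank of the first relevant chunk (0 if none).
first_hit = next((i for i, c in enumerate(retrieved, start=1) if c in relevant), None)
mrr = 1.0 / first_hit if first_hit else 0.0
```

Here Hit Rate is 1.0 (a relevant chunk was found), but Recall@3 is 0.5 (only one of two relevant chunks retrieved), Precision@3 is 1/3, and MRR is 1/3 (first hit at rank 3), showing how the metrics diverge on the same result list.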