import json
from fi.evals import Evaluator

evaluator = Evaluator()

result = evaluator.evaluate(
    eval_templates="hit_rate",
    inputs={
        "hypothesis": json.dumps([
            "France is in Europe.",
            "Paris is the capital of France.",
            "Napoleon was born in Corsica."
        ]),
        "reference": json.dumps([
            "Paris is the capital of France.",
            "The Eiffel Tower was built in 1889."
        ])
    }
)

print(result.eval_results[0].output)   # 1.0
print(result.eval_results[0].reason)
In this example, “Paris is the capital of France.” appears in both the retrieved and ground-truth lists, so at least one relevant chunk was found: hit rate = 1.0.
Input
Required Input | Type   | Description
hypothesis     | string | JSON-serialized list of retrieved chunks in ranked order
reference      | string | JSON-serialized list of ground-truth relevant chunks
Output
Field  | Description
Result | Returns 1.0 if at least one relevant chunk was retrieved, 0.0 otherwise
Reason | Short summary string of the score, e.g. "Hit Rate: 1.0"
Hit Rate does not take a k parameter. It checks the entire retrieved list for any match.

Batch evaluation

To evaluate multiple queries in a single call, pass a list of JSON-serialized inputs. Each element represents one retrieval evaluation:
Python
results = evaluator.evaluate(
    eval_templates="hit_rate",
    inputs={
        "hypothesis": [
            json.dumps(["Paris is the capital of France.", "France is in Europe.", "Napoleon was born in Corsica."]),
            json.dumps(["The sky is blue.", "Water is wet."]),
            json.dumps(["Completely unrelated.", "Nothing matches."]),
        ],
        "reference": [
            json.dumps(["Paris is the capital of France.", "The Eiffel Tower was built in 1889."]),
            json.dumps(["The sky is blue.", "Water is wet."]),
            json.dumps(["The Louvre is in Paris."]),
        ],
    },
)

for i, r in enumerate(results.eval_results):
    print(f"Query {i+1}: {r.output}")
# Query 1: 1.0   (match found)
# Query 2: 1.0   (match found)
# Query 3: 0.0   (no match)

How it works

Hit Rate is the simplest retrieval metric: did the retriever find at least one relevant chunk? Formula:
Hit Rate = 1.0  if any retrieved chunk matches a ground-truth chunk
         = 0.0  otherwise
Matching is based on exact string equality. Hit Rate is useful as a baseline sanity check. If hit rate is 0.0, the retriever completely failed to find any relevant context, and all downstream metrics (Recall, Precision, NDCG) will also be 0.
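The check above can be sketched in a few lines of plain Python. This is a simplified stand-in for the built-in template, assuming exact string matching as described:

```python
def hit_rate(retrieved: list[str], relevant: list[str]) -> float:
    """Return 1.0 if any retrieved chunk exactly matches a relevant chunk."""
    relevant_set = set(relevant)
    return 1.0 if any(chunk in relevant_set for chunk in retrieved) else 0.0

# A match anywhere in the retrieved list counts -- rank does not matter.
print(hit_rate(
    ["France is in Europe.", "Paris is the capital of France."],
    ["Paris is the capital of France.", "The Eiffel Tower was built in 1889."],
))  # 1.0

print(hit_rate(
    ["Completely unrelated.", "Nothing matches."],
    ["The Louvre is in Paris."],
))  # 0.0
```

Because the comparison is exact string equality, chunks that paraphrase a ground-truth passage without reproducing it verbatim will not count as hits.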

What to do when Hit Rate is low

If hit rate is low, the retriever is completely failing to find relevant content for some queries:
  • Check if the failing queries use different vocabulary or phrasing than what appears in the indexed documents
  • Verify that the relevant documents are actually indexed and not filtered out during preprocessing
  • For domain-specific queries, consider fine-tuning the embedding model or adding synonyms to the index
  • Ensure document chunking doesn’t split relevant information into fragments too small to match
  • Try hybrid retrieval (dense + sparse) to catch queries where one method fails

Differentiating Hit Rate from Similar Evals

  • Recall@K: Hit Rate only checks if any relevant chunk was found, while Recall@K measures the fraction of all relevant chunks that were retrieved.
  • MRR: Hit Rate is binary (hit or miss), while MRR additionally measures how high the first relevant chunk ranks.
  • Precision@K: Hit Rate checks for the existence of any relevant result, while Precision@K measures what fraction of all retrieved results are relevant.
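To make the contrast concrete, here is a sketch computing all four metrics on the same retrieval list. These are simplified illustrations of the standard formulas, not the library's implementations:

```python
def hit_rate(retrieved: list[str], relevant: list[str]) -> float:
    rel = set(relevant)
    return 1.0 if any(c in rel for c in retrieved) else 0.0

def recall_at_k(retrieved: list[str], relevant: list[str], k: int) -> float:
    # Fraction of all relevant chunks that appear in the top-k results.
    rel = set(relevant)
    return sum(c in rel for c in retrieved[:k]) / len(rel)

def precision_at_k(retrieved: list[str], relevant: list[str], k: int) -> float:
    # Fraction of the top-k results that are relevant.
    rel = set(relevant)
    return sum(c in rel for c in retrieved[:k]) / k

def mrr(retrieved: list[str], relevant: list[str]) -> float:
    # Reciprocal rank of the first relevant chunk, 0.0 if none found.
    rel = set(relevant)
    for rank, c in enumerate(retrieved, start=1):
        if c in rel:
            return 1.0 / rank
    return 0.0

retrieved = [
    "France is in Europe.",
    "Paris is the capital of France.",
    "Napoleon was born in Corsica.",
]
relevant = [
    "Paris is the capital of France.",
    "The Eiffel Tower was built in 1889.",
]

print(hit_rate(retrieved, relevant))            # 1.0 (something relevant was found)
print(recall_at_k(retrieved, relevant, 3))      # 0.5 (1 of 2 relevant chunks retrieved)
print(precision_at_k(retrieved, relevant, 3))   # ~0.33 (1 of 3 retrieved chunks relevant)
print(mrr(retrieved, relevant))                 # 0.5 (first hit at rank 2)
```

All four agree that the retrieval "worked" in the binary sense, but only the ranked metrics reveal that half the relevant material was missed and the first hit did not come first.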