Evaluation Using Interface
Input:
- Required Inputs:
  - input: The source material or reference text.
  - output: The generated text to evaluate for hallucinations.
- Result: Returns ‘Passed’ if no hallucination is detected, ‘Failed’ if hallucination is detected.
- Reason: A detailed explanation of why the output was classified as containing or not containing hallucinations.
Evaluation Using SDK
Click here to learn how to set up evaluation using the SDK.
Input:
- Required Inputs:
  - input (string): The source material or reference text.
  - output (string): The generated text to evaluate for hallucinations.
- Result: Returns a list containing ‘Passed’ if no hallucination is detected, or ‘Failed’ if hallucination is detected.
- Reason: Provides a detailed explanation of the evaluation.
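The check described above can be illustrated with a self-contained sketch. This is not the product's SDK or its actual detection method; the function name and the word-overlap heuristic are illustrative assumptions, showing only the input/output shape (a Passed/Failed result plus a reason):

```python
import re

def detect_hallucination(input_text: str, output_text: str) -> dict:
    """Illustrative sketch only: flag output sentences that contain
    content words never appearing in the source text. A real evaluator
    would use semantic comparison, not literal word overlap."""
    source_words = set(re.findall(r"[a-z0-9]+", input_text.lower()))
    unsupported = []
    for sentence in re.split(r"(?<=[.!?])\s+", output_text.strip()):
        words = re.findall(r"[a-z0-9]+", sentence.lower())
        # Flag content words (longer than 3 chars) absent from the source.
        missing = [w for w in words if len(w) > 3 and w not in source_words]
        if missing:
            unsupported.append(missing)
    result = "Failed" if unsupported else "Passed"
    reason = (
        f"Terms not found in the source: {unsupported}"
        if unsupported
        else "All output terms appear in the source."
    )
    return {"result": result, "reason": reason}
```

A word-overlap heuristic like this produces false positives on legitimate paraphrases, which is why production evaluators rely on model-based judgment rather than string matching.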
What to Do If You Get Undesired Results
If the content is evaluated as containing hallucinations (Failed) and you want to improve it:
- Ensure all claims in your output are explicitly supported by the source material
- Avoid extrapolating or generalizing beyond what is stated in the input
- Remove any specific details that aren’t mentioned in the source text
- Use qualifying language (like “may,” “could,” or “suggests”) when necessary
- Stick to paraphrasing rather than adding new information
- Double-check numerical values, dates, and proper nouns against the source
- Consider directly quoting from the source for critical information
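The tip about double-checking numerical values can be partially automated. Below is a minimal sketch (the function name is hypothetical, not part of any SDK) that extracts numbers from the output and reports any that never appear in the source:

```python
import re

def check_numbers_against_source(source: str, output: str) -> list:
    """Return numeric values in the output that do not appear verbatim
    in the source. A minimal sketch: real verification would also
    normalize formats (e.g., '1,000' vs '1000') and handle dates."""
    source_numbers = set(re.findall(r"\d+(?:\.\d+)?", source))
    output_numbers = re.findall(r"\d+(?:\.\d+)?", output)
    return [n for n in output_numbers if n not in source_numbers]
```

An empty return list means every number in the output is literally present in the source; any values returned are candidates for manual review against the reference text.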
Comparing Detect Hallucination with Similar Evals
- Factual Accuracy: While Detect Hallucination checks for fabricated information not in the source, Factual Accuracy evaluates the overall factual correctness of content against broader knowledge.
- Groundedness: Detect Hallucination focuses on absence of fabricated content, while Groundedness measures how well the output is supported by the source material.
- Context Adherence: Detect Hallucination identifies made-up information, while Context Adherence evaluates how well the output adheres to the given context.