Contains Code: Code Presence Validation Metric

Checks whether the output is valid code or contains the expected code snippets. It validates code structure and syntax in LLM-generated programming output.

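# Python. Assumes `evaluator` is an already-initialized Evaluator client from the SDK (setup not shown).
result = evaluator.evaluate(
    eval_templates="contains_code",
    inputs={
        "output": "def fibonacci(n):\n    a, b = 0, 1\n    for _ in range(n):\n        print(a)\n        a, b = b, a + b"
    },
    model_name="turing_flash"
)

# Passed/Failed verdict, followed by the explanation behind it
print(result.eval_results[0].output)
print(result.eval_results[0].reason)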

// TypeScript. Uses top-level await, so run it as an ES module.
import { Evaluator } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

// Arguments: eval template name, inputs, options
const result = await evaluator.evaluate(
  "contains_code",
  {
    output: "def fibonacci(n):\n    a, b = 0, 1\n    for _ in range(n):\n        print(a)\n        a, b = b, a + b"
  },
  {
    modelName: "turing_flash",
  }
);

console.log(result);

Input
| Required Input | Type | Description |
| --- | --- | --- |
| `output` | `string` | The model output to be checked for valid code content. |

Output
| Field | Description |
| --- | --- |
| Result | Returns Passed if the output contains valid code, or Failed if it does not. |
| Reason | Provides a detailed explanation of the code detection assessment. |

What to Do When Contains Code Returns Failed

Ensure the code is properly formatted, with indentation and syntax appropriate for its language.
This evaluation can identify code across common programming languages such as Python, JavaScript, and Java.
Mixed content (code embedded in extensive natural-language explanation) may yield uncertain results.
Code snippets with syntax errors may still be identified as code, because the evaluation focuses on structural patterns.
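
The points above suggest a local pre-check before sending output to the evaluator: pull the code out of mixed content, then confirm it parses. The sketch below is an illustration only, not how the evaluation works internally. It assumes the model wraps code in Markdown-style fences and that the code is Python; the names `extract_code_blocks` and `python_syntax_ok` are hypothetical.

```python
import ast
import re

# Build the fence marker programmatically to keep this example easy to embed.
FENCE = "`" * 3

# Matches a fenced block with an optional language tag and captures its body.
FENCE_RE = re.compile(FENCE + r"[\w+-]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(text: str) -> list[str]:
    """Return the bodies of all fenced code blocks found in the text."""
    return [match.group(1) for match in FENCE_RE.finditer(text)]


def python_syntax_ok(source: str) -> bool:
    """Return True if the source parses as Python, False on a syntax error."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    reply = (
        "Here is the function:\n"
        + FENCE + "python\ndef double(x):\n    return 2 * x\n" + FENCE
        + "\nLet me know if you need tests."
    )
    for block in extract_code_blocks(reply):
        print(python_syntax_ok(block))
```

Passing only the extracted block as `output` keeps the explanation text out of the evaluated content, and the syntax check catches indentation errors early because `IndentationError` is a subclass of `SyntaxError`.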

Comparing Contains Code with Similar Evals

Is JSON: Contains Code checks for code in any programming language, while Is JSON specifically validates whether the content is properly formatted JSON.
Text to SQL: Contains Code detects presence of code generally, while Text to SQL evaluates the quality and correctness of SQL generation specifically.
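
To make the Is JSON distinction concrete, here is a minimal local sketch using Python's standard `json` module. It only illustrates the difference between "parses as JSON" and "looks like code"; it is not the implementation of either eval, and the helper name `is_valid_json` is hypothetical.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as a JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    # A JSON object parses; a Python function body does not.
    print(is_valid_json('{"name": "fibonacci", "args": [10]}'))
    print(is_valid_json("def fibonacci(n):\n    return n"))
```

A Python function is code but not JSON, so an output like the second sample is the kind of content Contains Code targets while a strict JSON check rejects it.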
