Contains Code

Checks whether the output is valid code or contains expected code snippets.

Python

# Assumes `evaluator` is an already-initialized Evaluator instance from the SDK.
result = evaluator.evaluate(
    eval_templates="contains_code",
    inputs={
        "output": "def fibonacci(n):\n    a, b = 0, 1\n    for _ in range(n):\n        print(a)\n        a, b = b, a + b"
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)

TypeScript

import { Evaluator, Templates } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

const result = await evaluator.evaluate(
  "contains_code",
  {
    output: "def fibonacci(n):\n    a, b = 0, 1\n    for _ in range(n):\n        print(a)\n        a, b = b, a + b"
  },
  {
    modelName: "turing_flash",
  }
);

console.log(result);
Input

Required Input | Type   | Description
output         | string | The model output to be checked for valid code content.

Output

Field  | Description
Result | Returns Passed if the output contains valid code, or Failed if it does not.
Reason | Provides a detailed explanation of the code detection assessment.
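
The Result and Reason fields can be read from the result object the same way the Python example prints them. The following is a minimal sketch of turning them into a boolean plus an explanation. `summarize_contains_code` is a hypothetical helper, not part of the SDK, and the stand-in result built with `SimpleNamespace` only mimics the `eval_results[0].output` / `.reason` shape so the snippet runs without credentials. It assumes the output string is "Passed" or "Failed", as described in the table.

```python
from types import SimpleNamespace


def summarize_contains_code(result):
    """Return (passed, reason) from an evaluation result.

    Assumes the result exposes eval_results[0].output and .reason,
    and that output is the string "Passed" or "Failed".
    """
    first = result.eval_results[0]
    passed = str(first.output).strip().lower() == "passed"
    return passed, first.reason


# Stand-in object for illustration only; a real result would come
# from evaluator.evaluate(...).
fake_result = SimpleNamespace(
    eval_results=[
        SimpleNamespace(
            output="Passed",
            reason="The output is a Python function definition.",
        )
    ]
)

passed, reason = summarize_contains_code(fake_result)
print(passed, "-", reason)
```

Comparing case-insensitively is a defensive choice in this sketch; if the service always returns the exact strings shown in the table, a direct equality check would do.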

What to Do When Contains Code Score is Low

  • Ensure the code is properly formatted, with indentation and syntax appropriate for its language.
  • The evaluation identifies code across common programming languages such as Python, JavaScript, and Java.
  • Mixed content (code embedded in extensive natural-language explanation) may yield uncertain results.
  • Snippets with syntax errors may still be identified as code, because the evaluation focuses on structural patterns.
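
Because the evaluation may accept code that contains syntax errors, a separate syntax check can be useful when correctness matters. The sketch below is a local pre-check for Python snippets only, using the standard library's `ast.parse`. `python_syntax_ok` is a hypothetical helper name, and this is not how the Contains Code evaluation itself works.

```python
import ast


def python_syntax_ok(snippet: str) -> bool:
    """Return True if the snippet parses as Python source, False otherwise."""
    try:
        ast.parse(snippet)
        return True
    except SyntaxError:  # also covers IndentationError, a subclass
        return False


good = (
    "def fibonacci(n):\n"
    "    a, b = 0, 1\n"
    "    for _ in range(n):\n"
    "        print(a)\n"
    "        a, b = b, a + b"
)
bad = "def fibonacci(n)\n    return n"  # missing colon after the signature

print(python_syntax_ok(good))
print(python_syntax_ok(bad))
```

A snippet like `bad` could still be detected as code by the evaluation, so the two checks answer different questions: one asks whether the text looks like code, the other whether it parses.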

Comparing Contains Code with Similar Evals

  • Is JSON: Contains Code checks for code in any programming language, while Is JSON specifically validates whether the content is properly formatted JSON.
  • Text to SQL: Contains Code detects presence of code generally, while Text to SQL evaluates the quality and correctness of SQL generation specifically.
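
To make the Is JSON distinction concrete, the sketch below runs a local JSON validity check with the standard library's `json.loads`. It only illustrates the kind of question Is JSON answers; it is an assumption-free stand-in and does not reproduce how either evaluation is implemented. `is_valid_json` is a hypothetical helper name.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as a JSON document, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


python_snippet = "def fibonacci(n):\n    return n"
json_document = '{"name": "fibonacci", "n": 10}'

# A Python function is code but not JSON; the JSON document parses cleanly.
print(is_valid_json(python_snippet))
print(is_valid_json(json_document))
```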