JSON & Structured Output

14 metrics for validating JSON correctness, schema compliance, type checking, and structured output quality.

📝 TL;DR
  • 5 JSON metrics (binary): is_json, contains_json, json_schema, json_validation, json_syntax
  • 9 structured output metrics (continuous 0.0-1.0): schema compliance, types, fields, hierarchy
  • All run locally: from fi.evals import evaluate

Validate whether your LLM produces correct JSON and well-structured output.

from fi.evals import evaluate

result = evaluate("is_json", output='{"name": "Alice", "age": 30}')
print(result.score)   # 1.0
print(result.passed)  # True

JSON Metrics

Binary checks: the output either passes or fails.

| Metric | What it checks |
| --- | --- |
| is_json | Entire output is valid JSON |
| contains_json | Output contains JSON somewhere within it |
| json_schema | Output matches a provided JSON schema |
| json_validation | Structural correctness and data integrity |
| json_syntax | Correct JSON syntax — quoting, brackets, commas, escaping |

is_json

Entire output is valid JSON.

result = evaluate("is_json", output='{"status": "ok", "code": 200}')
# score → 1.0

result = evaluate("is_json", output="This is not JSON")
# score → 0.0

contains_json

Output contains JSON somewhere within it, even when surrounded by other text.

result = evaluate("contains_json", output='Here is the result: {"name": "Alice"} as expected.')
# score → 1.0
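One way to picture this check: scan for balanced `{...}` substrings and try to parse each one. A simplified sketch (it ignores braces inside string values, which a real extractor must handle):

```python
import json

def contains_json(text: str) -> float:
    """Return 1.0 if any balanced {...} substring parses as JSON."""
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        depth = 0
        for end in range(start, len(text)):
            if text[end] == "{":
                depth += 1
            elif text[end] == "}":
                depth -= 1
                if depth == 0:
                    try:
                        json.loads(text[start:end + 1])
                        return 1.0
                    except json.JSONDecodeError:
                        break  # malformed candidate; keep scanning
    return 0.0

print(contains_json('Here is the result: {"name": "Alice"} as expected.'))  # 1.0
print(contains_json('no json here'))                                        # 0.0
```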

json_schema

Output matches a provided JSON schema. Pass the schema via config.

result = evaluate(
    "json_schema",
    output='{"name": "Alice", "age": 30}',
    config={"schema": {"type": "object", "properties": {"name": {"type": "string"}, "age": {"type": "integer"}}, "required": ["name", "age"]}},
)
# score → 1.0

json_validation

Validates JSON for structural correctness and data integrity.

result = evaluate("json_validation", output='{"users": [{"id": 1, "name": "Alice"}]}')
# score → 1.0

json_syntax

Checks for correct JSON syntax — quoting, brackets, commas, escaping.

result = evaluate("json_syntax", output='{"temp": 22.5, "valid": true}')
# score → 1.0

result = evaluate("json_syntax", output='{"valid": True}')
# score → 0.0 (Python's True is not valid JSON; use lowercase true)
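A common source of this failure is stringifying a Python dict instead of serializing it: `str()` emits single quotes, `True`, and `None`, none of which are valid JSON, while `json.dumps` emits `true` and `null`:

```python
import json

payload = {"valid": True, "note": None}

print(str(payload))         # {'valid': True, 'note': None}  -- fails json_syntax
print(json.dumps(payload))  # {"valid": true, "note": null}  -- passes
```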

Structured Output Metrics

Continuous scores (0.0-1.0) measuring how closely the output structure matches an expected structure. Pass both as JSON strings.

| Metric | What it measures |
| --- | --- |
| schema_compliance | Conformance to expected schema (field names, nesting, types) |
| type_compliance | Whether field types match expected types |
| field_completeness | Proportion of expected fields that are present |
| required_fields | Whether all required fields are present |
| field_coverage | Percentage of expected fields with non-null, non-empty values |
| hierarchy_score | How well nested structure matches the expected hierarchy |
| tree_edit_distance | Normalized edit distance between tree structures |
| structured_output_score | Composite score combining schema, type, field, and hierarchy checks |
| quick_structured_check | Fast basic structure check for quick pass/fail |

schema_compliance

How well the output conforms to the expected schema (field names, nesting, types).

result = evaluate(
    "schema_compliance",
    output='{"name": "Alice", "age": 30, "email": "alice@test.com"}',
    expected_output='{"name": "string", "age": 0, "email": "string"}',
)
# score → 1.0
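Note how expected_output is a template whose example values imply the expected types. A simplified top-level-only sketch of that idea (the library's actual scoring also weighs nesting):

```python
import json

def schema_compliance(output: str, expected_output: str) -> float:
    """Fraction of template fields present in the output with a matching type.
    Types are inferred from the template's example values."""
    out = json.loads(output)
    template = json.loads(expected_output)
    if not template:
        return 1.0
    matches = sum(
        1 for key, example in template.items()
        if key in out and type(out[key]) is type(example)
    )
    return matches / len(template)

print(schema_compliance(
    '{"name": "Alice", "age": 30, "email": "alice@test.com"}',
    '{"name": "string", "age": 0, "email": "string"}',
))  # 1.0
```

With `{"name": "Alice", "age": "thirty"}` against the same template minus email, only name matches, so the sketch scores 0.5.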

type_compliance

Whether field types match expected types.

result = evaluate(
    "type_compliance",
    output='{"name": "Alice", "age": 30, "active": true}',
    expected_output='{"name": "Bob", "age": 25, "active": false}',
)
# score → 1.0 (all types match)

field_completeness

Proportion of expected fields that are present.

result = evaluate(
    "field_completeness",
    output='{"name": "Alice"}',
    expected_output='{"name": "string", "age": 0, "email": "string"}',
)
# score → ~0.33 (1 of 3 fields)
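The score here is a straight ratio: expected fields found divided by expected fields total. A minimal sketch of the calculation (top-level fields only):

```python
import json

def field_completeness(output: str, expected_output: str) -> float:
    """Proportion of expected top-level fields present in the output."""
    out = json.loads(output)
    expected = json.loads(expected_output)
    if not expected:
        return 1.0
    present = sum(1 for key in expected if key in out)
    return present / len(expected)

print(round(field_completeness(
    '{"name": "Alice"}',
    '{"name": "string", "age": 0, "email": "string"}',
), 2))  # 0.33
```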

required_fields

Whether all required fields are present.

result = evaluate(
    "required_fields",
    output='{"id": 1, "name": "Alice", "email": "alice@test.com"}',
    expected_output='{"id": 0, "name": "string", "email": "string"}',
)
# score → 1.0
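Unlike field_completeness, this is all-or-nothing: one missing field fails the check. As a sketch:

```python
import json

def required_fields(output: str, expected_output: str) -> float:
    """1.0 only if every expected top-level field appears in the output."""
    out = json.loads(output)
    return 1.0 if all(key in out for key in json.loads(expected_output)) else 0.0

print(required_fields(
    '{"id": 1, "name": "Alice", "email": "alice@test.com"}',
    '{"id": 0, "name": "string", "email": "string"}',
))  # 1.0
```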

field_coverage

Percentage of expected fields with non-null, non-empty values.

result = evaluate(
    "field_coverage",
    output='{"name": "Alice", "age": null, "bio": ""}',
    expected_output='{"name": "string", "age": 0, "bio": "string"}',
)
# score → ~0.33 (only name has a value)
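The difference from field_completeness is that present-but-empty values (null, "", [], {}) do not count. A sketch of that rule:

```python
import json

def field_coverage(output: str, expected_output: str) -> float:
    """Fraction of expected fields whose output value is non-null and non-empty."""
    out = json.loads(output)
    expected = json.loads(expected_output)
    if not expected:
        return 1.0
    covered = sum(
        1 for key in expected
        if out.get(key) not in (None, "", [], {})
    )
    return covered / len(expected)

print(round(field_coverage(
    '{"name": "Alice", "age": null, "bio": ""}',
    '{"name": "string", "age": 0, "bio": "string"}',
), 2))  # 0.33
```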

hierarchy_score

How well nested structure matches the expected hierarchy.

result = evaluate(
    "hierarchy_score",
    output='{"user": {"name": "Alice", "address": {"city": "NYC"}}}',
    expected_output='{"user": {"name": "string", "address": {"city": "string"}}}',
)
# score → 1.0
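One way to think about hierarchy matching: flatten each object into its set of nested key paths ("user", "user.address", "user.address.city", ...) and compare. A simplified sketch along those lines:

```python
import json

def key_paths(node, prefix=""):
    """Collect dotted paths of all nested object keys."""
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(value, path)
    return paths

def hierarchy_score(output: str, expected_output: str) -> float:
    """Fraction of expected key paths found at the same depth in the output."""
    expected = key_paths(json.loads(expected_output))
    actual = key_paths(json.loads(output))
    if not expected:
        return 1.0
    return len(expected & actual) / len(expected)

print(hierarchy_score(
    '{"user": {"name": "Alice", "address": {"city": "NYC"}}}',
    '{"user": {"name": "string", "address": {"city": "string"}}}',
))  # 1.0
```

Dropping the nested address object leaves 2 of the 4 expected paths, so the sketch scores 0.5.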

tree_edit_distance

Normalized edit distance between tree structures. Higher = more similar.

result = evaluate(
    "tree_edit_distance",
    output='{"a": 1, "b": {"c": 2}}',
    expected_output='{"a": 1, "b": {"c": 2}}',
)
# score → 1.0 (identical)

structured_output_score

Composite score combining schema compliance, type compliance, field completeness, and hierarchy.

result = evaluate(
    "structured_output_score",
    output='{"id": 1, "name": "Alice", "tags": ["python"], "meta": {"role": "engineer"}}',
    expected_output='{"id": 0, "name": "string", "tags": ["string"], "meta": {"role": "string"}}',
)
# score → ~0.95

quick_structured_check

Fast basic structure check. Use when you need a quick pass/fail.

result = evaluate(
    "quick_structured_check",
    output='{"name": "Alice", "age": 30}',
    expected_output='{"name": "string", "age": 0}',
)
# score → 1.0