JSON Schema Validation — Deep Dive

JSON Schema is a specification for describing the structure and constraints of JSON data. It has been published as a series of IETF Internet-Drafts but has not been ratified as an RFC. Python’s jsonschema library is a widely used implementation that supports Drafts 3, 4, 6, 7, 2019-09, and 2020-12 and provides extensibility hooks for custom validation. This deep dive covers the specification internals, performance tuning, and production patterns.

1) Schema structure and keywords

A JSON Schema document is itself a JSON object. The top-level $schema keyword declares which draft version to use:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 1, "maxLength": 200},
        "email": {"type": "string", "format": "email"},
        "age": {"type": "integer", "minimum": 0, "maximum": 150},
        "tags": {
            "type": "array",
            "items": {"type": "string", "maxLength": 50},
            "minItems": 0,
            "maxItems": 20,
            "uniqueItems": true
        },
        "address": {"$ref": "#/$defs/address"}
    },
    "required": ["name", "email"],
    "additionalProperties": false,
    "$defs": {
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "country": {"type": "string", "pattern": "^[A-Z]{2}$"}
            },
            "required": ["city", "country"]
        }
    }
}
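The "$ref": "#/$defs/address" entry is a JSON Pointer into the same document. As a rough illustration of what a validator does when it follows such a reference, the sketch below resolves a local pointer by hand using only the standard library. The helper name resolve_local_ref is hypothetical, and the code handles same-document pointers only; real resolution (including $id, $anchor, and remote URIs) is considerably more involved.

```python
from urllib.parse import unquote


def resolve_local_ref(schema: dict, ref: str):
    """Resolve a same-document reference such as '#/$defs/address'."""
    if not ref.startswith("#"):
        raise ValueError("only same-document references are handled here")
    pointer = unquote(ref[1:])
    if pointer == "":
        return schema  # '#' refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("plain-name anchors are not handled here")
    node = schema
    for token in pointer[1:].split("/"):
        # JSON Pointer escapes: ~1 means '/', ~0 means '~' (decode ~1 first)
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


schema = {
    "type": "object",
    "properties": {"address": {"$ref": "#/$defs/address"}},
    "$defs": {
        "address": {"type": "object", "required": ["city", "country"]},
    },
}

target = resolve_local_ref(schema, schema["properties"]["address"]["$ref"])
print(target["required"])
```

Because the reference is resolved at validation time, the same $defs entry can be reused from several properties without copying it.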

Key keywords by category:

Structural: type, properties, items, prefixItems (Draft 2020-12), additionalProperties, patternProperties.

Numeric: minimum, maximum, exclusiveMinimum, exclusiveMaximum, multipleOf.

String: minLength, maxLength, pattern, format.

Array: minItems, maxItems, uniqueItems, contains, minContains, maxContains.

Composition: allOf, anyOf, oneOf, not, if/then/else.

References: $ref, $defs, $dynamicRef, $anchor.

2) Python jsonschema library usage

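import json

from jsonschema import Draft202012Validator, ValidationError

# Load schema
with open("schema.json") as f:
    schema = json.load(f)

# Check the schema itself, then create a reusable validator instance
Draft202012Validator.check_schema(schema)
validator = Draft202012Validator(schema)

# Example instance to validate (illustrative values)
data = {"name": "Ada", "email": "ada@example.com", "age": 36}

# Validate and raise on the first error found
try:
    validator.validate(data)
except ValidationError as exc:
    print(f"Invalid: {exc.message}")

# Collect all errors, ordered by location (paths may mix str keys and int indexes)
errors = sorted(validator.iter_errors(data), key=lambda e: [str(p) for p in e.absolute_path])
for error in errors:
    path = ".".join(str(p) for p in error.absolute_path) or "<root>"
    print(f"{path}: {error.message}")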

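jsonschema interprets the schema at validation time rather than compiling it to code, but constructing the validator once still avoids repeated setup. The module-level jsonschema.validate() helper, by contrast, re-checks the schema against its meta-schema and builds a new validator on every call. Always reuse validator instances in hot paths.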

3) Format validation

JSON Schema’s format keyword (e.g., "format": "email", "format": "date-time", "format": "uri") is advisory by default — the spec says validators are not required to enforce it. In jsonschema, you must opt in to format checking:

from jsonschema import Draft202012Validator, FormatChecker

validator = Draft202012Validator(schema, format_checker=FormatChecker())

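# FormatChecker can check date, time, date-time, email, hostname,
# ipv4, ipv6, uri, uri-reference, iri, regex, and more. Several of these
# (for example date-time, hostname, uri, iri) are only checked when optional
# dependencies are installed: pip install "jsonschema[format]"
# Formats with no registered checker are silently accepted.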

You can register custom format checkers:

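import re

from jsonschema import Draft202012Validator, FormatChecker

checker = FormatChecker()

PHONE_RE = re.compile(r"^\+?[\d\s\-()]{7,20}$")

@checker.checks("phone-number", raises=ValueError)
def check_phone(value):
    # Format checks apply only to strings; other types pass by convention
    if not isinstance(value, str):
        return True
    if not PHONE_RE.match(value):
        raise ValueError(f"Invalid phone number: {value}")
    return True

validator = Draft202012Validator(schema, format_checker=checker)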

4) Custom validators and extending the spec

jsonschema allows extending validators with custom keywords:

from jsonschema import Draft202012Validator, ValidationError, validators

def is_positive_if_required(validator_instance, is_positive, instance, schema):
    """Custom keyword: isPositive."""
    # bool is a subclass of int in Python, so exclude it explicitly
    is_number = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if is_positive and is_number and instance <= 0:
        yield ValidationError(f"{instance} is not positive")

CustomValidator = validators.extend(
    Draft202012Validator,
    {"isPositive": is_positive_if_required},
)

schema = {
    "type": "object",
    "properties": {
        "price": {"type": "number", "isPositive": True},
    },
}

v = CustomValidator(schema)
v.validate({"price": -5})  # Raises ValidationError

5) Referencing and schema composition

For large projects, schemas are split across multiple files:

import json

from jsonschema import Draft202012Validator
from referencing import Registry, Resource

# Load referenced schemas
with open("schemas/address.json") as f:
    address_schema = json.load(f)

with open("schemas/user.json") as f:
    user_schema = json.load(f)

# Build a registry keyed by the URI that user.json uses in its $ref.
# Resource.from_contents() reads "$schema" to pick the draft, so
# address.json must declare it.
registry = Registry().with_resources([
    ("https://example.com/schemas/address.json",
     Resource.from_contents(address_schema)),
])

# Validator resolves $ref against the registry
validator = Draft202012Validator(user_schema, registry=registry)

The referencing library (a separate package that jsonschema has used for $ref resolution since v4.18, replacing the deprecated RefResolver) handles URI resolution, caching, and circular reference detection.

6) Performance characteristics

jsonschema is pure Python. Benchmark data for validating a 10-field object:

Scenario                                Throughput
Simple flat object, valid               ~15,000 validations/sec
With format checking                    ~8,000 validations/sec
Nested 3 levels                         ~5,000 validations/sec
Invalid data, collecting all errors     ~10,000 validations/sec

For higher performance, consider fastjsonschema, which compiles JSON Schema into Python code:

import fastjsonschema

validate = fastjsonschema.compile(schema)
# Generated function — 5-10x faster than jsonschema
validate(data)

fastjsonschema supports Draft 4, 6, and 7. It does not support Draft 2020-12 or custom keywords, so there is a features-vs-speed tradeoff.

7) Generating JSON Schema from Python

From Pydantic:

from pydantic import BaseModel

class User(BaseModel):
    name: str
    email: str
    age: int | None = None

schema = User.model_json_schema()
# Produces a valid Draft 2020-12 JSON Schema

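From dataclasses (via third-party): dataclasses_json adds to_json()/from_json() helpers and a marshmallow-based schema() method, but it does not emit JSON Schema documents. Pydantic's TypeAdapter is one way to get a JSON Schema from a standard-library dataclass:

from dataclasses import dataclass
from pydantic import TypeAdapter

@dataclass
class User:
    name: str
    email: str

schema = TypeAdapter(User).json_schema()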

From attrs + cattrs: cattrs does not generate JSON Schema natively, but you can use attrs field metadata to build schemas programmatically.
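Building a schema from field metadata can be sketched as follows. To stay dependency-free this example uses standard-library dataclasses instead of attrs; attrs exposes comparable per-field type and metadata information through attrs.fields(). The helper schema_from_dataclass, the TYPE_MAP table, and the "json_schema" metadata key are all hypothetical conventions chosen for the example.

```python
from dataclasses import MISSING, dataclass, field, fields

# Minimal mapping from Python types to JSON Schema types (assumption: only
# these scalar types are used; anything else raises KeyError).
TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_dataclass(cls) -> dict:
    """Build a Draft 2020-12 object schema from dataclass fields."""
    properties, required = {}, []
    for f in fields(cls):
        prop = {"type": TYPE_MAP[f.type]}
        # Extra constraints travel in field metadata under "json_schema"
        prop.update(f.metadata.get("json_schema", {}))
        properties[f.name] = prop
        # Fields without any default are treated as required
        if f.default is MISSING and f.default_factory is MISSING:
            required.append(f.name)
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": properties,
        "required": required,
        "additionalProperties": False,
    }


@dataclass
class User:
    name: str = field(metadata={"json_schema": {"minLength": 1, "maxLength": 200}})
    email: str = field(metadata={"json_schema": {"format": "email"}})
    age: int = field(default=0, metadata={"json_schema": {"minimum": 0}})


schema = schema_from_dataclass(User)
print(schema["required"], schema["properties"]["age"])
```

This approach breaks down quickly for nested or generic types, which is where a dedicated generator such as Pydantic earns its keep.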

8) Production patterns

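API request validation middleware (FastAPI): FastAPI validates requests with Pydantic models and uses the JSON Schema that Pydantic generates for its OpenAPI documentation. For non-Pydantic APIs or custom schemas, a Starlette middleware can validate request bodies directly: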

from jsonschema import Draft202012Validator
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

class SchemaValidationMiddleware(BaseHTTPMiddleware):
    def __init__(self, app, schemas: dict):
        super().__init__(app)
        # Build one validator per route path up front so requests reuse them
        self.validators = {
            path: Draft202012Validator(schema)
            for path, schema in schemas.items()
        }

    async def dispatch(self, request, call_next):
        validator = self.validators.get(request.url.path)
        if validator and request.method in ("POST", "PUT", "PATCH"):
            try:
                body = await request.json()
            except ValueError:
                # Body is missing or is not parseable as JSON
                return JSONResponse(
                    {"errors": [{"path": [], "message": "Request body is not valid JSON"}]},
                    status_code=400,
                )
            errors = list(validator.iter_errors(body))
            if errors:
                return JSONResponse(
                    {"errors": [{"path": list(e.absolute_path), "message": e.message} for e in errors]},
                    status_code=422,
                )
        return await call_next(request)

Config file validation:

import json
import tomllib  # standard library in Python 3.11+

from jsonschema import Draft202012Validator

with open("config-schema.json") as f:
    config_schema = json.load(f)

# tomllib requires the file to be opened in binary mode
with open("config.toml", "rb") as f:
    config = tomllib.load(f)

validator = Draft202012Validator(config_schema)
errors = list(validator.iter_errors(config))
if errors:
    for e in errors:
        location = ".".join(str(p) for p in e.absolute_path) or "<root>"
        print(f"Config error at {location}: {e.message}")
    raise SystemExit(1)

Cross-language data contracts: Publish JSON Schema files alongside your API documentation so that consumers in any language validate against the same schema. This is the strongest use case for JSON Schema over Python-only validation: it provides a single source of truth that JavaScript frontends, Go services, and Python backends can all enforce, provided each validator supports the declared draft and treats optional features such as format the same way.
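Before publishing schema files, it is worth confirming that each one is itself a valid schema. The sketch below is one way to do that in a CI step: it checks every *.json file in a directory against the meta-schema named by its "$schema" keyword. The check_schema_files helper and the temporary files are illustrative only.

```python
import json
import tempfile
from pathlib import Path

from jsonschema import SchemaError
from jsonschema.validators import validator_for


def check_schema_files(directory: Path) -> dict:
    """Return {filename: problem} for schema files that fail their meta-schema."""
    problems = {}
    for path in sorted(directory.glob("*.json")):
        schema = json.loads(path.read_text(encoding="utf-8"))
        validator_class = validator_for(schema)  # chosen from "$schema"
        try:
            validator_class.check_schema(schema)
        except SchemaError as exc:
            problems[path.name] = exc.message
    return problems


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "good.json").write_text(json.dumps({
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
    }), encoding="utf-8")
    (root / "bad.json").write_text(json.dumps({
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "objekt",  # not a valid type name
    }), encoding="utf-8")
    problems = check_schema_files(root)

print(sorted(problems))
```

A failing check here catches typos in the contract before consumers in other languages ever see it.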

One thing to remember: JSON Schema is the lingua franca of data validation — a language-agnostic standard that Python’s jsonschema library enforces faithfully, making it the right tool when your validation rules must be shared across teams, languages, and systems.
