JSON Schema Validation — Deep Dive
JSON Schema is a vocabulary for describing the structure and constraints of JSON data. Its specifications are published as a series of IETF Internet-Drafts (Draft 4, 6, 7, 2019-09, 2020-12) rather than as a finalized RFC. Python’s jsonschema library is one of the most complete implementations: it supports Draft 3 through Draft 2020-12 and provides extensibility hooks for custom validation. This deep dive covers the specification internals, performance tuning, and production patterns.
1) Schema structure and keywords
A JSON Schema document is itself a JSON object. The top-level $schema keyword declares which draft version to use:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "name": {"type": "string", "minLength": 1, "maxLength": 200},
    "email": {"type": "string", "format": "email"},
    "age": {"type": "integer", "minimum": 0, "maximum": 150},
    "tags": {
      "type": "array",
      "items": {"type": "string", "maxLength": 50},
      "minItems": 0,
      "maxItems": 20,
      "uniqueItems": true
    },
    "address": {"$ref": "#/$defs/address"}
  },
  "required": ["name", "email"],
  "additionalProperties": false,
  "$defs": {
    "address": {
      "type": "object",
      "properties": {
        "street": {"type": "string"},
        "city": {"type": "string"},
        "country": {"type": "string", "pattern": "^[A-Z]{2}$"}
      },
      "required": ["city", "country"]
    }
  }
}
Key keywords by category:
Structural: type, properties, items, prefixItems (Draft 2020-12), additionalProperties, patternProperties.
Numeric: minimum, maximum, exclusiveMinimum, exclusiveMaximum, multipleOf.
String: minLength, maxLength, pattern, format.
Array: minItems, maxItems, uniqueItems, contains, minContains, maxContains.
Composition: allOf, anyOf, oneOf, not, if/then/else.
References: $ref, $defs, $dynamicRef, $anchor.
2) Python jsonschema library usage
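import json

from jsonschema import Draft202012Validator, ValidationError

# Load schema
with open("schema.json") as f:
    schema = json.load(f)

# Create a reusable validator instance
validator = Draft202012Validator(schema)

# Example instance to check (placeholder data)
data = {"name": "Ada", "email": "ada@example.com"}

# Validate and raise on first error
try:
    validator.validate(data)
except ValidationError as exc:
    print(f"Invalid: {exc.message}")

# Collect all errors, ordered by their location in the instance.
# Path elements mix strings and integers, so compare them as strings.
errors = list(validator.iter_errors(data))
for error in sorted(errors, key=lambda e: [str(p) for p in e.absolute_path]):
    path = ".".join(str(p) for p in error.absolute_path)
    print(f"{path}: {error.message}")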
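Build the validator once and reuse it. jsonschema does not compile schemas into code, but reusing an instance avoids re-creating the validator on every call, and it avoids the schema check that the module-level jsonschema.validate() helper repeats each time it is invoked. Always reuse validator instances in hot paths.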
3) Format validation
JSON Schema’s format keyword (e.g., "format": "email", "format": "date-time", "format": "uri") is advisory by default — the spec says validators are not required to enforce it. In jsonschema, you must opt in to format checking:
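from jsonschema import Draft202012Validator, FormatChecker

validator = Draft202012Validator(schema, format_checker=FormatChecker())
# FormatChecker covers date, time, date-time, email, hostname,
# ipv4, ipv6, uri, uri-reference, iri, regex, and more.
# Some of these need optional dependencies (pip install "jsonschema[format]");
# a format whose checker is unavailable is silently skipped.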
You can register custom format checkers:
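import re

from jsonschema import Draft202012Validator, FormatChecker

checker = FormatChecker()
PHONE_RE = re.compile(r"^\+?[\d\s\-()]{7,20}$")

@checker.checks("phone-number", raises=ValueError)
def check_phone(value):
    # Formats apply only to strings; leave other types to the "type" keyword
    if not isinstance(value, str):
        return True
    if not PHONE_RE.match(value):
        raise ValueError(f"Invalid phone number: {value}")
    return True

validator = Draft202012Validator(schema, format_checker=checker)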
4) Custom validators and extending the spec
jsonschema allows extending validators with custom keywords:
from jsonschema import Draft202012Validator, ValidationError, validators

def is_positive_if_required(validator_instance, is_positive, instance, schema):
    """Custom keyword: isPositive."""
    if is_positive and isinstance(instance, (int, float)) and instance <= 0:
        yield ValidationError(f"{instance} is not positive")

CustomValidator = validators.extend(
    Draft202012Validator,
    {"isPositive": is_positive_if_required},
)

schema = {
    "type": "object",
    "properties": {
        "price": {"type": "number", "isPositive": True},
    },
}

v = CustomValidator(schema)
v.validate({"price": -5})  # Raises ValidationError
5) Referencing and schema composition
For large projects, schemas are split across multiple files:
import json

from jsonschema import Draft202012Validator
from referencing import Registry, Resource

# Load referenced schemas
with open("schemas/address.json") as f:
    address_schema = json.load(f)
with open("schemas/user.json") as f:
    user_schema = json.load(f)

# Build a registry.
# Resource.from_contents() reads the dialect from the schema's "$schema" keyword,
# so address.json must declare one.
registry = Registry().with_resources([
    ("https://example.com/schemas/address.json",
     Resource.from_contents(address_schema)),
])

# Validator resolves $ref against the registry
validator = Draft202012Validator(user_schema, registry=registry)
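The referencing library, which jsonschema adopted in v4.18 to replace its deprecated RefResolver, handles URI resolution and lookup of registered resources. A Registry is immutable: with_resources() returns a new registry instead of modifying the existing one.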
6) Performance characteristics
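jsonschema is pure Python. Indicative throughput figures for validating a 10-field object (the hardware and library version are not stated, so treat them as rough orders of magnitude and measure your own workload):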
| Scenario | Throughput |
|---|---|
| Simple flat object, valid | ~15,000 validations/sec |
| With format checking | ~8,000 validations/sec |
| Nested 3 levels | ~5,000 validations/sec |
| Invalid data, collecting all errors | ~10,000 validations/sec |
For higher performance, consider fastjsonschema, which compiles JSON Schema into Python code:
import fastjsonschema
validate = fastjsonschema.compile(schema)
# Generated function; reported as roughly 5-10x faster than jsonschema (workload-dependent)
validate(data)
fastjsonschema supports Draft 4, 6, and 7. It does not support Draft 2020-12 or custom keywords, so there is a features-vs-speed tradeoff.
7) Generating JSON Schema from Python
From Pydantic:
from pydantic import BaseModel

class User(BaseModel):
    name: str
    email: str
    age: int | None = None  # "int | None" syntax needs Python 3.10+

schema = User.model_json_schema()  # Pydantic v2 API
# Produces a JSON Schema compatible with Draft 2020-12
From dataclasses (via third-party):
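from dataclasses import dataclass

from pydantic import TypeAdapter

@dataclass
class User:
    name: str
    email: str

# dataclasses_json's User.schema() returns a marshmallow Schema, not a JSON Schema
# document, so this example uses Pydantic's TypeAdapter (v2) on the plain dataclass.
schema = TypeAdapter(User).json_schema()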
From attrs + cattrs:
cattrs does not generate JSON Schema natively, but you can use attrs field metadata to build schemas programmatically.
8) Production patterns
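API request validation middleware (FastAPI): FastAPI validates requests with Pydantic, which runs its own validation engine and emits JSON Schema for the generated OpenAPI documentation. For non-Pydantic APIs or custom schemas, a Starlette middleware can apply jsonschema directly: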
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse
from jsonschema import Draft202012Validator

class SchemaValidationMiddleware(BaseHTTPMiddleware):
    def __init__(self, app, schemas: dict):
        super().__init__(app)
        # Build one validator per path at startup and reuse it per request
        self.validators = {
            path: Draft202012Validator(schema)
            for path, schema in schemas.items()
        }

    async def dispatch(self, request, call_next):
        validator = self.validators.get(request.url.path)
        if validator and request.method in ("POST", "PUT", "PATCH"):
            try:
                body = await request.json()
            except ValueError:
                # json.JSONDecodeError is a ValueError subclass
                return JSONResponse(
                    {"errors": [{"path": [], "message": "Body is not valid JSON"}]},
                    status_code=400,
                )
            errors = list(validator.iter_errors(body))
            if errors:
                return JSONResponse(
                    {"errors": [{"path": list(e.path), "message": e.message} for e in errors]},
                    status_code=422,
                )
        return await call_next(request)
Config file validation:
import json
import tomllib  # standard library since Python 3.11

from jsonschema import Draft202012Validator

with open("config-schema.json") as f:
    config_schema = json.load(f)
with open("config.toml", "rb") as f:
    config = tomllib.load(f)

validator = Draft202012Validator(config_schema)
errors = list(validator.iter_errors(config))
if errors:
    for e in errors:
        print(f"Config error at {'.'.join(str(p) for p in e.path)}: {e.message}")
    raise SystemExit(1)
Cross-language data contracts: Publish JSON Schema files alongside your API documentation. Consumers in any language validate against the same schema. This is the strongest use case for JSON Schema over Python-only validation: it provides a single source of truth that JavaScript frontends, Go services, and Python backends can all enforce. To keep results consistent, pin the draft with $schema and be cautious with format and pattern, whose behavior varies between implementations.
One thing to remember: JSON Schema is the lingua franca of data validation — a language-agnostic standard that Python’s jsonschema library enforces faithfully, making it the right tool when your validation rules must be shared across teams, languages, and systems.