Python Dataclass Field Metadata — Deep Dive

Building production frameworks on top of dataclass field metadata: type-safe keys, namespace conventions, and real-world patterns from marshmallow and cattrs.

The metadata API in detail

The field() function accepts a metadata parameter that must be a mapping (a dict, a types.MappingProxyType, or any other Mapping) or None. CPython wraps the mapping in types.MappingProxyType:

from dataclasses import dataclass, field, fields
import types

@dataclass
class Example:
    x: int = field(metadata={"key": "value"})

f = fields(Example)[0]
type(f.metadata)  # types.MappingProxyType
f.metadata["key"]  # "value"

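If you pass None or omit metadata, the field gets an empty MappingProxyType. The read-only wrapper matters because Field objects belong to the class and are shared by every instance, so metadata that consumers could write to would be shared mutable state. Note that the proxy is a view, not a copy: if you keep a reference to the dict you passed in and mutate it, the change is visible through f.metadata.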
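A minimal sketch of both behaviors, assuming current CPython semantics. The Reading class and the shared dict are illustrative names, not part of any library:

```python
from dataclasses import dataclass, field, fields

# Keep a reference to the dict so it can be mutated later (illustration only).
shared = {"unit": "meters"}

@dataclass
class Reading:
    value: float = field(default=0.0, metadata=shared)

meta = fields(Reading)[0].metadata

# Writing through the proxy is rejected with a TypeError.
try:
    meta["unit"] = "feet"
    write_blocked = False
except TypeError:
    write_blocked = True

# Mutating the original dict is still visible through the proxy.
shared["unit"] = "feet"
unit_after = meta["unit"]
```

In practice this means you should pass a fresh dict literal to field() and not hold on to it.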

What lives in a Field object

Beyond metadata, dataclasses.Field carries:

name: field name (str)
type: the annotation
default / default_factory: default value
repr, init, compare, hash, kw_only: behavioral flags
metadata: the MappingProxyType

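name and type are filled in by the @dataclass decorator when the class is created; the rest come from the field() call. Python does not enforce immutability on Field objects, but consumers should treat them as read-only afterward.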
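The attributes listed above can be inspected with a few lines of code. This is a sketch: describe() is a hypothetical helper, and getattr() is used for kw_only because that attribute only exists on Python 3.10 and later.

```python
from dataclasses import MISSING, dataclass, field, fields

@dataclass
class Example:
    x: int = field(default=0, repr=False, metadata={"key": "value"})
    y: list = field(default_factory=list)

def describe(f):
    """Summarize one Field object as a plain dict."""
    return {
        "name": f.name,
        "type": f.type,
        # MISSING is the sentinel dataclasses uses for "no default given".
        "has_default": f.default is not MISSING,
        "has_factory": f.default_factory is not MISSING,
        "repr": f.repr,
        "kw_only": getattr(f, "kw_only", None),
        "metadata": dict(f.metadata),
    }

summary = {f.name: describe(f) for f in fields(Example)}
```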

Namespace conventions for metadata keys

When multiple libraries store metadata on the same fields, key collisions become a risk. A common convention is to use your library’s name or a unique prefix as a namespace.

String-key namespacing

from dataclasses import dataclass, field
from marshmallow.validate import Email

@dataclass
class User:
    email: str = field(metadata={
        "marshmallow": {"validate": Email()},
        "myapp.db": {"column": "user_email", "indexed": True},
        "myapp.api": {"alias": "emailAddress"},
    })

Each library reads only its own namespace. This is similar to XML namespaces but simpler.
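Reading a single namespace can be sketched as follows. The namespace_options() helper is hypothetical, and the marshmallow entry is left out so the example runs with the standard library alone:

```python
from dataclasses import dataclass, field, fields

@dataclass
class User:
    email: str = field(metadata={
        "myapp.db": {"column": "user_email", "indexed": True},
        "myapp.api": {"alias": "emailAddress"},
    })
    name: str = ""

def namespace_options(cls, namespace):
    """Map each field name to the options stored under one namespace."""
    return {f.name: dict(f.metadata.get(namespace, {})) for f in fields(cls)}

db_options = namespace_options(User, "myapp.db")
api_options = namespace_options(User, "myapp.api")
```

Fields without an entry for the namespace get an empty dict, so the consumer never has to special-case undecorated fields.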

Sentinel-object keys

A more collision-resistant approach uses unique objects as keys. Two distinct object() instances never compare equal, so another library cannot produce a matching key by accident:

from dataclasses import dataclass, field

# Library defines a private sentinel
class _DBMeta:
    COLUMN = object()
    INDEXED = object()

db = _DBMeta

@dataclass
class User:
    email: str = field(metadata={
        db.COLUMN: "user_email",
        db.INDEXED: True,
    })

The same approach works with attrs attribute metadata, whose keys can likewise be any hashable object.
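The consuming side of the sentinel pattern might look like this. COLUMN, INDEXED, and column_name() are hypothetical names standing in for what a library would export:

```python
from dataclasses import dataclass, field, fields

# Hypothetical library module: each key is a unique object.
COLUMN = object()
INDEXED = object()

@dataclass
class User:
    email: str = field(metadata={
        COLUMN: "user_email",
        INDEXED: True,
        # A string key with the same spelling does not collide.
        "COLUMN": "unrelated string key from another tool",
    })

def column_name(f):
    """Return the configured column name, falling back to the field name."""
    return f.metadata.get(COLUMN, f.name)

email_field = fields(User)[0]
resolved = column_name(email_field)
```

One trade-off: bare object() keys have an opaque repr, which makes metadata harder to read when debugging. Enum members are a common alternative that keeps uniqueness and prints readably.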

Combining metadata with typing.Annotated

Python 3.9 introduced typing.Annotated (PEP 593), which can carry metadata at the type level rather than at the field level:

from typing import Annotated
from dataclasses import dataclass

def MaxLen(n):
    """Build the type-level metadata for a maximum length."""
    return {"max_length": n}

@dataclass
class Product:
    name: Annotated[str, MaxLen(100)]
    sku: Annotated[str, MaxLen(20)]

The distinction:

Annotated metadata lives on the type and is accessible via typing.get_type_hints(cls, include_extras=True)
field(metadata=...) lives on the field and is accessible via dataclasses.fields(cls)

Some libraries (Pydantic, beartype) read Annotated metadata. Others (marshmallow-dataclass) read field metadata. In practice, you may need both:

from typing import Annotated
from dataclasses import dataclass, field

@dataclass
class Event:
    # Type-level: "this is a positive int"
    # Field-level: "serialize as 'event_priority'"
    priority: Annotated[int, "positive"] = field(
        metadata={"alias": "event_priority"}
    )
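To make the two access paths concrete, here is a sketch that collects both kinds of metadata for each field. collect() is a hypothetical helper, and a default is added to the field so the class can be instantiated without arguments:

```python
from dataclasses import dataclass, field, fields
from typing import Annotated, get_type_hints

@dataclass
class Event:
    priority: Annotated[int, "positive"] = field(
        default=1, metadata={"alias": "event_priority"}
    )

def collect(cls):
    """Gather type-level and field-level metadata for each field."""
    hints = get_type_hints(cls, include_extras=True)
    out = {}
    for f in fields(cls):
        out[f.name] = {
            # Annotated aliases expose their extras as __metadata__;
            # plain types have no such attribute, hence the default.
            "type_level": getattr(hints[f.name], "__metadata__", ()),
            "field_level": dict(f.metadata),
        }
    return out

info = collect(Event)
```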

Real-world: marshmallow-dataclass

The marshmallow-dataclass library reads field metadata to generate marshmallow schemas:

from dataclasses import dataclass, field
from marshmallow_dataclass import class_schema
from marshmallow import validate

@dataclass
class Article:
    title: str = field(metadata={
        "validate": validate.Length(min=1, max=200),
        "required": True,
    })
    word_count: int = field(metadata={
        "validate": validate.Range(min=0),
        "load_default": 0,
    })

ArticleSchema = class_schema(Article)
schema = ArticleSchema()
result = schema.load({"title": "Hello", "word_count": 500})
# result is an Article instance

The metadata keys map directly to marshmallow field constructor arguments. This is one of the cleanest integrations because the conventions are well-documented.

Real-world: cattrs and attrs

attrs (the library that inspired dataclasses) has its own metadata system. cattrs does not read that metadata by default, but custom structuring/unstructuring hooks can:

import attr
import cattrs

@attr.s(auto_attribs=True)
class Point:
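    # "json_key" is the key the hook below looks up; "unit" is informational.
    x: float = attr.ib(metadata={"unit": "meters", "json_key": "x_m"})
    y: float = attr.ib(metadata={"unit": "meters", "json_key": "y_m"})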

# cattrs doesn't read metadata by default, but custom hooks can:
converter = cattrs.Converter()

def point_unstructure(p):
    result = {}
    for a in attr.fields(type(p)):
        key = a.metadata.get("json_key", a.name)
        result[key] = getattr(p, a.name)
    return result

converter.register_unstructure_hook(Point, point_unstructure)

Building a metadata-driven framework

Here’s a complete example: a mini ORM that creates SQL tables from dataclass metadata.

from dataclasses import dataclass, field, fields

SQL_TYPE_MAP = {int: "INTEGER", str: "TEXT", float: "REAL", bool: "INTEGER"}

@dataclass
class Column:
    table: str = ""
    primary_key: bool = False
    nullable: bool = True
    unique: bool = False

def col(**kwargs) -> dict:
    """Shorthand for creating column metadata."""
    return {"db": Column(**kwargs)}

@dataclass
class User:
    id: int = field(metadata=col(primary_key=True, nullable=False))
    username: str = field(metadata=col(unique=True, nullable=False))
    email: str = field(metadata=col(nullable=False))
    bio: str = field(default="", metadata=col(nullable=True))

def generate_create_table(cls, table_name: str) -> str:
    columns = []
    for f in fields(cls):
        col_meta = f.metadata.get("db", Column())
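        # f.type is the raw annotation; under `from __future__ import annotations`
        # it is a string, so this lookup would fall back to "TEXT".
        sql_type = SQL_TYPE_MAP.get(f.type, "TEXT")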
        parts = [f.name, sql_type]
        if col_meta.primary_key:
            parts.append("PRIMARY KEY")
        if not col_meta.nullable:
            parts.append("NOT NULL")
        if col_meta.unique:
            parts.append("UNIQUE")
        columns.append(" ".join(parts))
    cols_sql = ",\n  ".join(columns)
    return f"CREATE TABLE {table_name} (\n  {cols_sql}\n);"

print(generate_create_table(User, "users"))

Output:

CREATE TABLE users (
  id INTEGER PRIMARY KEY NOT NULL,
  username TEXT NOT NULL UNIQUE,
  email TEXT NOT NULL,
  bio TEXT
);

This pattern — metadata describing schema, a generic processor generating output — scales to REST API route generation, form building, CLI argument parsing, and more.
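As one illustration of that claim, the same approach can drive CLI argument parsing with argparse. This is a sketch under stated assumptions: the "cli" metadata key, the Options class, and both helper functions are hypothetical, and every annotation is assumed to be a callable such as int or str.

```python
import argparse
from dataclasses import MISSING, dataclass, field, fields

@dataclass
class Options:
    host: str = field(default="localhost", metadata={"cli": {"help": "server host"}})
    port: int = field(default=8000, metadata={"cli": {"help": "server port"}})
    token: str = ""  # no "cli" entry, so it is not exposed on the command line

def build_parser(cls):
    """Create one --option per field that carries "cli" metadata."""
    parser = argparse.ArgumentParser()
    for f in fields(cls):
        cli = f.metadata.get("cli")
        if cli is None:
            continue
        parser.add_argument(
            f"--{f.name}",
            type=f.type,  # assumes the annotation is callable on a string
            default=None if f.default is MISSING else f.default,
            help=cli.get("help"),
        )
    return parser

def parse(cls, argv):
    """Parse argv and build an instance of cls from the result."""
    namespace = build_parser(cls).parse_args(argv)
    return cls(**vars(namespace))

opts = parse(Options, ["--port", "9000"])
```

The processor knows nothing about Options specifically; adding a field with "cli" metadata is enough to expose a new flag.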

Performance characteristics

Metadata access is cheap but not free: dataclasses.fields() builds a new tuple from the class’s __dataclass_fields__ dict on every call, and .metadata is a plain attribute access. A lookup through the MappingProxyType costs about the same as a dict lookup. For most applications, the overhead is negligible.

However, if you’re processing millions of records and inspecting metadata per record, cache the field metadata outside the loop:

# Slow: re-fetches fields each iteration
for record in million_records:
    for f in fields(record):
        if f.metadata.get("indexed"):
            index(record, f)

# Fast: cache field info
indexed_fields = [f for f in fields(MyModel) if f.metadata.get("indexed")]
for record in million_records:
    for f in indexed_fields:
        index(record, f)
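When records of several classes flow through the same loop, the cached list can be keyed by class. A sketch using functools.lru_cache; MyModel and both helpers are illustrative names:

```python
from dataclasses import dataclass, field, fields
from functools import lru_cache

@dataclass
class MyModel:
    id: int = field(default=0, metadata={"indexed": True})
    title: str = field(default="", metadata={"indexed": True})
    body: str = ""

@lru_cache(maxsize=None)
def indexed_field_names(cls):
    """Compute the indexed field names once per class."""
    return tuple(f.name for f in fields(cls) if f.metadata.get("indexed"))

def index_values(record):
    """Return the values to index for one record."""
    names = indexed_field_names(type(record))
    return {name: getattr(record, name) for name in names}

values = index_values(MyModel(id=7, title="Hello", body="..."))
```

Classes are hashable, so they work as cache keys; the metadata scan runs once per class no matter how many records are processed.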

Limitations and trade-offs

No schema for metadata. Metadata is an untyped dict. Typos in keys ("max_lenght") fail silently. Consider defining constants or dataclasses for your metadata keys to catch errors at import time.
Read-only after creation. You can’t modify metadata through f.metadata after the class is defined, and mutating the dict you originally passed in is fragile. If you need dynamic metadata, maintain a separate registry keyed by (class, field_name).
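Not merged on override. A subclass inherits its parent’s fields along with their metadata, but if it redefines a field, the parent’s metadata is replaced, not merged. Merging requires custom code, for example a helper that copies the parent’s metadata when the field is redeclared.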
Invisible to IDE autocomplete. Since metadata is a plain dict, IDEs can’t autocomplete keys or validate values. Typed wrappers (like the Column dataclass above) partially address this.
No standard keys. Unlike Java annotations or C# attributes, Python has no standard metadata keys. Each library invents its own, leading to inconsistency. PEP 681 (dataclass transforms) helps type checkers but doesn’t standardize runtime metadata.
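The override behavior, and one way to work around it, can be sketched as follows. inherit_metadata() is a hypothetical helper, not a standard-library function:

```python
from dataclasses import dataclass, field, fields

@dataclass
class Base:
    name: str = field(default="", metadata={"db": "name_col", "max_length": 50})

def inherit_metadata(parent, field_name, **overrides):
    """Copy the parent's metadata for one field and apply overrides."""
    parent_field = next(f for f in fields(parent) if f.name == field_name)
    return {**parent_field.metadata, **overrides}

@dataclass
class PlainChild(Base):
    # Redefining the field replaces the parent's metadata entirely.
    name: str = field(default="", metadata={"max_length": 100})

@dataclass
class MergedChild(Base):
    # Explicit merge: "db" is kept, "max_length" is overridden.
    name: str = field(
        default="", metadata=inherit_metadata(Base, "name", max_length=100)
    )

plain = dict(fields(PlainChild)[0].metadata)
merged = dict(fields(MergedChild)[0].metadata)
```

Here plain contains only max_length, while merged keeps the parent's db entry alongside the new max_length.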

When NOT to use field metadata

Complex runtime behavior: Use descriptors or __post_init__ instead.
ORM column definitions: Use the ORM’s native column types (SQLAlchemy Column, Django Field). They have richer APIs.
Validation with dependencies: Pydantic’s model_validator or attrs validators handle cross-field validation that metadata-based approaches struggle with.

Field metadata is best for declarative, per-field annotations consumed by generic processors. It’s a building block, not a framework.

The one thing to remember: Field metadata is Python’s lightweight annotation system for dataclass fields — namespace your keys, write generic processors, and you get a declarative framework without external dependencies.
