Python's Built-in tomllib Module — Deep Dive
Technical perspective
tomllib was added to CPython in Python 3.11 via PEP 680, based on the tomli package by Taneli Hukkinen. It’s a pure-Python, read-only TOML 1.0 parser with no dependencies. The decision to make it read-only was deliberate: writing TOML with comment and formatting preservation is a significantly harder problem that the stdlib team chose not to take on. Understanding the module’s capabilities, limitations, and integration patterns is essential for modern Python configuration management.
Parser internals
tomllib implements a single-pass parser that processes the TOML document character by character. Key implementation details:
- UTF-8 only — the parser expects bytes (hence binary mode) and decodes UTF-8 internally. This prevents encoding mismatches that plague text-mode readers.
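- Strict TOML 1.0 compliance: invalid documents raise TOMLDecodeError instead of being partially parsed, and TOML 1.0 rules apply throughout. For example, mixed-type arrays such as [1, "two"], which TOML 0.5 forbade, are accepted.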
- No streaming — the entire document is loaded into memory. For configuration files (typically < 100KB), this is fine. For large data files, TOML isn’t the right format anyway.
# The load and loads signatures
def load(fp: SupportsRead[bytes], /, *, parse_float: Callable[[str], Any] = float) -> dict[str, Any]:
    ...
def loads(s: str, /, *, parse_float: Callable[[str], Any] = float) -> dict[str, Any]:
    ...
The parse_float parameter lets you control float parsing — useful for exact decimal handling:
import tomllib
from decimal import Decimal
with open("prices.toml", "rb") as f:
    data = tomllib.load(f, parse_float=Decimal)
# Floats in TOML are now Decimal objects
print(type(data["price"])) # <class 'decimal.Decimal'>
This is critical for financial applications where floating-point rounding errors are unacceptable.
Schema validation with Pydantic
tomllib returns plain dicts. For production config, validate with Pydantic:
import tomllib
from pathlib import Path
from pydantic import BaseModel, field_validator  # field_validator is the Pydantic v2 API
class DatabaseConfig(BaseModel):
    host: str = "localhost"
    port: int = 5432
    name: str
    pool_size: int = 10
    @field_validator("port")
    @classmethod
    def port_in_range(cls, v: int) -> int:
        if not (1 <= v <= 65535):
            raise ValueError(f"port must be 1-65535, got {v}")
        return v
class AppConfig(BaseModel):
    debug: bool = False
    log_level: str = "INFO"
    database: DatabaseConfig
def load_config(path: str | Path = "config.toml") -> AppConfig:
    with open(path, "rb") as f:
        raw = tomllib.load(f)
    # model_validate raises pydantic.ValidationError listing every invalid field
    return AppConfig.model_validate(raw)
This gives you type coercion, validation, default values, and clear error messages — all from a TOML file.
Environment overlay pattern
Production applications need environment-specific overrides. A layered config approach:
import tomllib
import os
from pathlib import Path
from typing import Any
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with override recursively merged over base."""
    result = base.copy()
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result
def load_layered_config() -> dict[str, Any]:
    # Layer 1: defaults
    with open("config/defaults.toml", "rb") as f:
        config = tomllib.load(f)
    # Layer 2: environment-specific
    env = os.environ.get("APP_ENV", "development")
    env_path = Path(f"config/{env}.toml")
    if env_path.exists():
        with open(env_path, "rb") as f:
            config = deep_merge(config, tomllib.load(f))
    # Layer 3: local overrides (git-ignored)
    local_path = Path("config/local.toml")
    if local_path.exists():
        with open(local_path, "rb") as f:
            config = deep_merge(config, tomllib.load(f))
    return config
File structure:
config/
├── defaults.toml # Base configuration
├── development.toml # Dev overrides
├── staging.toml # Staging overrides
├── production.toml # Production overrides
└── local.toml # Personal overrides (git-ignored)
Plugin discovery via pyproject.toml
Tools that support plugin systems often use TOML for plugin configuration:
import tomllib
import importlib
from pathlib import Path
def discover_plugins(project_root: Path) -> list[object]:
    """Load plugins declared in pyproject.toml."""
    pyproject = project_root / "pyproject.toml"
    with open(pyproject, "rb") as f:
        data = tomllib.load(f)
    plugin_specs = (
        data.get("tool", {})
        .get("my-framework", {})
        .get("plugins", [])
    )
    plugins = []
    for spec in plugin_specs:
        # Each spec has the form "package.module:ClassName"
        module_path, sep, attr = spec.rpartition(":")
        if not sep or not module_path or not attr:
            raise ValueError(f"invalid plugin spec {spec!r}, expected 'module:attr'")
        module = importlib.import_module(module_path)
        plugin_class = getattr(module, attr)
        plugins.append(plugin_class())
    return plugins
With a pyproject.toml like:
[tool.my-framework]
plugins = [
"myapp.plugins.auth:AuthPlugin",
"myapp.plugins.cache:CachePlugin",
]
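Reading settings from a [tool.<name>] table is the convention that pytest, Ruff, Black, and many other Python tools follow (PEP 518 reserves the [tool] table for exactly this). The "module:attr" spec format matches the object-reference syntax used by packaging entry points.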
TOML-specific data types
TOML supports date and time types that JSON and most config formats don’t:
[schedule]
start_date = 2026-01-15
meeting_time = 14:30:00
deployment = 2026-03-28T09:00:00-05:00
import tomllib
# toml_string holds the [schedule] table shown above
config = tomllib.loads(toml_string)
print(type(config["schedule"]["start_date"]))    # <class 'datetime.date'>
print(type(config["schedule"]["meeting_time"]))  # <class 'datetime.time'>
print(type(config["schedule"]["deployment"]))    # <class 'datetime.datetime'>
This eliminates the string-parsing step that JSON configs require for dates.
Migration from JSON/YAML configs
From JSON:
# Before
import json
with open("config.json") as f:
    config = json.load(f)
# After
import tomllib
with open("config.toml", "rb") as f:
    config = tomllib.load(f)
Main gains: comments in config, native dates, cleaner syntax for humans.
From YAML:
# Before
import yaml
with open("config.yaml") as f:
    config = yaml.safe_load(f)  # Must use safe_load!
# After
import tomllib
with open("config.toml", "rb") as f:
    config = tomllib.load(f)
Main gains: no arbitrary code execution risk (PyYAML’s yaml.load with an unsafe Loader can construct arbitrary Python objects, which is why safe_load is mandatory), no third-party dependency (PyYAML isn’t in the stdlib), and no indentation-sensitive parsing errors.
Writing TOML
For the rare cases where you need to generate TOML:
# Option 1: tomli-w (minimal writer, matches tomllib style)
import tomli_w
config = {"database": {"host": "localhost", "port": 5432}}
with open("config.toml", "wb") as f:  # tomli_w writes bytes, mirroring tomllib's binary reads
    tomli_w.dump(config, f)
# Option 2: tomlkit (preserves comments and formatting)
import tomlkit
with open("config.toml", encoding="utf-8") as f:  # close the file after reading
    doc = tomlkit.load(f)
doc["database"]["port"] = 5433
with open("config.toml", "w", encoding="utf-8") as f:
    tomlkit.dump(doc, f)
Use tomli-w for generating new files. Use tomlkit when you need to modify existing files while preserving comments and formatting — essential for tools that update pyproject.toml programmatically.
Performance benchmarks
Approximate parse times for typical configuration files (1–10 KB); absolute numbers vary with hardware and Python version:
| Parser | Time (μs) | Notes |
|---|---|---|
| tomllib | ~150 | stdlib, pure Python |
| tomli | ~150 | Same codebase as tomllib |
| toml (old) | ~300 | Slower, TOML 0.5 only |
| json.loads | ~50 | Simpler format, C extension |
| yaml.safe_load | ~800 | Complex parser, pure Python |
In these figures, tomllib is about 2x faster than the old toml package and about 5x faster than PyYAML’s safe_load, while the C-accelerated json module is about 3x faster still. For config files loaded once at startup, these differences are irrelevant; they matter for tools that parse many pyproject.toml files (like package managers scanning dependencies).
Error handling patterns
import tomllib
import sys
def load_config_safely(path: str) -> dict | None:
    try:
        with open(path, "rb") as f:
            return tomllib.load(f)
    except FileNotFoundError:
        print(f"Config file not found: {path}", file=sys.stderr)
        return None
    except PermissionError:
        print(f"Cannot read {path}: permission denied", file=sys.stderr)
        return None
    except UnicodeDecodeError as e:
        # tomllib.load decodes the raw bytes as UTF-8 before parsing
        print(f"{path} is not valid UTF-8: {e}", file=sys.stderr)
        return None
    except tomllib.TOMLDecodeError as e:
        print(f"Invalid TOML in {path}: {e}", file=sys.stderr)
        return None
TOMLDecodeError messages include line and column numbers:
Invalid TOML in config.toml: Expected '=' after a key in a key/value pair (at line 5, column 1)
Security considerations
TOML is inherently safe to parse — unlike YAML, there’s no object instantiation, no code execution, and no billion-laughs attack vector. The format is designed to be data-only.
The main security concern is not in parsing but in what you do with the parsed data: never use config values directly in SQL queries, shell commands, or template rendering without sanitization.
The one thing to remember: tomllib provides a zero-dependency, type-aware, safe config parser that integrates naturally with Pydantic validation, environment overlays, and the Python packaging ecosystem — making TOML the clear winner for Python configuration files.