Python YAML Processing — Core Concepts
YAML (YAML Ain’t Markup Language) is the configuration format of choice for DevOps tools. Python processes YAML through third-party libraries, with PyYAML being the most widely used.
Installation
pip install pyyaml
For advanced use cases (preserving comments, round-trip editing):
pip install ruamel.yaml
Reading YAML
Basic Loading
import yaml
with open("config.yml", encoding="utf-8") as f:
    config = yaml.safe_load(f)
safe_load parses YAML into Python types:
| YAML | Python |
|---|---|
| key: value | {"key": "value"} |
| - item (list) | ["item"] |
| 42 | int |
| 3.14 | float |
| true / yes | True |
| null / ~ | None |
| 2026-03-28 | datetime.date |
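The mapping in the table above can be checked directly. The following is a minimal sketch, assuming PyYAML is installed; the sample document and its key names are made up for illustration.

```python
import datetime
import yaml

# Hypothetical sample document with one value per row of the table.
SAMPLE = """
name: widget
tags:
  - item
count: 42
ratio: 3.14
enabled: yes
missing: ~
released: 2026-03-28
"""

data = yaml.safe_load(SAMPLE)

# Print the Python type that safe_load chose for each value.
for key, value in data.items():
    print(f"{key}: {type(value).__name__}")
```

Printing the type names rather than the values makes the implicit conversions easy to spot when a config file behaves unexpectedly.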
From a String
data = yaml.safe_load("""
database:
  host: localhost
  port: 5432
  name: myapp
""")
Multiple Documents
YAML files can contain multiple documents separated by ---:
with open("multi.yml", encoding="utf-8") as f:
    docs = list(yaml.safe_load_all(f))
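To see what safe_load_all yields without needing a file on disk, here is a small sketch that parses a two-document string. The document contents (kind, name, replicas) are hypothetical.

```python
import yaml

# Hypothetical two-document stream; each "---" starts a new document.
STREAM = """\
---
kind: Service
name: web
---
kind: Deployment
name: web
replicas: 3
"""

# safe_load_all returns a lazy generator, so list() forces all documents
# to be parsed right away.
docs = list(yaml.safe_load_all(STREAM))
kinds = [doc["kind"] for doc in docs]
print(kinds)
```

Because the generator is lazy, calling list() inside the with block (as in the file-based example) matters: it consumes the stream before the file is closed.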
Writing YAML
import yaml
config = {
    "database": {
        "host": "localhost",
        "port": 5432,
    },
    "features": ["auth", "logging", "caching"],
}
with open("output.yml", "w", encoding="utf-8") as f:
    yaml.dump(config, f, default_flow_style=False, allow_unicode=True)
default_flow_style=False ensures block style (one item per line) instead of compact inline format.
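A quick way to inspect the output is to dump to a string instead of a file. The sketch below assumes PyYAML 5.1 or later (for sort_keys) and uses yaml.safe_dump, the counterpart of safe_load, rather than yaml.dump.

```python
import yaml

config = {
    "database": {"host": "localhost", "port": 5432},
    "features": ["auth", "logging", "caching"],
}

# With no stream argument, safe_dump returns the YAML text as a string.
# sort_keys=False keeps the dict's insertion order instead of sorting keys.
text = yaml.safe_dump(config, default_flow_style=False, sort_keys=False)
print(text)

# Parsing the text back should reproduce the original structure.
assert yaml.safe_load(text) == config
```

A dump-then-load check like the final assertion is a cheap safeguard when generating configuration files programmatically.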
Safety: safe_load vs. load
This is critical. Never use yaml.load() with untrusted input.
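In PyYAML 6.0 and later, yaml.load() requires an explicit Loader argument. With an unsafe loader such as yaml.UnsafeLoader (or the yaml.unsafe_load() shortcut), it can execute arbitrary Python code through YAML tags:
# DANGEROUS YAML:
# !!python/object/apply:os.system ["rm -rf /"]
yaml.safe_load() only creates basic Python types (dicts, lists, strings, numbers, booleans, None, and dates). Always use it unless you explicitly need custom type construction with trusted input.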
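The sketch below shows safe_load refusing a Python-specific tag. It uses a harmless stand-in payload (a call to os.getcwd) rather than a destructive command; the payload string is an illustration, not taken from a real attack.

```python
import yaml

# Harmless stand-in for a malicious payload: the tag asks the loader to
# call os.getcwd() while parsing.
PAYLOAD = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(PAYLOAD)
    rejected = False
except yaml.YAMLError as exc:
    # safe_load has no constructor registered for python/* tags.
    rejected = True
    print(f"safe_load refused the tag: {type(exc).__name__}")
```

Catching yaml.YAMLError, the base class of PyYAML's parsing errors, also covers ordinary syntax errors in untrusted input.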
YAML Gotchas
The Norway Problem
YAML 1.1, the version PyYAML implements, interprets certain unquoted strings as booleans:
# This YAML:
country: NO
# Becomes this Python:
{"country": False} # "NO" is treated as boolean False!
The same happens with yes, on, off, true, false, and their variations. Always quote strings that could be misinterpreted:
country: "NO"
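The effect of quoting can be verified in a few lines. This is a minimal sketch assuming PyYAML's default YAML 1.1 resolution rules.

```python
import yaml

# Unquoted NO is resolved as a boolean; quoted "NO" stays a string.
unquoted = yaml.safe_load("country: NO")
quoted = yaml.safe_load('country: "NO"')
print(unquoted, quoted)

# Other boolean spellings that are converted when left unquoted.
flags = yaml.safe_load("[yes, No, on, OFF, true, False]")
print(flags)
```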
Accidental Floats and Dates
# This YAML:
version: 1.10
# Becomes this Python:
{"version": 1.1} # Trailing zero lost!
And 2026-03-28 becomes a datetime.date object, not a string. Quote values when you need exact string preservation.
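To illustrate, the following sketch loads the same values with and without quotes. The key names are hypothetical.

```python
import datetime
import yaml

# Hypothetical document: each value appears once bare and once quoted.
DOC = """
version: 1.10
version_quoted: "1.10"
released: 2026-03-28
released_quoted: "2026-03-28"
"""

data = yaml.safe_load(DOC)

# Show the value and the type safe_load picked for it.
for key, value in data.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

The unquoted values come back as a float and a datetime.date, while the quoted ones remain the exact strings that were written.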
Indentation Sensitivity
YAML uses spaces only (never tabs). A single space difference changes meaning:
# This creates a dict inside a list:
items:
  - name: Widget
    price: 10
# This creates two separate list items:
items:
  - name: Widget
  - price: 10
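Loading both variants makes the difference concrete. This is a small sketch; the constant names are chosen for illustration.

```python
import yaml

# "price" is indented to align with "name": one list item with two keys.
ONE_ITEM = """
items:
  - name: Widget
    price: 10
"""

# "price" has its own dash: two list items with one key each.
TWO_ITEMS = """
items:
  - name: Widget
  - price: 10
"""

one = yaml.safe_load(ONE_ITEM)["items"]
two = yaml.safe_load(TWO_ITEMS)["items"]
print(one)
print(two)
```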
ruamel.yaml for Round-Trip Editing
PyYAML loses comments when writing. ruamel.yaml preserves them:
from ruamel.yaml import YAML
yaml_rt = YAML()
yaml_rt.preserve_quotes = True
with open("config.yml", encoding="utf-8") as f:
    config = yaml_rt.load(f)
config["database"]["port"] = 5433
with open("config.yml", "w", encoding="utf-8") as f:
    yaml_rt.dump(config, f)
# Comments and formatting are preserved!
This is essential for tools that modify user configuration files — losing comments angers users.
YAML vs. Other Config Formats
| Feature | YAML | JSON | TOML |
|---|---|---|---|
| Comments | Yes | No | Yes |
| Human readability | Excellent | Good | Excellent |
| Complexity | High | Low | Medium |
| Surprising behaviors | Many | Few | Few |
| Python stdlib | No | Yes | Yes (3.11+) |
Common Misconception
“YAML is just JSON with better syntax.” YAML 1.2 is designed as a superset of JSON (nearly all valid JSON is valid YAML), but it adds significant complexity: anchors, aliases, tags, multi-line strings, multiple document support, and implicit type conversions. This complexity is one reason some projects choose TOML for configuration instead.
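A short sketch of both halves of that claim: a JSON document parsed by PyYAML, followed by YAML-only features (an anchor, an alias with a merge key, and a literal block string). The sample data is hypothetical.

```python
import json
import yaml

# A JSON document also parses as YAML and gives the same result.
JSON_TEXT = '{"name": "myapp", "ports": [80, 443], "debug": false}'
assert yaml.safe_load(JSON_TEXT) == json.loads(JSON_TEXT)

# Features JSON does not have: &defaults defines an anchor, *defaults
# references it, "<<" merges its keys, and "|" starts a literal block.
YAML_ONLY = """
defaults: &defaults
  timeout: 30
  retries: 3
production:
  <<: *defaults
  retries: 5
notes: |
  first line
  second line
"""

data = yaml.safe_load(YAML_ONLY)
print(data["production"])
print(repr(data["notes"]))
```

The merged mapping inherits timeout from the anchor while the explicit retries value overrides the inherited one, which is convenient but also one more rule a reader has to know.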
One Thing to Remember
Always use safe_load for security, quote ambiguous strings to avoid the Norway problem, and reach for ruamel.yaml when you need to edit YAML files without losing comments.