Python Hydra Configuration — Core Concepts

Structure application settings with Hydra's composable YAML configs, overrides, and experiment management.

Why this topic matters

As applications grow — particularly in machine learning and data engineering — configuration complexity explodes. You might need different database backends, model architectures, training parameters, and logging setups, all in various combinations. Hydra, developed by Meta (Facebook AI Research), provides hierarchical configuration that you compose from small files and override from the command line.

How it works

Hydra uses YAML files organized in a conf/ directory. A typical structure:

conf/
├── config.yaml          # Main config (defaults list)
├── db/
│   ├── postgres.yaml
│   └── mysql.yaml
├── model/
│   ├── small.yaml
│   └── large.yaml
└── logging/
    ├── console.yaml
    └── file.yaml

The main config.yaml declares which sub-configs to use:

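defaults:
  - db: postgres
  - model: small
  - logging: console
  - _self_

app:
  name: my-experiment
  seed: 42

The _self_ entry marks where this file's own keys (app here) are merged. Listing it last lets them override values coming from the selected group configs.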

Your Python code accesses this as a structured object:

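# train.py
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="conf", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    # Each config group becomes a nested node: cfg.db holds db/postgres.yaml's contents.
    # Assumes postgres.yaml defines host and small.yaml defines num_layers.
    print(f"Database: {cfg.db.host}")
    print(f"Model layers: {cfg.model.num_layers}")

if __name__ == "__main__":
    main()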
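The cfg object that main() receives is the result of composing the defaults list. Here is a stdlib-only toy model of that composition. It is a sketch, not Hydra's implementation, which loads the YAML files and merges them with OmegaConf. Each option file is represented as a dict, and the hosts, ports, and layer counts are hypothetical.

```python
# Toy model of defaults-list composition using plain dicts instead of YAML files.
# NOT Hydra's implementation; all option contents below are hypothetical.
import copy

# One entry per config group; each inner key is an option (a file in conf/<group>/).
CONFIG_GROUPS = {
    "db": {
        "postgres": {"host": "localhost", "port": 5432},
        "mysql": {"host": "localhost", "port": 3306},
    },
    "model": {
        "small": {"num_layers": 4, "hidden_size": 256},
        "large": {"num_layers": 12, "hidden_size": 768},
    },
    "logging": {
        "console": {"handler": "console"},
        "file": {"handler": "file", "path": "train.log"},
    },
}

# Mirrors conf/config.yaml: the defaults list plus the file's own content.
DEFAULTS = {"db": "postgres", "model": "small", "logging": "console"}
PRIMARY = {"app": {"name": "my-experiment", "seed": 42}}


def compose_config(defaults: dict, primary: dict) -> dict:
    """Hypothetical helper: place each selected option under its group name."""
    cfg = {}
    for group, option in defaults.items():
        # The option's contents land under a key named after the group,
        # which is why the app reads cfg.db.host rather than cfg.host.
        cfg[group] = copy.deepcopy(CONFIG_GROUPS[group][option])
    # The primary config's own keys are added last (the _self_ position).
    cfg.update(copy.deepcopy(primary))
    return cfg


cfg = compose_config(DEFAULTS, PRIMARY)
print(cfg["db"]["host"], cfg["model"]["num_layers"])  # localhost 4
```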

Key concepts

Command-line overrides

Any value can be changed when running the program:

python train.py db=mysql model=large model.learning_rate=0.001

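This selects mysql.yaml instead of postgres.yaml for the db group and large.yaml for the model group. It then sets a single field, learning_rate, in the composed model config. No file editing is needed. A dotted override like model.learning_rate must name a key that already exists. To add a new key, prefix it with + (for example +model.dropout=0.1).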
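The two kinds of override behave differently: a bare group name swaps an option file, while a dotted key edits one field. The following stdlib-only toy model shows both rules, including the + prefix for new keys. It is a sketch, not Hydra's parser; the real override grammar also handles lists, quoting, sweeps, and ++ (add or override). The GROUPS contents and helper names are hypothetical.

```python
# Toy model of Hydra's override rules, stdlib only. NOT Hydra's parser.
# The group contents and function names below are hypothetical.
import copy
from typing import Any

GROUPS = {
    "db": {"postgres": {"host": "pg.local"}, "mysql": {"host": "mysql.local"}},
    "model": {
        "small": {"num_layers": 4, "learning_rate": 0.01},
        "large": {"num_layers": 12, "learning_rate": 0.01},
    },
}


def parse_value(text: str) -> Any:
    """Tiny literal parser: try int, then float, otherwise keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(selection: dict, overrides: list) -> dict:
    selection = dict(selection)
    field_overrides = []
    # Step 1: "group=option" changes which option file a group uses.
    for item in overrides:
        key, _, raw = item.partition("=")
        if key in GROUPS:
            selection[key] = raw
        else:
            field_overrides.append((key, raw))
    cfg = {group: copy.deepcopy(GROUPS[group][opt]) for group, opt in selection.items()}
    # Step 2: "a.b=value" edits one field of the composed config.
    for key, raw in field_overrides:
        add_new = key.startswith("+")
        *parents, leaf = key.lstrip("+").split(".")
        node = cfg
        for part in parents:
            node = node[part]
        if leaf not in node and not add_new:
            # Mirrors Hydra rejecting unknown keys; "+" is how a new key is added.
            raise KeyError(f"{key} is not in the config; use +{key} to add it")
        node[leaf] = parse_value(raw)
    return cfg


cfg = apply_overrides(
    {"db": "postgres", "model": "small"},
    ["db=mysql", "model=large", "model.learning_rate=0.001"],
)
print(cfg["db"]["host"], cfg["model"]["num_layers"], cfg["model"]["learning_rate"])
# mysql.local 12 0.001
```

Group selections are applied before field edits, so model.learning_rate=0.001 lands on large.yaml's values rather than small.yaml's.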

Config groups

Each subdirectory under conf/ is a config group. Groups represent interchangeable options for a component. The db/ group has postgres and mysql options. You pick one per run.

This composability is Hydra’s core advantage. Ten database options, five model configs, and three logging setups give 10 × 5 × 3 = 150 combinations. All of them come from 18 small option files (10 + 5 + 3, plus config.yaml) instead of 150 large, mostly duplicated ones.
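The arithmetic behind that claim is that runnable combinations multiply while files only add. A quick check, using hypothetical option names:

```python
# Worked arithmetic: combinations multiply, option files add.
# Option names are hypothetical placeholders.
from itertools import product

groups = {
    "db": [f"db_{i}" for i in range(10)],
    "model": [f"model_{i}" for i in range(5)],
    "logging": [f"log_{i}" for i in range(3)],
}

combinations = list(product(*groups.values()))  # one tuple per possible run
group_files = sum(len(options) for options in groups.values())  # one YAML per option

print(len(combinations), group_files)  # 150 18
```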

OmegaConf integration

Hydra uses OmegaConf under the hood, which provides:

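Variable interpolation: ${db.host} is replaced by the value at that path in the config, resolved when it is accessed
Strict access: reading a key that doesn't exist, such as a misspelled cfg.model.num_layer, raises an error instead of silently returning None
Struct and read-only modes: Hydra puts the config in struct mode, so assignments cannot add new keys; OmegaConf.set_readonly() additionally blocks changes to existing values
Missing value markers: ??? marks a required field; reading it before it has been set raises an error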

# model/large.yaml
num_layers: 12
hidden_size: 768
checkpoint_dir: checkpoints/${app.name}
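To try the four OmegaConf behaviors listed above without installing anything, here is a stdlib-only toy class. It is a sketch, not OmegaConf's DictConfig. The real library raises its own exception types (such as MissingMandatoryValue), toggles read-only mode with OmegaConf.set_readonly() rather than a constructor flag, and supports relative and custom interpolations. The class name ToyConfig and the sample values are hypothetical.

```python
# ToyConfig: a hypothetical stand-in for OmegaConf's DictConfig, stdlib only.
# It mimics interpolation, strict access, struct/read-only modes, and ???.
import re
from typing import Any, Optional

MISSING = "???"
_INTERP = re.compile(r"\$\{([^}]+)\}")


class ToyConfig:
    def __init__(self, data: dict, root: Optional[dict] = None, readonly: bool = False):
        # object.__setattr__ bypasses our own __setattr__ guard for internal state.
        object.__setattr__(self, "_data", data)
        object.__setattr__(self, "_root", data if root is None else root)
        object.__setattr__(self, "_readonly", readonly)

    def _lookup(self, dotted: str) -> Any:
        # Interpolation paths like app.name are absolute: walk from the root.
        node: Any = ToyConfig(self._root, self._root, self._readonly)
        for part in dotted.split("."):
            node = getattr(node, part)
        return node

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails, i.e. for config keys.
        data = object.__getattribute__(self, "_data")
        if name not in data:
            raise AttributeError(f"Key {name!r} is not in the config")
        value = data[name]
        if value == MISSING:
            raise ValueError(f"Missing mandatory value: {name}")
        if isinstance(value, dict):
            return ToyConfig(value, self._root, self._readonly)
        if isinstance(value, str):
            # Resolved lazily, at access time, so later edits are picked up.
            return _INTERP.sub(lambda m: str(self._lookup(m.group(1))), value)
        return value

    def __setattr__(self, name: str, value: Any) -> None:
        if self._readonly:
            raise AttributeError("Config is read-only")
        if name not in self._data:
            # Struct-mode behavior: existing keys may change, new keys may not appear.
            raise AttributeError(f"Cannot add new key {name!r}")
        self._data[name] = value


cfg = ToyConfig({
    "app": {"name": "my-experiment", "seed": 42},
    "model": {"num_layers": 12, "dataset": MISSING,
              "checkpoint_dir": "checkpoints/${app.name}"},
})
print(cfg.model.checkpoint_dir)  # checkpoints/my-experiment
cfg.model.num_layers = 24        # allowed: the key already exists
# cfg.model.dropout              -> AttributeError (unknown key)
# cfg.model.dropout = 0.1        -> AttributeError (struct mode)
# cfg.model.dataset              -> ValueError (??? was never filled in)
```

Resolving interpolations on access rather than up front is a deliberate choice here, matching OmegaConf's lazy behavior. An override applied after composition still flows into every value that references it.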

Automatic output directories

Hydra creates a unique output directory for each run, stamped with date and time:

outputs/2026-03-28/14-30-22/
├── .hydra/
│   ├── config.yaml      # Composed config for this run
│   ├── hydra.yaml       # Hydra's own settings for this run
│   └── overrides.yaml   # Command-line overrides used
└── train.log

This makes experiment tracking reproducible without extra tooling. You can always see exactly which configuration produced which results.
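The default directory name is just the launch time formatted as a date folder and a time folder. The sketch below rebuilds that pattern with the stdlib for illustration. In Hydra the pattern comes from the hydra.run.dir setting, so a project can change it. The run_dir helper is hypothetical.

```python
# Sketch of Hydra's default single-run directory pattern, outputs/<date>/<time>,
# rebuilt with the stdlib. run_dir is a hypothetical helper, not a Hydra API.
from datetime import datetime
from pathlib import PurePosixPath


def run_dir(start: datetime, root: str = "outputs") -> PurePosixPath:
    """Date folder, then time folder, as in the tree shown above."""
    return PurePosixPath(root) / start.strftime("%Y-%m-%d") / start.strftime("%H-%M-%S")


print(run_dir(datetime(2026, 3, 28, 14, 30, 22)))  # outputs/2026-03-28/14-30-22
```

Because the time component has one-second resolution, two runs launched in the same second would map to the same folder under this pattern. Overriding hydra.run.dir is one way to avoid that.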

Multirun and sweeps

Hydra can launch multiple runs across parameter combinations:

python train.py --multirun model.learning_rate=0.001,0.01,0.1 model=small,large

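This creates six runs (3 learning rates × 2 models). The built-in basic sweeper already runs every combination, which amounts to a grid search. Sweeper plugins such as the Optuna and Ax sweepers replace it with other strategies, such as random search or Bayesian optimization.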
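The six-run count comes from a cartesian product over the comma-separated values. The toy expansion below reproduces that count with the stdlib. It is not Hydra's sweeper or parser: real sweep syntax also supports forms like range() and choice(), and the job order Hydra uses may differ. The expand_sweep helper is hypothetical.

```python
# Toy expansion of a --multirun command line into individual jobs.
# expand_sweep is hypothetical; it only models comma-separated values.
from itertools import product


def expand_sweep(overrides: list) -> list:
    keys, choices = [], []
    for item in overrides:
        key, _, raw = item.partition("=")
        keys.append(key)
        choices.append(raw.split(","))  # "a,b,c" -> three candidate values
    # One job per element of the cartesian product of all value lists.
    return [[f"{k}={v}" for k, v in zip(keys, combo)] for combo in product(*choices)]


jobs = expand_sweep(["model.learning_rate=0.001,0.01,0.1", "model=small,large"])
for num, job in enumerate(jobs):
    # Each job is an ordinary single run with its own fixed set of overrides.
    print(num, " ".join(job))
```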

Common misconception

“Hydra is just for ML projects.” While Hydra was born in ML research, it works for any application with complex configuration: web services with multiple deployment targets, data pipelines with swappable sources, or CLI tools with many options. The composable config pattern benefits any project that outgrows a single settings file.

One thing to remember

Hydra turns configuration from a monolithic file into composable building blocks that you select and override at runtime — making complex parameter management reproducible and command-line driven.
