SaltStack Configuration with Python — Deep Dive

Build custom Salt execution modules, event reactors, and orchestration runners for Python-native infrastructure automation

Salt’s communication internals

Salt uses ZeroMQ for its transport layer (with TCP and WebSocket alternatives available). The master runs two ZeroMQ channels:

Publisher (port 4505) — broadcasts commands to all minions using PUB/SUB
Return (port 4506) — receives results from minions using REQ/REP

When you run salt '*' test.ping, the master publishes one job message on port 4505. Every connected minion receives it, checks whether the target ('*' here) matches its own ID, and if so executes the function and sends the result back on port 4506. The master collects the responses and displays them.

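This publish-subscribe model is a large part of why Salt scales better than SSH-based tools. The master publishes a job once rather than opening a connection per host, so adding minions does not proportionally increase the cost of sending a command. Returns still arrive one per minion on port 4506, which is where master load grows at scale.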
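
The flow described above can be modeled in a few lines of plain Python. This is only an in-process sketch, not Salt's transport code: Minion, publish, and the job dict layout are hypothetical names chosen for illustration, and glob matching with fnmatch stands in for Salt's target matching.

```python
from fnmatch import fnmatch


class Minion:
    """Hypothetical stand-in for a minion subscribed to the publish channel."""

    def __init__(self, minion_id):
        self.minion_id = minion_id

    def handle(self, job):
        # Every subscriber sees the job; only matching minions run it.
        if not fnmatch(self.minion_id, job["tgt"]):
            return None
        return {"id": self.minion_id, "fun": job["fun"], "return": True}


def publish(minions, tgt, fun):
    """Send one job to all minions and collect the returns keyed by minion ID."""
    job = {"tgt": tgt, "fun": fun}
    returns = {}
    for minion in minions:
        reply = minion.handle(job)  # in Salt this reply travels over port 4506
        if reply is not None:
            returns[reply["id"]] = reply["return"]
    return returns


minions = [Minion("web1"), Minion("web2"), Minion("db1")]
print(publish(minions, "web*", "test.ping"))
```

The job is built once no matter how many minions exist, while the returns dict grows with the number of matching minions. That mirrors the split between the cheap publish side and the per-minion return side.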

Writing execution modules

Execution modules are the fundamental building block. Every salt command maps to a Python function:

# /srv/salt/_modules/deployment.py
"""Custom deployment module for Python applications."""

import logging
import os
import shlex
import subprocess

log = logging.getLogger(__name__)

__virtualname__ = "deploy"

def __virtual__():
    """Only load on Linux systems with systemd."""
    if __grains__["kernel"] == "Linux" and os.path.exists("/usr/bin/systemctl"):
        return __virtualname__
    return (False, "deploy requires Linux with systemd")

def release(app_name, version, repo_url, venv_path="/opt/venvs"):
    """
    Deploy a new version of a Python application.

    CLI Example:
        salt 'web*' deploy.release myapp v2.3.1 https://github.com/org/myapp
    """
    app_dir = f"/opt/apps/{app_name}"
    venv = f"{venv_path}/{app_name}"
    # _run() goes through a shell, so quote every interpolated value.
    q_dir, q_venv = shlex.quote(app_dir), shlex.quote(venv)
    q_version, q_repo = shlex.quote(version), shlex.quote(repo_url)

    steps = []

    def _abort(stage):
        """Stop at the first failed step instead of continuing on a broken tree."""
        log.error("deploy.release: %s failed for %s %s", stage, app_name, version)
        return {"app": app_name, "version": version, "steps": steps, "success": False}

    # Clone or update repository
    if os.path.exists(app_dir):
        result = _run(f"git -C {q_dir} fetch && git -C {q_dir} checkout {q_version}")
    else:
        result = _run(f"git clone {q_repo} {q_dir} && git -C {q_dir} checkout {q_version}")
    steps.append({"git": result})
    if result["retcode"] != 0:
        return _abort("git")

    # Update virtual environment
    result = _run(f"{q_venv}/bin/pip install -r {q_dir}/requirements.txt --quiet")
    steps.append({"pip": result})
    if result["retcode"] != 0:
        return _abort("pip")

    # Run migrations
    result = _run(
        f"{q_venv}/bin/python {q_dir}/manage.py migrate --noinput",
        env={"DJANGO_SETTINGS_MODULE": f"{app_name}.settings.production"},
    )
    steps.append({"migrate": result})
    if result["retcode"] != 0:
        return _abort("migrate")

    # Restart service (service.restart returns True on success)
    restarted = __salt__["service.restart"](app_name)
    steps.append({"restart": "completed" if restarted else "failed"})

    # Health check
    health = __salt__["http.query"]("http://localhost:8000/health", status=True)
    steps.append({"health": health.get("status", "unknown")})

    return {
        "app": app_name,
        "version": version,
        "steps": steps,
        "success": bool(restarted) and health.get("status") == 200,
    }

def _run(cmd, env=None):
    """Run a shell command and return structured output."""
    full_env = os.environ.copy()
    if env:
        full_env.update(env)
    try:
        result = subprocess.run(
            cmd, shell=True, capture_output=True, text=True,
            timeout=300, env=full_env,
        )
        return {
            "retcode": result.returncode,
            "stdout": result.stdout[-500:] if result.stdout else "",
            "stderr": result.stderr[-500:] if result.stderr else "",
        }
    except subprocess.TimeoutExpired:
        return {"retcode": -1, "error": "timeout"}

Key patterns:

__virtual__() conditionally loads the module based on the target system
__grains__ accesses system information (injected by Salt’s loader)
__salt__ calls other execution modules (Salt’s inter-module API)
Structured return values — always return dicts, not strings, so downstream tooling can parse results
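
Because __grains__ and __salt__ are injected rather than imported, a module can be exercised outside Salt by supplying those names yourself. The sketch below is a simplified simulation of that idea, not Salt's actual loader: load_module, MODULE_SOURCE, and fake_salt are hypothetical names, and the module source is a cut-down stand-in for the deployment module above.

```python
import types

# A tiny module that relies on injected dunder names, like a real execution module.
MODULE_SOURCE = '''
__virtualname__ = "deploy"

def __virtual__():
    return __virtualname__ if __grains__["kernel"] == "Linux" else False

def restart(app_name):
    return {"restarted": __salt__["service.restart"](app_name)}
'''


def load_module(source, grains, salt_funcs):
    """Exec module source with the dunder names injected into its namespace."""
    module = types.ModuleType("deployment")
    module.__grains__ = grains
    module.__salt__ = salt_funcs
    exec(source, module.__dict__)
    virtual = module.__virtual__()
    # A falsy __virtual__() result means the module should not be loaded.
    return (virtual, module) if virtual else (None, None)


fake_salt = {"service.restart": lambda name: True}
name, mod = load_module(MODULE_SOURCE, {"kernel": "Linux"}, fake_salt)
print(name, mod.restart("myapp"))
```

The same trick works in unit tests: replace fake_salt with stubs that record their calls, and pass different grains to check that __virtual__() refuses to load on unsupported systems.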

Custom state modules

State modules define new declarative state types:

# /srv/salt/_states/python_app.py
"""State module for managing Python application deployments."""

def deployed(name, version, repo_url, health_endpoint="/health"):
    """
    Ensure a Python application is deployed at the specified version.

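    name:
        Application name
    version:
        Git tag or commit to deploy
    repo_url:
        Git repository URL
    health_endpoint:
        Health check path. Currently unused: deploy.release checks a fixed
        /health URL and does not accept this value.
    """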
    ret = {"name": name, "changes": {}, "result": True, "comment": ""}

    # Check current version
    current = _get_current_version(name)

    if current == version:
        ret["comment"] = f"{name} already at version {version}"
        return ret

    if __opts__["test"]:
        ret["result"] = None
        ret["comment"] = f"Would update {name} from {current} to {version}"
        ret["changes"] = {"old": current, "new": version}
        return ret

    # Perform deployment using our execution module
    result = __salt__["deploy.release"](name, version, repo_url)

    if result["success"]:
        ret["changes"] = {"old": current, "new": version}
        ret["comment"] = f"Deployed {name} {version} successfully"
    else:
        ret["result"] = False
        ret["comment"] = f"Deployment failed: {result}"

    return ret

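def _get_current_version(app_name):
    app_dir = f"/opt/apps/{app_name}"
    # The redirect and "||" need a shell, which cmd.run_stdout only uses
    # from module code when python_shell=True is passed explicitly.
    result = __salt__["cmd.run_stdout"](
        f"git -C {app_dir} describe --tags --always 2>/dev/null || echo 'none'",
        python_shell=True,
    )
    return result.strip()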

Used in state files:

# /srv/salt/apps/myapp.sls
myapp:
  python_app.deployed:
    - version: v2.3.1
    - repo_url: https://github.com/org/myapp
    - health_endpoint: /api/health

Event-driven automation with reactors

Salt’s event bus enables real-time responses to infrastructure changes:

# /etc/salt/master.d/reactor.conf
reactor:
  - "salt/minion/*/start":
    - /srv/reactor/minion_start.sls

  - "myapp/deployment/failed":
    - /srv/reactor/deployment_rollback.sls

  - "salt/beacon/*/diskusage/*":
    - /srv/reactor/disk_alert.sls

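# /srv/reactor/deployment_rollback.sls
# Note: for events sent from a minion (for example with event.send), the
# payload is nested one level down, so use data['data']['app_name'] instead.
rollback_failed_deploy:
  runner.state.orchestrate:
    - args:
      - mods: orchestration.rollback
      - pillar:
          app_name: {{ data['app_name'] }}
          failed_version: {{ data['version'] }}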
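
The reactor mapping is essentially a list of tag globs, each pointing at one or more SLS files. A minimal sketch of that lookup, assuming shell-style glob matching on the event tag; REACTOR_MAP and reactors_for are hypothetical names, and this is not Salt's own reactor code:

```python
from fnmatch import fnmatch

# Mirrors the reactor.conf mapping above: tag glob -> reactor SLS files.
REACTOR_MAP = [
    {"salt/minion/*/start": ["/srv/reactor/minion_start.sls"]},
    {"myapp/deployment/failed": ["/srv/reactor/deployment_rollback.sls"]},
    {"salt/beacon/*/diskusage/*": ["/srv/reactor/disk_alert.sls"]},
]


def reactors_for(tag, reactor_map=REACTOR_MAP):
    """Return every SLS file whose tag glob matches the event tag."""
    matched = []
    for entry in reactor_map:
        for pattern, sls_files in entry.items():
            if fnmatch(tag, pattern):
                matched.extend(sls_files)
    return matched


print(reactors_for("salt/minion/web1/start"))
print(reactors_for("myapp/deployment/failed"))
print(reactors_for("myapp/deployment/succeeded"))
```

An event whose tag matches no pattern simply triggers nothing. A custom tag such as myapp/deployment/failed therefore only has an effect if something actually fires an event with exactly that tag.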

Custom beacons (Python event generators)

Beacons run on minions and fire events when conditions are met:

# /srv/salt/_beacons/app_latency.py
"""Beacon that monitors application response latency."""
import requests
import time
import logging

log = logging.getLogger(__name__)

def validate(config):
    if not isinstance(config, list) or not config:
        return False, "Configuration must be a non-empty list"
    return True, "Valid beacon configuration"

def beacon(config):
    """Check application latency and fire event if threshold exceeded."""
    ret = []

    for entry in config:
        url = entry.get("url", "http://localhost:8000/health")
        threshold_ms = entry.get("threshold_ms", 500)

        start = time.monotonic()
        try:
            resp = requests.get(url, timeout=5)
            latency_ms = (time.monotonic() - start) * 1000

            if latency_ms > threshold_ms:
                ret.append({
                    "url": url,
                    "latency_ms": round(latency_ms, 1),
                    "threshold_ms": threshold_ms,
                    "status_code": resp.status_code,
                    "tag": "high_latency",
                })
        except requests.RequestException as e:
            ret.append({
                "url": url,
                "error": str(e),
                "tag": "unreachable",
            })

    return ret
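
Each dict the beacon returns becomes one event. As far as I understand Salt's beacon handling, the event tag is built from the minion ID and beacon name, with the optional "tag" key appended and removed from the payload. The helper below is a simplified model of that behavior under this assumption; beacon_event is a hypothetical name, not a Salt function.

```python
def beacon_event(minion_id, beacon_name, item):
    """Build (tag, data) for one item returned by a beacon function."""
    data = dict(item)  # copy so the beacon's own return value is untouched
    tag = f"salt/beacon/{minion_id}/{beacon_name}/"
    if "tag" in data:
        tag += data.pop("tag")  # the suffix moves from the payload into the tag
    data.setdefault("id", minion_id)
    return tag, data


item = {
    "url": "http://localhost:8000/health",
    "latency_ms": 812.4,
    "threshold_ms": 500,
    "status_code": 200,
    "tag": "high_latency",
}
tag, data = beacon_event("web1", "app_latency", item)
print(tag)
print(data)
```

Under this model a reactor pattern such as salt/beacon/*/app_latency/high_latency would match slow responses only, while salt/beacon/*/app_latency/* would also catch the unreachable case.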

Orchestration runners

Runners execute on the master and coordinate multi-minion workflows:

# /srv/salt/_runners/canary_deploy.py
"""Canary deployment runner."""

import time

def _all_ok(results, key):
    """True only if some minion replied and every reply is a dict with a truthy key."""
    return bool(results) and all(
        isinstance(r, dict) and r.get(key, False) for r in results.values()
    )

def execute(app_name, version, repo_url,
            canary_target="web-canary*", full_target="web*"):
    """
    Deploy to canary hosts first, verify, then roll out to all.

    CLI Example:
        salt-run canary_deploy.execute myapp v2.3.1 https://github.com/org/myapp
    """
    # deploy.release requires repo_url, so it is passed through on every call.
    release_kwargs = {"app_name": app_name, "version": version, "repo_url": repo_url}

    # Step 1: Deploy to canary
    canary_result = __salt__["salt.execute"](
        canary_target, "deploy.release", kwarg=release_kwargs,
    )

    if not _all_ok(canary_result, "success"):
        return {
            "stage": "canary",
            "success": False,
            "message": "Canary deployment failed, aborting",
            "details": canary_result,
        }

    # Step 2: Wait and monitor canary
    time.sleep(60)  # Wait for metrics to settle

    # NOTE: deploy.health_check and deploy.rollback are not part of the
    # execution module shown earlier; they must be added there first.
    health_results = __salt__["salt.execute"](
        canary_target, "deploy.health_check",
        kwarg={"app_name": app_name},
    )

    if not _all_ok(health_results, "healthy"):
        # Rollback canary
        __salt__["salt.execute"](
            canary_target, "deploy.rollback",
            kwarg={"app_name": app_name},
        )
        return {
            "stage": "canary_verification",
            "success": False,
            "message": "Canary health check failed, rolled back",
        }

    # Step 3: Full rollout
    full_result = __salt__["salt.execute"](
        full_target, "deploy.release", kwarg=release_kwargs,
    )

    return {
        "stage": "complete",
        "success": _all_ok(full_result, "success"),
        "canary_hosts": list(canary_result.keys()),
        "total_hosts": list(full_result.keys()),
        "version": version,
    }

Salt vs Ansible: technical comparison

Dimension	Salt	Ansible
Architecture	Master + agents (ZeroMQ)	Agentless (SSH)
Speed at scale	Sub-second for 10K hosts	Minutes for 1K+ hosts
Real-time events	Built-in event bus + reactors	Not native
Learning curve	Steeper (master setup, PKI)	Gentler (just SSH)
Agentless mode	salt-ssh (slower)	Default
State language	YAML + Jinja2	YAML + Jinja2
Python extensibility	Modules, states, beacons, runners, returners	Modules, plugins, filters
Community size	Smaller	Larger

Performance tuning

Worker threads: Increase worker_threads on the master for parallel job handling (default: 5)
Batch mode: salt --batch 20% processes hosts in waves to avoid overwhelming the master
Minion cache: Keep minion_data_cache: True (the default) so the master stores each minion's grains and pillar data and can reuse them, for example for grain-based targeting
Returner offloading: Send results to Redis or PostgreSQL via returner modules instead of processing everything on the master
Syndic architecture: For very large deployments (50K+ minions), use syndic masters as intermediaries
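
To make the batch arithmetic concrete, here is a small sketch of how a value such as 20% could translate into groups of hosts. It rounds the percentage up and splits the host list into fixed waves; batch_size and waves are hypothetical helpers, and Salt's own scheduler may refill a batch as individual returns arrive rather than waiting for a whole wave to finish.

```python
import math


def batch_size(total_minions, spec):
    """Resolve a batch value such as '20%' or '10' to a minion count."""
    if spec.endswith("%"):
        size = math.ceil(total_minions * float(spec[:-1]) / 100)
    else:
        size = int(spec)
    # Never run fewer than one host at a time or more than exist.
    return max(1, min(size, total_minions))


def waves(minions, spec):
    """Split minions into consecutive groups of at most batch_size() hosts."""
    size = batch_size(len(minions), spec)
    return [minions[i:i + size] for i in range(0, len(minions), size)]


hosts = [f"web{i}" for i in range(1, 8)]
for wave in waves(hosts, "20%"):
    print(wave)
```

With seven hosts, 20% rounds up to two hosts per wave, so the rollout takes four waves and the last one contains a single host.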

The one thing to remember: Salt’s Python-native architecture — from execution modules to event reactors to orchestration runners — makes it the most programmable configuration management tool available, especially for teams that think in Python.
