Model Versioning in Python — Deep Dive

Implement production-grade model versioning with DVC pipelines, MLflow Model Registry APIs, and automated promotion gates in CI/CD.

The Versioning Stack

Production ML teams rarely use a single tool. A robust versioning strategy layers three systems:

Git — versions code and configuration
Artifact store — versions large binaries (model weights, datasets)
Model registry — manages lifecycle stages and deployment metadata

Each layer solves a different problem, and skipping one creates gaps.

DVC for Data and Model Artifacts

Setup

pip install dvc dvc-s3
cd my-ml-project
dvc init
dvc remote add -d myremote s3://my-bucket/dvc-store

Tracking a Model File

dvc add models/classifier.pkl
git add models/classifier.pkl.dvc models/.gitignore
git commit -m "v1: baseline logistic regression"
git tag model-v1
dvc push

The .dvc file records the MD5 hash of the model binary. Anyone with access to both the Git repository and the DVC remote can check out a commit and run dvc pull to fetch the exact artifact for that commit.
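
To make the content-addressing idea concrete, the sketch below hashes a file in chunks with hashlib. The helper name file_md5 is hypothetical. For a single binary file, the digest should match the md5 field in the .dvc file on recent DVC releases, but treat that as an assumption to verify against your own version.

```python
import hashlib
from pathlib import Path


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        # Stream the file so large model weights never sit fully in memory
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_md5("models/classifier.pkl") before and after retraining shows why two different models can never share a cache entry, while two byte-identical ones always do.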

DVC Pipelines for Reproducibility

# dvc.yaml
stages:
  preprocess:
    cmd: python src/preprocess.py
    deps:
      - src/preprocess.py
      - data/raw/
    outs:
      - data/processed/

  train:
    cmd: python src/train.py
    deps:
      - src/train.py
      - data/processed/
    params:
      - train.learning_rate
      - train.epochs
    outs:
      - models/classifier.pkl
    metrics:
      - metrics.json:
          cache: false

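Running dvc repro re-executes only the stages whose dependencies or parameters have changed (parameters are read from params.yaml by default) and records the resulting hashes in dvc.lock. Committing dvc.yaml and dvc.lock and tagging the commit with git tag gives you commit-level reproducibility of the entire pipeline.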
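
The change detection behind dvc repro can be pictured as comparing current dependency hashes against the ones recorded after the last run. The sketch below is a simplified stand-alone illustration, not DVC's implementation: hash_deps, is_stale, and the lock dictionary are hypothetical names, and it handles only individual files, not directories or parameters.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def hash_deps(deps: List[str]) -> Dict[str, str]:
    """Map each dependency file path to the MD5 of its current contents."""
    return {dep: hashlib.md5(Path(dep).read_bytes()).hexdigest() for dep in deps}


def is_stale(deps: List[str], lock: Dict[str, str]) -> bool:
    """A stage is stale when any dependency hash differs from the recorded one."""
    return hash_deps(deps) != lock
```

After a stage runs successfully, storing hash_deps(deps) as the new lock plays the role that dvc.lock plays in a real pipeline.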

Comparing Versions

dvc metrics diff model-v1 model-v2

DVC reads metrics.json as committed at each tag and prints the difference for every metric, so neither model has to be loaded or re-evaluated.
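
For dvc metrics diff to have something to compare, the train stage has to write metrics.json. A minimal sketch of that step, together with a toy delta calculation that mirrors what the diff reports, might look like this. write_metrics and metric_deltas are hypothetical helpers, and the metric names are placeholders.

```python
import json
from pathlib import Path
from typing import Dict


def write_metrics(metrics: Dict[str, float], path: str = "metrics.json") -> None:
    """Write metrics as flat JSON, the file declared under `metrics:` in dvc.yaml."""
    Path(path).write_text(json.dumps(metrics, indent=2, sort_keys=True))


def metric_deltas(old: Dict[str, float], new: Dict[str, float]) -> Dict[str, float]:
    """Return new minus old for every metric present in both versions."""
    return {key: new[key] - old[key] for key in sorted(old.keys() & new.keys())}
```

In src/train.py, a call such as write_metrics({"accuracy": acc, "f1": f1}) at the end of training would be enough for DVC to track the values per commit.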

MLflow Model Registry

Registering a Model Programmatically

import mlflow
from mlflow.tracking import MlflowClient

# During training (assumes `model` is an already fitted scikit-learn estimator)
with mlflow.start_run() as run:
    mlflow.log_params({"lr": 0.01, "epochs": 50})
    mlflow.sklearn.log_model(model, "model")
    mlflow.log_metrics({"accuracy": 0.94, "f1": 0.91})

# Register the model
result = mlflow.register_model(
    model_uri=f"runs:/{run.info.run_id}/model",
    name="fraud-detector"
)
print(f"Registered version: {result.version}")

Promoting Between Stages

client = MlflowClient()

# Move version 3 to production
client.transition_model_version_stage(
    name="fraud-detector",
    version=3,
    stage="Production",
    archive_existing_versions=True  # auto-archive current prod
)

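Setting archive_existing_versions=True moves any version currently in Production to Archived as part of the transition, so only one version holds the Production stage at a time. Without it, several versions can share the stage, and a stage-based URI resolves to the most recent of them, which is an easy way to serve a model you did not intend to.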

Loading the Production Model

import mlflow.pyfunc

model = mlflow.pyfunc.load_model("models:/fraud-detector/Production")
predictions = model.predict(new_data)

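This URI scheme (models:/<name>/<stage>) decouples application code from version numbers. The serving layer loads whichever version holds the Production stage at the moment load_model is called, so a long-running service has to reload the model to pick up a promotion.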
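
Recent MLflow releases (2.9 and later) mark registry stages as deprecated and recommend aliases instead. The sketch below shows the same decoupling with an alias. It assumes an MLflow version that provides MlflowClient.set_registered_model_alias; the alias name champion and both helper names are illustrative choices, not part of the setup described above.

```python
def alias_uri(model_name: str, alias: str) -> str:
    """Build a registry URI that resolves through an alias instead of a stage."""
    return f"models:/{model_name}@{alias}"


def point_alias_at(client, model_name: str, alias: str, version: int) -> str:
    """Move an alias to a version and return the URI the serving layer loads.

    `client` is expected to be an mlflow.MlflowClient instance.
    """
    # Reassigning an alias moves it, so at most one version carries it at a time
    client.set_registered_model_alias(model_name, alias, str(version))
    return alias_uri(model_name, alias)
```

The returned URI can be passed to mlflow.pyfunc.load_model in the same way as the stage-based one.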

Automated Promotion Gates

Manual promotion is error-prone. Production teams gate promotions with automated checks:

def should_promote(candidate_run_id: str, model_name: str, min_gain: float = 0.005) -> bool:
    """Return True if the candidate's F1 beats production by more than min_gain."""
    client = MlflowClient()
    candidate = client.get_run(candidate_run_id)
    # Raises KeyError if the run never logged "f1", which fails the gate loudly
    candidate_f1 = float(candidate.data.metrics["f1"])

    # Get current production metrics
    prod_versions = client.get_latest_versions(model_name, stages=["Production"])
    if not prod_versions:
        return True  # No production model yet

    prod_run = client.get_run(prod_versions[0].run_id)
    prod_f1 = float(prod_run.data.metrics["f1"])

    # Require minimum improvement
    return candidate_f1 > prod_f1 + min_gain

This pattern integrates into CI/CD: after training completes, the pipeline calls should_promote() and either transitions the stage or opens a review request.
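
One way the CI step could act on the gate is sketched below. The decision logic is kept as a pure function so it can be unit-tested without a tracking server. gate_decision, run_gate, and MIN_F1_GAIN are hypothetical names, and how a review request gets opened is left to the caller because it depends on the CI system.

```python
from typing import Optional

MIN_F1_GAIN = 0.005  # assumed to match the threshold used by should_promote()


def gate_decision(candidate_f1: float, prod_f1: Optional[float],
                  min_gain: float = MIN_F1_GAIN) -> str:
    """Return "promote" or "review" for a candidate model."""
    if prod_f1 is None:
        return "promote"  # nothing in production yet
    return "promote" if candidate_f1 > prod_f1 + min_gain else "review"


def run_gate(client, model_name: str, version: int,
             candidate_f1: float, prod_f1: Optional[float]) -> str:
    """Apply the decision: transition the stage, or leave it for human review."""
    decision = gate_decision(candidate_f1, prod_f1)
    if decision == "promote":
        # `client` is expected to be an mlflow.MlflowClient instance
        client.transition_model_version_stage(
            name=model_name,
            version=version,
            stage="Production",
            archive_existing_versions=True,
        )
    return decision
```

A pipeline job can branch on the returned string, for example exiting non-zero on "review" so that the run is flagged for a human.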

Versioning Strategies Compared

Strategy	Traceability	Large File Support	Lifecycle Management	Complexity
Git tags + DVC	Full (code + data)	Yes (remote storage)	Manual	Medium
MLflow Registry	Run-level lineage	Yes (artifact store)	Built-in stages	Low
W&B Artifacts	Run-level lineage	Yes (deduped cloud)	Manual or API	Low
Custom S3 + metadata DB	Depends on implementation	Yes	Custom	High

Storage and Cleanup

Model artifacts accumulate fast. A team training daily can generate hundreds of gigabytes per month. Strategies:

Retention policies — delete artifacts older than 90 days that never reached staging
Deduplication — DVC deduplicates by content hash; identical models share storage
Tiered storage — move archived models to cheaper storage classes (S3 Glacier, GCS Coldline)

# Cleanup script: delete runs older than 90 days that back no registered model version
from datetime import datetime, timedelta
import mlflow

client = mlflow.MlflowClient()
cutoff = datetime.now() - timedelta(days=90)

# search_runs returns a single page of results; follow page_token for large experiments
for run in client.search_runs(experiment_ids=["1"]):
    run_date = datetime.fromtimestamp(run.info.start_time / 1000)  # start_time is in milliseconds
    if run_date < cutoff:
        # Check if any registered model points to this run
        versions = client.search_model_versions(f"run_id='{run.info.run_id}'")
        if not versions:
            # Soft delete: artifacts stay in storage until `mlflow gc` is run
            client.delete_run(run.info.run_id)

Tradeoffs

DVC excels when data versioning matters as much as model versioning and the team already uses Git. The downside is operational overhead — teams must remember to dvc push/pull alongside Git operations.

MLflow Registry provides the simplest path to lifecycle management but does not version training data by default. Combining it with DVC or a data catalog covers the gap.

Custom solutions offer maximum flexibility but require maintaining schema migrations, access control, and cleanup logic that managed tools handle automatically.

Real-World Pattern: Immutable Model Packages

Companies like Uber (Michelangelo) and Google (TFX) package models as immutable bundles containing weights, preprocessing code, serving configuration, and a schema contract. Each bundle gets a unique hash and version number. This eliminates “works on my machine” problems because the bundle is self-contained and reproducible.
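
The "unique hash per bundle" idea can be sketched with a digest computed over every file in the bundle directory. This is an illustration under assumptions, not how Michelangelo or TFX identify their packages: bundle_digest is a hypothetical helper, and SHA-256 over sorted relative paths and contents is just one reasonable choice.

```python
import hashlib
from pathlib import Path


def bundle_digest(bundle_dir: str) -> str:
    """Return a SHA-256 digest covering every file path and its contents."""
    root = Path(bundle_dir)
    digest = hashlib.sha256()
    # Sort so the digest does not depend on filesystem listing order
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")
        # Reads each file fully; stream in chunks for very large weights
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

Because the digest changes whenever any file in the bundle changes, it can serve as the immutable identifier that deployment records refer to.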

One thing to remember: Effective model versioning links every trained artifact to its exact code, data, parameters, and metrics — making any historical model reproducible and any production model safely replaceable.
