Model Versioning in Python — Deep Dive

The Versioning Stack

Production ML teams rarely use a single tool. A robust versioning strategy layers three systems:

  1. Git — versions code and configuration
  2. Artifact store — versions large binaries (model weights, datasets)
  3. Model registry — manages lifecycle stages and deployment metadata

Each layer solves a different problem, and skipping one creates gaps.

DVC for Data and Model Artifacts

Setup

pip install dvc dvc-s3
cd my-ml-project
dvc init
dvc remote add -d myremote s3://my-bucket/dvc-store

Tracking a Model File

dvc add models/classifier.pkl
git add models/classifier.pkl.dvc models/.gitignore
git commit -m "v1: baseline logistic regression"
git tag model-v1
dvc push

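The .dvc file stores the MD5 hash of the model binary along with its path, and that small text file is what Git versions. Anyone with access to both the repository and the remote can check out a commit or tag (for example, git checkout model-v1) and run dvc pull to fetch the exact artifact it references.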
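
The idea behind hash-based tracking can be shown with a simplified sketch of content addressing. It hashes a file in chunks, then derives a cache location from the digest. The function names and the <cache_root>/<first two hex chars>/<rest> layout are assumptions for illustration, modeled loosely on a DVC 3.x-style cache. DVC's actual hashing and cache details differ between releases, so this sketch is not a way to read a real DVC cache.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 of a file, reading fixed-size chunks so large binaries never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_path(cache_root: Path, md5: str) -> Path:
    """Map a digest to a content-addressed location: <cache_root>/<first 2 hex chars>/<rest> (assumed layout)."""
    return cache_root / md5[:2] / md5[2:]


if __name__ == "__main__":
    root = Path(".dvc/cache/files/md5")  # assumed cache root, for illustration only
    with tempfile.TemporaryDirectory() as tmp:
        a, b = Path(tmp) / "model_a.pkl", Path(tmp) / "model_b.pkl"
        a.write_bytes(b"same weights")
        b.write_bytes(b"same weights")
        # Identical bytes yield identical digests, hence one shared cache entry
        print(cache_path(root, md5_of_file(a)) == cache_path(root, md5_of_file(b)))  # True
```

The same property explains the deduplication mentioned under Storage and Cleanup. Two commits that produce byte-identical models point at one stored object.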

DVC Pipelines for Reproducibility

# dvc.yaml
stages:
  preprocess:
    cmd: python src/preprocess.py
    deps:
      - src/preprocess.py
      - data/raw/
    outs:
      - data/processed/

  train:
    cmd: python src/train.py
    deps:
      - src/train.py
      - data/processed/
    params:
      - train.learning_rate
      - train.epochs
    outs:
      - models/classifier.pkl
    metrics:
      - metrics.json:
          cache: false

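Running dvc repro re-executes only the stages whose dependencies, parameters, or command changed since the hashes recorded in dvc.lock, plus any downstream stages whose inputs change as a result. Commit dvc.lock together with dvc.yaml and params.yaml. Combined with git tag, this gives you commit-level reproducibility of the entire pipeline. A file produced as a stage output, like models/classifier.pkl here, should be tracked through dvc.yaml rather than with a separate dvc add, because DVC refuses to track the same path both ways.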
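
The change detection behind dvc repro can be approximated with a small sketch. It compares each stage's current dependency hashes with those recorded at the last run, reruns any stage whose inputs moved, and reruns everything downstream of it. This is a simplification that rests on several assumptions. The Stage class, the stages_to_run function, and the params.yaml:<key> dependency naming are hypothetical, and stages must be listed upstream-first. DVC also checks the command and compares real output hashes after each upstream run. The sketch does neither and instead conservatively reruns every downstream stage.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    name: str
    deps: list[str]
    outs: list[str] = field(default_factory=list)


def stages_to_run(stages: list[Stage], current: dict[str, str], locked: dict[str, str]) -> list[str]:
    """Return names of stages to re-execute, in pipeline order.

    `current` maps each dependency to its hash now; `locked` maps it to the hash
    recorded after the last successful run (the role dvc.lock plays in DVC).
    """
    dirty_outputs: set[str] = set()
    to_run: list[str] = []
    for stage in stages:
        changed = any(current.get(d) != locked.get(d) for d in stage.deps)
        # Simplification: assume any re-run upstream stage changes its outputs
        upstream = any(d in dirty_outputs for d in stage.deps)
        if changed or upstream:
            to_run.append(stage.name)
            dirty_outputs.update(stage.outs)
    return to_run


if __name__ == "__main__":
    pipeline = [
        Stage("preprocess", ["src/preprocess.py", "data/raw/"], ["data/processed/"]),
        Stage("train", ["src/train.py", "data/processed/", "params.yaml:train.learning_rate"],
              ["models/classifier.pkl"]),
    ]
    locked = {"src/preprocess.py": "a1", "data/raw/": "b1", "src/train.py": "c1",
              "data/processed/": "d1", "params.yaml:train.learning_rate": "0.01"}
    # Only the learning rate changed, so preprocessing is skipped
    print(stages_to_run(pipeline, {**locked, "params.yaml:train.learning_rate": "0.02"}, locked))  # ['train']
```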

Comparing Versions

dvc metrics diff model-v1 model-v2

This shows metric deltas between tagged versions without loading the models.
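
A rough equivalent of dvc metrics diff reads the metrics file at each revision with git show and subtracts. This works because cache: false leaves metrics.json tracked directly by Git. The sketch assumes metrics.json is a flat JSON object of numbers. The load_metrics_at and metrics_diff helpers are hypothetical. DVC itself also flattens nested metric files and formats the output as a table.

```python
import json
import subprocess


def load_metrics_at(rev: str, path: str = "metrics.json") -> dict[str, float]:
    """Read a Git-tracked metrics file as it existed at a revision (tag, branch, or SHA)."""
    raw = subprocess.run(
        ["git", "show", f"{rev}:{path}"], check=True, capture_output=True, text=True
    ).stdout
    return json.loads(raw)


def metrics_diff(old: dict[str, float], new: dict[str, float]) -> dict[str, dict]:
    """Per-metric old value, new value, and delta; None where a metric exists on only one side."""
    result = {}
    for name in sorted(old.keys() | new.keys()):
        a, b = old.get(name), new.get(name)
        delta = b - a if a is not None and b is not None else None
        result[name] = {"old": a, "new": b, "diff": delta}
    return result
```

From the repository root, metrics_diff(load_metrics_at("model-v1"), load_metrics_at("model-v2")) compares the two tagged versions.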

MLflow Model Registry

Registering a Model Programmatically

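import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient

# During training; `model` is assumed to be an already-fitted scikit-learn estimator
with mlflow.start_run() as run:
    mlflow.log_params({"lr": 0.01, "epochs": 50})
    mlflow.sklearn.log_model(model, "model")
    # Illustrative values: log the metrics you actually computed on held-out data
    mlflow.log_metrics({"accuracy": 0.94, "f1": 0.91})

# Register the logged model; the registry assigns the next version number for this name
result = mlflow.register_model(
    model_uri=f"runs:/{run.info.run_id}/model",
    name="fraud-detector"
)
print(f"Registered version: {result.version}")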

Promoting Between Stages

client = MlflowClient()

# Move version 3 to production
client.transition_model_version_stage(
    name="fraud-detector",
    version=3,
    stage="Production",
    archive_existing_versions=True  # auto-archive current prod
)

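Setting archive_existing_versions=True moves every version currently in the target stage to Archived. As a result, only one version holds the Production stage at a time, and every consumer of that stage agrees on what is live. Note that MLflow deprecated stages in version 2.9 in favor of aliases. A call such as client.set_registered_model_alias("fraud-detector", "champion", 3) points a named alias at exactly one version, which gives the same single-version guarantee without a separate archiving step.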
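
The flag's effect can be made concrete with an in-memory model of the stage transition. This is not MLflow's implementation. The ModelVersion class and the transition function are hypothetical, and only the stage names ("None", "Staging", "Production", "Archived") come from MLflow. The demo shows that without the flag, two versions can sit in Production at once.

```python
from dataclasses import dataclass


@dataclass
class ModelVersion:
    version: int
    stage: str = "None"  # MLflow labels unstaged versions with the literal string "None"


def transition(versions: list[ModelVersion], version: int, stage: str,
               archive_existing_versions: bool = False) -> None:
    """Move `version` into `stage`; optionally archive whichever versions already hold that stage."""
    target = next((mv for mv in versions if mv.version == version), None)
    if target is None:
        raise KeyError(f"version {version} not found")
    if archive_existing_versions:
        for mv in versions:
            if mv is not target and mv.stage == stage:
                mv.stage = "Archived"
    target.stage = stage


if __name__ == "__main__":
    registry = [ModelVersion(2, "Production"), ModelVersion(3, "Staging")]
    transition(registry, 3, "Production")
    print([mv.stage for mv in registry])  # ['Production', 'Production']: two live versions
    registry = [ModelVersion(2, "Production"), ModelVersion(3, "Staging")]
    transition(registry, 3, "Production", archive_existing_versions=True)
    print([mv.stage for mv in registry])  # ['Archived', 'Production']
```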

Loading the Production Model

import mlflow.pyfunc

# new_data: e.g. a pandas DataFrame with the same feature columns the model was trained on
model = mlflow.pyfunc.load_model("models:/fraud-detector/Production")
predictions = model.predict(new_data)

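This URI scheme (models:/<name>/<stage>) decouples application code from version numbers. The registry resolves the stage to a concrete version when load_model runs, so the serving layer loads whatever version is in the Production stage at that moment. A long-running service keeps the model it already loaded and picks up a promotion only after it reloads.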
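
Resolution works as follows: the URI names a pointer (a stage, an alias, or a literal version), and the registry looks up which version that pointer currently targets. The parse_models_uri and resolve functions, and the dictionaries standing in for the registry, are hypothetical; in MLflow the lookup happens against the tracking server. The models:/<name>@<alias> form is MLflow's syntax for registered model aliases.

```python
def parse_models_uri(uri: str) -> tuple[str, str, str]:
    """Split a registry URI into (name, kind, ref), where kind is 'version', 'stage', or 'alias'."""
    prefix = "models:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a registry URI: {uri}")
    body = uri[len(prefix):]
    if "@" in body:
        name, alias = body.split("@", 1)
        return name, "alias", alias
    name, sep, ref = body.rpartition("/")
    if not sep or not name or not ref:
        raise ValueError(f"malformed registry URI: {uri}")
    return name, ("version" if ref.isdigit() else "stage"), ref


def resolve(uri: str, stages: dict[str, dict[str, int]], aliases: dict[str, dict[str, int]]) -> int:
    """Resolve a URI to a concrete version number using in-memory lookup tables (hypothetical)."""
    name, kind, ref = parse_models_uri(uri)
    if kind == "version":
        return int(ref)
    table = aliases if kind == "alias" else stages
    return table[name][ref]


if __name__ == "__main__":
    stages = {"fraud-detector": {"Production": 3}}
    aliases = {"fraud-detector": {"champion": 3}}
    print(resolve("models:/fraud-detector/Production", stages, aliases))  # 3
```

MLflow also treats models:/<name>/latest specially. This sketch would treat latest as an ordinary stage name.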

Automated Promotion Gates

Manual promotion is error-prone. Production teams gate promotions with automated checks:

from mlflow.tracking import MlflowClient

def should_promote(candidate_run_id: str, model_name: str, min_gain: float = 0.005) -> bool:
    client = MlflowClient()
    # Both runs must log "f1" on the same evaluation set for the comparison to be meaningful
    candidate_f1 = client.get_run(candidate_run_id).data.metrics["f1"]

    # Get the version currently in the Production stage
    prod_versions = client.get_latest_versions(model_name, stages=["Production"])
    if not prod_versions:
        return True  # No production model yet

    prod_f1 = client.get_run(prod_versions[0].run_id).data.metrics["f1"]

    # Require a minimum absolute improvement so evaluation noise alone cannot trigger a promotion
    return candidate_f1 > prod_f1 + min_gain

This pattern integrates into CI/CD: after training completes, the pipeline calls should_promote() and either transitions the stage or opens a review request.

Versioning Strategies Compared

| Strategy | Traceability | Large file support | Lifecycle management | Complexity |
|---|---|---|---|---|
| Git tags + DVC | Full (code + data) | Yes (remote storage) | Manual | Medium |
| MLflow Registry | Run-level lineage | Yes (artifact store) | Built-in stages | Low |
| W&B Artifacts | Run-level lineage | Yes (deduplicated cloud storage) | Manual or API | Low |
| Custom S3 + metadata DB | Depends on implementation | Yes | Custom | High |

Storage and Cleanup

Model artifacts accumulate fast. A team training daily can generate hundreds of gigabytes per month. Strategies:

  • Retention policies — delete artifacts older than 90 days that never reached staging
  • Deduplication — DVC deduplicates by content hash; identical models share storage
  • Tiered storage — move archived models to cheaper storage classes (S3 Glacier, GCS Coldline)
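
The retention and tiering bullets can be expressed as a storage-agnostic planning function. For each artifact, it decides whether to delete it, move it to a colder class, or leave it alone. The Artifact fields, the 90-day default, and the STANDARD storage-class label are assumptions for illustration. Actually deleting or transitioning objects would go through your store's API or lifecycle rules. All timestamps are assumed to share one timezone convention.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Artifact:
    uri: str
    created: datetime
    reached_staging: bool  # ever promoted to Staging or beyond
    archived: bool = False  # currently in the Archived stage
    storage_class: str = "STANDARD"  # hypothetical label for the hot storage tier


def plan_cleanup(artifacts: list[Artifact], now: datetime,
                 retention_days: int = 90) -> tuple[list[str], list[str]]:
    """Return (to_delete, to_tier_down) URI lists without touching storage."""
    cutoff = now - timedelta(days=retention_days)
    to_delete: list[str] = []
    to_tier_down: list[str] = []
    for art in artifacts:
        if not art.reached_staging and art.created < cutoff:
            # Retention policy: old experiments that never reached staging
            to_delete.append(art.uri)
        elif art.archived and art.storage_class == "STANDARD":
            # Tiered storage: archived models move to a cheaper class
            to_tier_down.append(art.uri)
    return to_delete, to_tier_down
```

Separating planning from execution makes a dry run easy: a CI job can print both lists for review before anything is deleted.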
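
# Cleanup script: delete unregistered runs older than 90 days
from datetime import datetime, timedelta, timezone
import mlflow

client = mlflow.MlflowClient()
cutoff = datetime.now(timezone.utc) - timedelta(days=90)

# Collect candidates first: deleting while paging could shift later pages
stale_run_ids = []
page_token = None
while True:
    runs = client.search_runs(experiment_ids=["1"], page_token=page_token)
    for run in runs:
        # start_time is milliseconds since the Unix epoch
        run_date = datetime.fromtimestamp(run.info.start_time / 1000, tz=timezone.utc)
        # Keep any run that a registered model version points to
        if run_date < cutoff and not client.search_model_versions(f"run_id='{run.info.run_id}'"):
            stale_run_ids.append(run.info.run_id)
    page_token = runs.token
    if not page_token:
        break

for run_id in stale_run_ids:
    # Soft delete only; `mlflow gc` later removes deleted runs and their artifacts
    client.delete_run(run_id)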

Tradeoffs

DVC excels when data versioning matters as much as model versioning and the team already uses Git. The downside is operational overhead — teams must remember to dvc push/pull alongside Git operations.

MLflow Registry provides the simplest path to lifecycle management but does not version training data by default. Combining it with DVC or a data catalog covers the gap.
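
One way to cover that gap is to read the hashes DVC recorded in dvc.lock for the training stage and attach them to the MLflow run as tags. Each registered version then points back to the exact data it was trained on. This sketch rests on assumptions. The function name and the dvc.dep.<path> / dvc.out.<path> tag convention are hypothetical. The dictionary layout assumes the dvc.lock schema 2.0 format, where each dependency and output entry carries path and md5 keys.

```python
def dvc_lineage_tags(lock: dict, stage: str) -> dict[str, str]:
    """Flatten one stage of a parsed dvc.lock into string tags (hypothetical naming convention)."""
    entry = lock["stages"][stage]
    tags: dict[str, str] = {}
    for kind, key in (("dep", "deps"), ("out", "outs")):
        for item in entry.get(key, []):
            md5 = item.get("md5")
            if md5 is not None:  # skip entries hashed some other way
                tags[f"dvc.{kind}.{item['path']}"] = md5
    return tags
```

In a training script, you would load the lock file with yaml.safe_load and call mlflow.set_tags(dvc_lineage_tags(lock, "train")) inside the active run.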

Custom solutions offer maximum flexibility but require maintaining schema migrations, access control, and cleanup logic that managed tools handle automatically.

Real-World Pattern: Immutable Model Packages

Companies like Uber (Michelangelo) and Google (TFX) package models as immutable bundles containing weights, preprocessing code, serving configuration, and a schema contract. Each bundle gets a unique hash and version number. This largely eliminates “works on my machine” problems: the bundle is self-contained, and any change to one of its components produces a new hash and therefore a new version.
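
The "unique hash per bundle" idea can be sketched as a manifest. The sketch hashes every component, then hashes the sorted mapping of component digests, so the bundle identifier changes whenever any file's name or content changes. This is not how Michelangelo or TFX build their packages. The component names, the choice of SHA-256, and the manifest layout are assumptions chosen for illustration.

```python
import hashlib
import json


def bundle_manifest(files: dict[str, bytes], version: str) -> dict:
    """Build a manifest for an immutable model bundle.

    Each component is hashed individually; the bundle hash covers the sorted
    path -> digest mapping, so it changes if any file's name or content changes.
    """
    digests = {path: hashlib.sha256(data).hexdigest() for path, data in sorted(files.items())}
    canonical = json.dumps(digests, sort_keys=True, separators=(",", ":")).encode()
    return {"version": version, "files": digests, "bundle_sha256": hashlib.sha256(canonical).hexdigest()}
```

The version string is deliberately left out of the hash. Two bundles with identical contents get the same digest, which makes accidental duplicate releases easy to spot, while the version stays a human-facing label.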

One thing to remember: Effective model versioning links every trained artifact to its exact code, data, parameters, and metrics — making any historical model reproducible and any production model safely replaceable.
