Model Versioning in Python — Deep Dive
The Versioning Stack
Production ML teams rarely use a single tool. A robust versioning strategy layers three systems:
- Git — versions code and configuration
- Artifact store — versions large binaries (model weights, datasets)
- Model registry — manages lifecycle stages and deployment metadata
Each layer solves a different problem, and skipping one creates gaps.
DVC for Data and Model Artifacts
Setup
pip install dvc dvc-s3
cd my-ml-project
dvc init
dvc remote add -d myremote s3://my-bucket/dvc-store
Tracking a Model File
dvc add models/classifier.pkl
git add models/classifier.pkl.dvc models/.gitignore
git commit -m "v1: baseline logistic regression"
git tag model-v1
dvc push
The .dvc file stores the MD5 hash of the model binary. Anyone with access to both the Git repo and the DVC remote can check out a commit and run dvc pull to get the exact artifact for that commit.
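To make the content-hash idea concrete, here is a minimal sketch of hashing a file with the standard library. The function name file_md5 is hypothetical, and the claim that its result matches the md5 field in a .dvc file is an assumption about DVC's behaviour for a single binary file, not something DVC guarantees across versions.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_md5(Path("models/classifier.pkl")) and comparing the result with the hash recorded in models/classifier.pkl.dvc is one way to check that a local artifact is the one a commit refers to.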
DVC Pipelines for Reproducibility
# dvc.yaml
stages:
  preprocess:
    cmd: python src/preprocess.py
    deps:
      - src/preprocess.py
      - data/raw/
    outs:
      - data/processed/
  train:
    cmd: python src/train.py
    deps:
      - src/train.py
      - data/processed/
    params:
      - train.learning_rate
      - train.epochs
    outs:
      - models/classifier.pkl
    metrics:
      - metrics.json:
          cache: false
Running dvc repro re-executes only the stages whose dependencies or tracked parameters changed, along with any stages downstream of them. Combined with git tag, this gives you commit-level reproducibility of the entire pipeline.
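The change-detection idea can be sketched as a toy model. This is not DVC's implementation: stale_stages and its arguments are hypothetical, hashes are plain strings, parameters are left out, and stages are assumed to be listed upstream-first.

```python
from typing import Dict, List


def stale_stages(
    stage_deps: Dict[str, List[str]],
    stage_outs: Dict[str, List[str]],
    recorded: Dict[str, str],
    current: Dict[str, str],
) -> List[str]:
    """Return the stages that must rerun, in the order they are listed."""
    stale: List[str] = []
    dirty_paths = set()  # outputs of stages already marked stale
    for stage, deps in stage_deps.items():  # assumes upstream-first ordering
        changed = any(recorded.get(dep) != current.get(dep) for dep in deps)
        upstream_dirty = any(dep in dirty_paths for dep in deps)
        if changed or upstream_dirty:
            stale.append(stage)
            dirty_paths.update(stage_outs.get(stage, []))
    return stale
```

With the two stages from dvc.yaml, a changed hash for src/train.py marks only train as stale, while a changed hash for data/raw/ marks preprocess and then train, because train depends on an output of a stale stage.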
Comparing Versions
dvc metrics diff model-v1 model-v2
This shows metric deltas between tagged versions without loading the models.
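The comparison itself needs only the two metrics files, which is why no model has to be loaded. A minimal sketch of the same idea in Python follows; metrics_diff is a hypothetical helper, not part of DVC, and it assumes flat metric dictionaries such as the parsed contents of metrics.json at each tag.

```python
from typing import Dict, Optional, Tuple

MetricRow = Tuple[Optional[float], Optional[float], Optional[float]]


def metrics_diff(old: Dict[str, float], new: Dict[str, float]) -> Dict[str, MetricRow]:
    """Map each metric name to (old value, new value, delta)."""
    diff: Dict[str, MetricRow] = {}
    for name in sorted(set(old) | set(new)):
        before = old.get(name)
        after = new.get(name)
        # A delta is only defined when the metric exists in both versions
        delta = after - before if before is not None and after is not None else None
        diff[name] = (before, after, delta)
    return diff
```

A metric present in only one version shows up with None for the missing side, which makes renamed or dropped metrics easy to spot in review.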
MLflow Model Registry
Registering a Model Programmatically
import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient

# During training (`model` is the fitted scikit-learn estimator)
with mlflow.start_run() as run:
    mlflow.log_params({"lr": 0.01, "epochs": 50})
    mlflow.sklearn.log_model(model, "model")
    mlflow.log_metrics({"accuracy": 0.94, "f1": 0.91})

# Register the model logged by the run above
result = mlflow.register_model(
    model_uri=f"runs:/{run.info.run_id}/model",
    name="fraud-detector",
)
print(f"Registered version: {result.version}")
Promoting Between Stages
client = MlflowClient()

# Move version 3 to production
client.transition_model_version_stage(
    name="fraud-detector",
    version=3,
    stage="Production",
    archive_existing_versions=True,  # auto-archive current prod
)
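With archive_existing_versions=True, any version currently in Production moves to Archived, so only one version serves production at a time. This is critical for avoiding split-traffic bugs. Note that MLflow 2.9 and later deprecate stages in favor of model version aliases, so check which mechanism your MLflow version supports.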
Loading the Production Model
import mlflow.pyfunc
model = mlflow.pyfunc.load_model("models:/fraud-detector/Production")
predictions = model.predict(new_data)
This URI scheme (models:/<name>/<stage>) decouples application code from version numbers. The serving layer always loads whatever is currently tagged as Production.
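The decoupling can be illustrated with a toy in-memory registry. This is a sketch of the semantics described above, not MLflow's implementation: ToyRegistry and its methods are hypothetical, and resolving a stage to its newest version is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyRegistry:
    """In-memory stand-in for a model registry (illustration only)."""

    stages: Dict[int, str] = field(default_factory=dict)  # version -> stage

    def register(self) -> int:
        """Add a new version with no stage and return its number."""
        version = len(self.stages) + 1
        self.stages[version] = "None"
        return version

    def transition(self, version: int, stage: str, archive_existing: bool = True) -> None:
        """Move a version to a stage, optionally archiving the current occupants."""
        if version not in self.stages:
            raise KeyError(f"unknown version {version}")
        if archive_existing:
            for other, current in self.stages.items():
                if current == stage and other != version:
                    self.stages[other] = "Archived"
        self.stages[version] = stage

    def resolve(self, stage: str) -> int:
        """Return the newest version in a stage, as a stage lookup would."""
        matches = [v for v, s in self.stages.items() if s == stage]
        if not matches:
            raise LookupError(f"no version in stage {stage!r}")
        return max(matches)
```

Application code that only ever calls resolve("Production") keeps working unchanged when version 2 replaces version 1, which is the property the stage-based URI provides.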
Automated Promotion Gates
Manual promotion is error-prone. Production teams gate promotions with automated checks:
from mlflow.tracking import MlflowClient

def should_promote(candidate_run_id: str, model_name: str) -> bool:
    """Return True if the candidate beats production F1 by a minimum margin."""
    client = MlflowClient()
    candidate = client.get_run(candidate_run_id)
    candidate_f1 = float(candidate.data.metrics["f1"])

    # Get current production metrics
    prod_versions = client.get_latest_versions(model_name, stages=["Production"])
    if not prod_versions:
        return True  # No production model yet

    prod_run = client.get_run(prod_versions[0].run_id)
    prod_f1 = float(prod_run.data.metrics["f1"])

    # Require minimum improvement
    return candidate_f1 > prod_f1 + 0.005
This pattern integrates into CI/CD: after training completes, the pipeline calls should_promote() and either transitions the stage or opens a review request.
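One way to keep such a gate easy to test in CI is to separate the comparison from the registry lookups. The sketch below assumes that split; promotion_decision and the "promote" and "review" labels are hypothetical names, and the 0.005 margin simply mirrors the threshold used in should_promote().

```python
from typing import Optional


def promotion_decision(
    candidate_f1: float,
    prod_f1: Optional[float],
    min_gain: float = 0.005,
) -> str:
    """Return "promote" or "review" for a candidate model."""
    if prod_f1 is None:  # no production model yet
        return "promote"
    if candidate_f1 > prod_f1 + min_gain:
        return "promote"
    return "review"
```

The pipeline step then fetches the two F1 values from the tracking server, calls this function, and branches on the result. Unit tests can cover the threshold logic without a running MLflow server.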
Versioning Strategies Compared
| Strategy | Traceability | Large File Support | Lifecycle Management | Complexity |
|---|---|---|---|---|
| Git tags + DVC | Full (code + data) | Yes (remote storage) | Manual | Medium |
| MLflow Registry | Run-level lineage | Yes (artifact store) | Built-in stages | Low |
| W&B Artifacts | Run-level lineage | Yes (deduped cloud) | Manual or API | Low |
| Custom S3 + metadata DB | Depends on implementation | Yes | Custom | High |
Storage and Cleanup
Model artifacts accumulate fast. A team training daily can generate hundreds of gigabytes per month. Strategies:
- Retention policies — delete artifacts older than 90 days that never reached staging
- Deduplication — DVC deduplicates by content hash; identical models share storage
- Tiered storage — move archived models to cheaper storage classes (S3 Glacier, GCS Coldline)
# Cleanup script: delete unregistered runs older than 90 days
from datetime import datetime, timedelta

import mlflow

client = mlflow.MlflowClient()
cutoff = datetime.now() - timedelta(days=90)

for run in client.search_runs(experiment_ids=["1"]):
    # start_time is in epoch milliseconds
    run_date = datetime.fromtimestamp(run.info.start_time / 1000)
    if run_date < cutoff:
        # Check if any registered model points to this run
        versions = client.search_model_versions(f"run_id='{run.info.run_id}'")
        if not versions:
            # Soft delete; artifacts remain until `mlflow gc` runs
            client.delete_run(run.info.run_id)
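Because a cleanup job deletes data, it helps to be able to dry-run the selection rule before anything is removed. The following is a minimal sketch of that rule as a pure function; RunSummary and runs_to_delete are hypothetical names, and the caller is assumed to have already looked up whether each run backs a registered model version.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple


class RunSummary(NamedTuple):
    run_id: str
    start_time_ms: int  # epoch milliseconds, the unit MLflow uses for start_time
    registered: bool    # True if any registered model version points at the run


def runs_to_delete(
    runs: Iterable[RunSummary],
    now: datetime,
    max_age_days: int = 90,
) -> List[str]:
    """Return IDs of runs older than the cutoff that back no registered model."""
    cutoff_ms = int((now - timedelta(days=max_age_days)).timestamp() * 1000)
    return [
        run.run_id
        for run in runs
        if run.start_time_ms < cutoff_ms and not run.registered
    ]
```

Logging the returned IDs for a few days before enabling deletion is a cheap way to confirm the policy matches expectations.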
Tradeoffs
DVC excels when data versioning matters as much as model versioning and the team already uses Git. The downside is operational overhead — teams must remember to dvc push/pull alongside Git operations.
MLflow Registry provides the simplest path to lifecycle management but does not version training data by default. Combining it with DVC or a data catalog covers the gap.
Custom solutions offer maximum flexibility but require maintaining schema migrations, access control, and cleanup logic that managed tools handle automatically.
Real-World Pattern: Immutable Model Packages
Companies like Uber (Michelangelo) and Google (TFX) package models as immutable bundles containing weights, preprocessing code, serving configuration, and a schema contract. Each bundle gets a unique hash and version number. This eliminates “works on my machine” problems because the bundle is self-contained and reproducible.
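The "unique hash" part of this pattern can be sketched with the standard library. This is an illustration of the idea only, not the format Michelangelo or TFX uses: bundle_digest is a hypothetical function, and the bundle is modelled as a mapping from file name to file contents.

```python
import hashlib
from typing import Mapping


def bundle_digest(files: Mapping[str, bytes]) -> str:
    """Return a SHA-256 digest covering every file name and its contents."""
    outer = hashlib.sha256()
    for name in sorted(files):  # sort so the digest ignores insertion order
        content_hash = hashlib.sha256(files[name]).hexdigest()
        outer.update(f"{name}\0{content_hash}\n".encode("utf-8"))
    return outer.hexdigest()
```

Any change to the weights, the preprocessing code, or the schema yields a different digest, so two bundles with the same digest can be treated as interchangeable.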
One thing to remember: Effective model versioning links every trained artifact to its exact code, data, parameters, and metrics — making any historical model reproducible and any production model safely replaceable.