Model Registry Patterns in Python — Deep Dive
Registry Architecture
A production model registry has four layers:
- Metadata store — database holding model names, versions, stages, tags, and lineage
- Artifact store — blob storage (S3, GCS) holding serialized model weights
- API layer — REST or gRPC interface for registration, queries, and transitions
- Governance layer — access control, approval workflows, and audit logs
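MLflow provides the metadata store, artifact store, and API layer in one package; how much of the governance layer you get depends on the deployment, and the open-source server is more limited here than managed offerings. Custom registries often split the layers across dedicated services.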
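The four layers can be sketched as separate interfaces that an API-layer service wires together. This is a minimal illustration, not how MLflow or any specific registry is implemented: every class and method name below (`MetadataStore`, `ArtifactStore`, `Governance`, `RegistryService`, the `memory://` URI scheme) is hypothetical, and the in-memory stores stand in for a real database and blob store.

```python
from dataclasses import dataclass, field
from typing import Protocol


class MetadataStore(Protocol):
    def save_version(self, name: str, version: int, metadata: dict) -> None: ...
    def get_version(self, name: str, version: int) -> dict: ...


class ArtifactStore(Protocol):
    def put(self, uri: str, payload: bytes) -> None: ...


class Governance(Protocol):
    def can_register(self, user: str, model_name: str) -> bool: ...


@dataclass
class InMemoryMetadataStore:
    rows: dict = field(default_factory=dict)

    def save_version(self, name: str, version: int, metadata: dict) -> None:
        self.rows[(name, version)] = metadata

    def get_version(self, name: str, version: int) -> dict:
        return self.rows[(name, version)]


@dataclass
class InMemoryArtifactStore:
    blobs: dict = field(default_factory=dict)

    def put(self, uri: str, payload: bytes) -> None:
        self.blobs[uri] = payload


@dataclass
class AllowListGovernance:
    allowed_users: set
    audit_log: list = field(default_factory=list)

    def can_register(self, user: str, model_name: str) -> bool:
        allowed = user in self.allowed_users
        # Record every decision, allowed or not, for later audit
        self.audit_log.append((user, model_name, allowed))
        return allowed


class RegistryService:
    """API layer: coordinates governance, artifact, and metadata layers."""

    def __init__(self, metadata: MetadataStore, artifacts: ArtifactStore,
                 governance: Governance):
        self.metadata = metadata
        self.artifacts = artifacts
        self.governance = governance

    def register(self, user: str, name: str, version: int,
                 payload: bytes, metadata: dict) -> str:
        if not self.governance.can_register(user, name):
            raise PermissionError(f"{user} may not register {name}")
        uri = f"memory://{name}/{version}"  # hypothetical URI scheme
        self.artifacts.put(uri, payload)
        self.metadata.save_version(name, version, {**metadata, "artifact_uri": uri})
        return uri
```

Because each layer sits behind its own interface, the in-memory stores could be swapped for PostgreSQL or S3 implementations without changing `RegistryService`.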
MLflow Registry: Advanced Patterns
Registering with Schema Validation
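```python
import mlflow
from mlflow.models.signature import infer_signature

# Assumes `model` is a fitted scikit-learn estimator and `X_train` its training features
with mlflow.start_run() as run:
    # Infer the input/output schema from the training data and predictions
    signature = infer_signature(X_train, model.predict(X_train))
    mlflow.sklearn.log_model(
        model, "model",
        signature=signature,
        input_example=X_train[:5],
    )

# Use the captured run ID: mlflow.active_run() returns None once the run has ended
result = mlflow.register_model(
    f"runs:/{run.info.run_id}/model",
    "credit-scorer",
)
```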
The signature enforces column names and types at serving time. If incoming data does not match, the model server rejects the request — catching schema drift before it produces garbage predictions.
Automated Promotion with Webhooks
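Some MLflow deployments can fire webhooks on model version events. Webhook support and the exact client call depend on the MLflow version and hosting platform, so treat `create_registry_webhook` in this snippet as illustrative and check the API your deployment exposes. The environment variable name is likewise a placeholder:

```python
import os

from mlflow.tracking import MlflowClient

client = MlflowClient()

# Create a webhook that fires on model version creation.
# NOTE: the method name and arguments vary by MLflow version and platform.
client.create_registry_webhook(
    events=["MODEL_VERSION_CREATED"],
    description="Trigger validation pipeline",
    http_url_spec={
        "url": "https://ci.example.com/validate-model",
        # Read the shared secret from the environment instead of hardcoding it
        "secret": os.environ["MLFLOW_WEBHOOK_SECRET"],
    },
)
```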
The CI endpoint receives the model name and version, runs validation tests (accuracy thresholds, latency benchmarks, bias checks), and promotes or rejects automatically.
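The promote-or-reject decision inside that CI endpoint might look like the sketch below. The metric names (`accuracy`, `p95_latency_ms`, `demographic_parity`) and the threshold values are assumptions chosen for illustration; a real pipeline would compute the metrics from a held-out dataset and a load test before calling the gate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationThresholds:
    # Hypothetical thresholds; tune these per model and use case
    min_accuracy: float = 0.90
    max_p95_latency_ms: float = 50.0
    min_demographic_parity: float = 0.80


def evaluate_candidate(
    metrics: dict,
    thresholds: ValidationThresholds = ValidationThresholds(),
) -> tuple[bool, list[str]]:
    """Return (passed, failures) for a candidate model version."""
    checks = [
        ("accuracy", lambda v: v >= thresholds.min_accuracy,
         f">= {thresholds.min_accuracy}"),
        ("p95_latency_ms", lambda v: v <= thresholds.max_p95_latency_ms,
         f"<= {thresholds.max_p95_latency_ms}"),
        ("demographic_parity", lambda v: v >= thresholds.min_demographic_parity,
         f">= {thresholds.min_demographic_parity}"),
    ]
    failures = []
    for key, passes, requirement in checks:
        value = metrics.get(key)
        if value is None:
            # A missing metric counts as a failure so gaps never promote silently
            failures.append(f"{key}: metric missing")
        elif not passes(value):
            failures.append(f"{key}: {value} does not satisfy {requirement}")
    return (not failures, failures)
```

Returning the list of failures alongside the boolean lets the endpoint attach the rejection reasons to the model version, so the author sees why promotion was blocked.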
Multi-Model Dependencies
Some serving graphs chain models — a feature encoder feeds into a classifier that feeds into a ranker. The registry should track these dependencies:
```python
client.set_model_version_tag(
    name="product-ranker",
    version="5",
    key="depends_on",
    value="feature-encoder:3,click-classifier:7",
)
```
This enables impact analysis: before archiving feature-encoder:3, query all models that depend on it.
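A minimal sketch of that query, assuming the `name:version` comma-separated format of the `depends_on` tag shown above. The helper names `parse_dependencies` and `find_dependents` are hypothetical, and the input is a plain list of `(name, version, tags)` tuples so the logic stays independent of any registry client.

```python
def parse_dependencies(tag_value: str) -> set[tuple[str, str]]:
    """Parse 'feature-encoder:3,click-classifier:7' into {(name, version), ...}."""
    deps = set()
    for item in tag_value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, version = item.rpartition(":")
        if not sep or not name or not version:
            raise ValueError(f"Malformed dependency entry: {item!r}")
        deps.add((name, version))
    return deps


def find_dependents(target_name, target_version, tagged_versions):
    """Return (name, version) pairs whose depends_on tag references the target.

    tagged_versions is an iterable of (model_name, version, tags_dict) tuples.
    """
    target = (target_name, str(target_version))
    dependents = []
    for name, version, tags in tagged_versions:
        if target in parse_dependencies(tags.get("depends_on", "")):
            dependents.append((name, version))
    return dependents


versions = [
    ("product-ranker", "5", {"depends_on": "feature-encoder:3,click-classifier:7"}),
    ("search-ranker", "2", {"depends_on": "feature-encoder:4"}),
    ("click-classifier", "7", {}),
]
blocked_by = find_dependents("feature-encoder", 3, versions)
```

With MLflow, the tuples could be built from the `name`, `version`, and `tags` attributes of the objects returned by `MlflowClient.search_model_versions`. An archive request would then be refused while `blocked_by` is non-empty.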
Building a Custom Registry
When MLflow’s model does not fit — for example, you need multi-tenant isolation or custom approval workflows — a lightweight custom registry can work:
```python
from datetime import datetime, timezone
from typing import Optional

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()


class ModelVersion(BaseModel):
    name: str
    version: int
    artifact_uri: str
    metrics: dict
    schema_input: dict
    schema_output: dict
    stage: str = "none"
    created_by: str
    created_at: Optional[datetime] = None
    artifact_hash: Optional[str] = None


# In-memory for illustration; use PostgreSQL in production
registry: dict[str, list[ModelVersion]] = {}

# Allowed stage transitions; "archived" is terminal
VALID_TRANSITIONS = {
    "none": ["staging"],
    "staging": ["production", "archived"],
    "production": ["archived"],
}


@app.post("/models/register")
def register_model(mv: ModelVersion):
    mv.created_at = datetime.now(timezone.utc)
    mv.stage = "none"  # new versions always start unstaged, whatever the client sent
    versions = registry.setdefault(mv.name, [])
    # Reject a version number that is already taken
    if any(v.version == mv.version for v in versions):
        raise HTTPException(409, "Version already registered")
    # Reject a duplicate artifact; skip the check when no hash was supplied,
    # otherwise two hash-less registrations would compare equal (None == None)
    if mv.artifact_hash is not None and any(
        v.artifact_hash == mv.artifact_hash for v in versions
    ):
        raise HTTPException(409, "Duplicate artifact already registered")
    versions.append(mv)
    return {"name": mv.name, "version": mv.version, "stage": mv.stage}


@app.post("/models/{name}/versions/{version}/promote")
def promote(name: str, version: int, target_stage: str):
    if name not in registry:
        raise HTTPException(404, "Model not found")
    for mv in registry[name]:
        if mv.version == version:
            if target_stage not in VALID_TRANSITIONS.get(mv.stage, []):
                raise HTTPException(
                    400,
                    f"Cannot transition from {mv.stage} to {target_stage}",
                )
            # Archive current production if promoting to production
            if target_stage == "production":
                for other in registry[name]:
                    if other.stage == "production":
                        other.stage = "archived"
            mv.stage = target_stage
            return {"name": name, "version": version, "stage": target_stage}
    raise HTTPException(404, "Version not found")
```
This skeleton enforces valid stage transitions and prevents duplicate registrations. A production version would add PostgreSQL persistence, JWT authentication, and event emission for downstream automation.
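The duplicate check relies on the caller supplying `artifact_hash`, which the skeleton does not compute. One way a client could derive it is a streamed SHA-256 digest of the artifact file, sketched below. The function name `compute_artifact_hash`, the choice of SHA-256, and the chunk size are assumptions, not part of the registry above.

```python
import hashlib
import tempfile
from pathlib import Path


def compute_artifact_hash(path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b"" at EOF
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage: hash a stand-in artifact written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "model.pkl"
    artifact.write_bytes(b"serialized-model-weights")
    artifact_hash = compute_artifact_hash(artifact)
```

Hashing the file contents rather than the URI means the same weights uploaded to two different locations are still detected as duplicates.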
Schema Contracts
Serving failures often trace back to schema mismatches. Registries should store and enforce input/output schemas:
```python
from pydantic import BaseModel
from typing import Optional


class ModelSchema(BaseModel):
    input_columns: list[dict]  # [{"name": "age", "type": "float64"}, ...]
    output_columns: list[dict]
    feature_ranges: Optional[dict] = None  # expected min/max per feature


def validate_serving_request(request_df, schema: ModelSchema) -> list[str]:
    """Return a list of schema violations; an empty list means the request is valid."""
    errors = []
    expected_cols = {col["name"] for col in schema.input_columns}
    actual_cols = set(request_df.columns)

    missing = expected_cols - actual_cols
    if missing:
        errors.append(f"Missing columns: {missing}")

    extra = actual_cols - expected_cols
    if extra:
        errors.append(f"Unexpected columns: {extra}")

    # Compare dtypes only for columns that are actually present
    for col_spec in schema.input_columns:
        col_name = col_spec["name"]
        if col_name in request_df.columns:
            actual_type = str(request_df[col_name].dtype)
            if actual_type != col_spec["type"]:
                errors.append(
                    f"Column {col_name}: expected {col_spec['type']}, "
                    f"got {actual_type}"
                )
    return errors
```
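The validator checks column names and dtypes but leaves `feature_ranges` unused. A range check could be added along the lines of this sketch, which assumes a `{"feature": (min, max)}` layout for the ranges (the original does not specify one) and operates on plain row dictionaries so it runs without pandas. The function name `check_feature_ranges` is hypothetical.

```python
def check_feature_ranges(
    rows: list[dict],
    feature_ranges: dict[str, tuple[float, float]],
) -> list[str]:
    """Report values that fall outside the expected inclusive [min, max] range."""
    errors = []
    for index, row in enumerate(rows):
        for feature, (low, high) in feature_ranges.items():
            value = row.get(feature)
            if value is None:
                # Missing columns are reported by the column check, not here
                continue
            if not low <= value <= high:
                errors.append(
                    f"Row {index}, {feature}: {value} outside [{low}, {high}]"
                )
    return errors


rows = [
    {"age": 42.0, "income": 55000.0},
    {"age": -3.0, "income": 55000.0},
]
range_errors = check_feature_ranges(rows, {"age": (18.0, 100.0)})
```

With a DataFrame request, `request_df.to_dict("records")` produces rows in this shape. Whether an out-of-range value should reject the request or only raise an alert is a policy decision the sketch leaves open.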
Governance Patterns
Approval Workflows
In regulated industries (finance, healthcare), no model reaches production without human approval. The registry tracks approval state:
- Pending — awaiting review
- Approved — human sign-off recorded
- Rejected — blocked with reason
Each approval records the reviewer’s identity and timestamp, creating an audit trail for compliance.
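The three states and the audit trail can be modeled roughly as follows. This is a sketch under assumptions: the class names (`ApprovalState`, `ApprovalRecord`), the rule that a decision is final once recorded, and the requirement that a rejection carries a reason are illustrative choices, not a prescribed compliance design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ApprovalState(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ApprovalRecord:
    model_name: str
    version: int
    state: ApprovalState = ApprovalState.PENDING
    audit_trail: list = field(default_factory=list)

    def _decide(self, new_state: ApprovalState, reviewer: str,
                reason: Optional[str]) -> None:
        # Only a pending request can be decided; decisions are final
        if self.state is not ApprovalState.PENDING:
            raise ValueError(f"Decision already recorded: {self.state.value}")
        self.state = new_state
        self.audit_trail.append({
            "reviewer": reviewer,
            "decision": new_state.value,
            "reason": reason,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def approve(self, reviewer: str, reason: Optional[str] = None) -> None:
        self._decide(ApprovalState.APPROVED, reviewer, reason)

    def reject(self, reviewer: str, reason: str) -> None:
        if not reason:
            raise ValueError("A rejection must include a reason")
        self._decide(ApprovalState.REJECTED, reviewer, reason)
```

A promotion endpoint would then refuse a transition to production unless the record for that version is in the `APPROVED` state.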
Model Cards
Attaching a model card (inspired by Google’s Model Cards for Model Reporting) to each registered version documents intended use, known limitations, fairness evaluations, and training data descriptions. In regulated settings, documentation of this kind is often a requirement rather than a nicety.
```python
import json

from mlflow.tracking import MlflowClient

client = MlflowClient()

model_card = {
    "intended_use": "Credit scoring for consumer loans",
    "limitations": "Trained on US data only; may not generalize internationally",
    "fairness_evaluation": {
        "demographic_parity": 0.92,
        "equalized_odds": 0.89,
    },
    "training_data": "2024 Q1-Q3 loan applications, 2.1M records",
}

# Tag values are strings, so serialize the card to JSON
client.set_model_version_tag(
    name="credit-scorer", version="12",
    key="model_card", value=json.dumps(model_card),
)
```
Comparison of Registry Tools
| Feature | MLflow | W&B | Vertex AI | SageMaker | Custom |
|---|---|---|---|---|---|
| Stage management | Built-in (stages; aliases in newer releases) | Aliases and tags | Version aliases | Model package groups with approval status | Custom |
| Schema validation | Signature | Artifact types | Schema spec | Inference spec | Custom |
| Multi-tenant | Limited | Workspace-level | Project-level | Account-level | Full control |
| Approval workflows | Webhooks | Manual | Built-in | Built-in | Custom |
| Cost | Free (OSS) | Free tier + paid | Per-prediction | Per-endpoint | Engineering time |
Cleanup and Retention
Registered models accumulate. A retention policy should:
- Never delete a production model until its replacement has run stably in production for N days
- Archive staging models that were not promoted within 30 days
- Retain artifacts for archived models for regulatory hold periods (often 7 years in finance)
- Compact metadata — delete individual run metrics but keep registered version summaries
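The stage-based rules can be expressed as a small decision function. The sketch below makes several assumptions: the `VersionInfo` fields and the action names (`keep`, `archive`, `delete_artifacts`) are hypothetical, the 7-year hold is approximated as 7 × 365 days, and production versions are always kept because the replacement-stability rule needs deployment data that is not modeled here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class VersionInfo:
    name: str
    version: int
    stage: str                  # "staging", "production", or "archived"
    stage_entered_at: datetime  # when the version entered its current stage


def retention_action(
    info: VersionInfo,
    now: datetime,
    staging_ttl: timedelta = timedelta(days=30),
    archive_hold: timedelta = timedelta(days=7 * 365),  # approximate 7 years
) -> str:
    """Return 'keep', 'archive', or 'delete_artifacts' for one model version."""
    age_in_stage = now - info.stage_entered_at
    if info.stage == "production":
        # Production versions are never cleaned up directly
        return "keep"
    if info.stage == "staging":
        return "archive" if age_in_stage > staging_ttl else "keep"
    if info.stage == "archived":
        return "delete_artifacts" if age_in_stage > archive_hold else "keep"
    return "keep"  # unknown stages are left alone


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
stale_staging = VersionInfo("credit-scorer", 9, "staging", now - timedelta(days=45))
action = retention_action(stale_staging, now)
```

Running the function as a scheduled job in dry-run mode first, logging the proposed actions without applying them, is a cheap way to validate the policy before it deletes anything.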
One thing to remember: A model registry is governance infrastructure — it enforces who can deploy what, tracks where every model came from, and enables instant rollback when production goes wrong.