Model Registry Patterns in Python — Deep Dive

Registry Architecture

A production model registry has four layers:

  1. Metadata store — database holding model names, versions, stages, tags, and lineage
  2. Artifact store — blob storage (S3, GCS) holding serialized model weights
  3. API layer — REST or gRPC interface for registration, queries, and transitions
  4. Governance layer — access control, approval workflows, and audit logs

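MLflow bundles the first three layers and covers part of the fourth: open-source MLflow offers only basic access control, while managed offerings layer on permissions and approval workflows. Custom registries often split the layers across dedicated services.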
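To make the layering concrete, here is a minimal in-memory sketch in which each layer is a separate object. All class names are hypothetical stand-ins: a real deployment would back `MetadataStore` with a database and `InMemoryArtifactStore` with S3 or GCS, and governance would be far richer than a set of allowed writers plus an audit log.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Protocol


class ArtifactStore(Protocol):
    """Layer 2: where serialized weights live (S3, GCS, ...)."""
    def put(self, data: bytes) -> str: ...
    def get(self, uri: str) -> bytes: ...


class InMemoryArtifactStore:
    """Stand-in for blob storage, content-addressed by SHA-256."""
    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        uri = f"mem://{hashlib.sha256(data).hexdigest()}"
        self._blobs[uri] = data
        return uri

    def get(self, uri: str) -> bytes:
        return self._blobs[uri]


@dataclass
class MetadataStore:
    """Layer 1: names, versions, stages, tags (a database in production)."""
    versions: dict[str, list[dict]] = field(default_factory=dict)


class Registry:
    """Layer 3 (API) with a minimal Layer 4 (governance) check and audit log."""
    def __init__(self, artifacts: ArtifactStore, metadata: MetadataStore,
                 writers: set[str]) -> None:
        self.artifacts = artifacts
        self.metadata = metadata
        self.writers = writers
        self.audit_log: list[tuple[str, str, str, int]] = []

    def register(self, user: str, name: str, weights: bytes, metrics: dict) -> int:
        if user not in self.writers:
            raise PermissionError(f"{user} may not register models")
        # Write the blob first so metadata never points at a missing artifact
        uri = self.artifacts.put(weights)
        versions = self.metadata.versions.setdefault(name, [])
        version = len(versions) + 1
        versions.append({"version": version, "artifact_uri": uri,
                         "metrics": metrics, "stage": "none", "created_by": user})
        self.audit_log.append((user, "register", name, version))
        return version
```

Writing the artifact before the metadata means a crash can leave an orphaned blob (which a cleanup job can collect) but never a version record pointing at nothing. The `len(versions) + 1` numbering is not safe under concurrent writers; a database sequence or unique constraint would handle that in practice.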

MLflow Registry: Advanced Patterns

Registering with Schema Validation

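import mlflow
from mlflow.models.signature import infer_signature

# Assumes `model` is a fitted scikit-learn estimator and `X_train` is a
# pandas DataFrame, so the signature captures column names as well as types.
with mlflow.start_run() as run:
    signature = infer_signature(X_train, model.predict(X_train))
    mlflow.sklearn.log_model(
        model, "model",
        signature=signature,
        input_example=X_train[:5]
    )
    # Register the logged artifact as a new version of "credit-scorer"
    result = mlflow.register_model(
        f"runs:/{run.info.run_id}/model",
        "credit-scorer"
    )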

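The signature records the expected column names and types. At serving time, MLflow's pyfunc layer checks incoming data against it: if a required column is missing or a value cannot be safely converted to the declared type, the request fails instead of producing garbage predictions, which catches schema drift early.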

Automated Promotion with Webhooks

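On Databricks, the MLflow Model Registry supports webhooks that fire on model version events. They are managed through a separate client package, databricks-registry-webhooks, rather than through MlflowClient:

from mlflow.tracking import MlflowClient
from databricks_registry_webhooks import HttpUrlSpec, RegistryWebhooksClient

client = MlflowClient()  # used for tagging and queries

# Create a webhook that fires on model version creation
webhook = RegistryWebhooksClient().create_webhook(
    events=["MODEL_VERSION_CREATED"],
    description="Trigger validation pipeline",
    http_url_spec=HttpUrlSpec(
        url="https://ci.example.com/validate-model",
        secret="webhook-secret-token",  # load from a secret manager in practice
    ),
)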

The CI endpoint receives the model name and version, runs validation tests (accuracy thresholds, latency benchmarks, bias checks), and promotes or rejects automatically.
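The receiving side of that flow might look like the following sketch. It assumes the sender signs the raw request body with HMAC-SHA256 using the shared secret and that the JSON payload carries `model_name` and `version` fields; check your platform's webhook documentation for the actual header, signing scheme, and payload shape. `THRESHOLDS` and the `fetch_metrics` callback are hypothetical placeholders for your own validation suite.

```python
import hashlib
import hmac
import json
from typing import Callable

# Hypothetical promotion gates; set these per model.
THRESHOLDS = {"min_auc": 0.80, "max_p95_latency_ms": 50.0}


def verify_signature(raw_body: bytes, secret: str, received_sig: str) -> bool:
    """Assumed scheme: hex-encoded HMAC-SHA256 of the raw body."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched
    return hmac.compare_digest(expected, received_sig)


def decide(metrics: dict) -> tuple[str, list[str]]:
    """Return ("promote", []) or ("reject", reasons)."""
    reasons = []
    auc = metrics.get("auc", 0.0)
    if auc < THRESHOLDS["min_auc"]:
        reasons.append(f"auc {auc} < {THRESHOLDS['min_auc']}")
    latency = metrics.get("p95_latency_ms", float("inf"))
    if latency > THRESHOLDS["max_p95_latency_ms"]:
        reasons.append(f"p95 latency {latency}ms > {THRESHOLDS['max_p95_latency_ms']}ms")
    return ("reject", reasons) if reasons else ("promote", [])


def handle_event(raw_body: bytes, secret: str, received_sig: str,
                 fetch_metrics: Callable[[str, str], dict]) -> dict:
    if not verify_signature(raw_body, secret, received_sig):
        return {"status": "unauthorized"}
    payload = json.loads(raw_body)
    # Field names are assumptions about the webhook payload
    name, version = payload["model_name"], payload["version"]
    decision, reasons = decide(fetch_metrics(name, version))
    return {"status": "ok", "model": name, "version": version,
            "decision": decision, "reasons": reasons}
```

In a full pipeline, a "promote" decision would call back into the registry (for example, to move a stage or alias) and a "reject" would record the reasons on the model version so reviewers can see why it was blocked.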

Multi-Model Dependencies

Some serving graphs chain models — a feature encoder feeds into a classifier that feeds into a ranker. The registry should track these dependencies:

client.set_model_version_tag(
    name="product-ranker",
    version="5",
    key="depends_on",
    value="feature-encoder:3,click-classifier:7"
)

This enables impact analysis: before archiving feature-encoder:3, query all models that depend on it.
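Once the `depends_on` tags are collected (for example, by reading each version's `tags` from `MlflowClient.search_model_versions` results), the impact query is a reverse-graph traversal. This sketch assumes the comma-separated `name:version` format used above and follows dependencies transitively, so a ranker fed by a classifier that is itself fed by the encoder is also flagged.

```python
from collections import defaultdict

ModelRef = tuple[str, str]  # (model name, version)


def parse_depends_on(tag_value: str) -> list[ModelRef]:
    """Parse 'name:version,name:version' into (name, version) pairs."""
    refs = []
    for item in tag_value.split(","):
        item = item.strip()
        if item:
            name, _, version = item.rpartition(":")
            refs.append((name, version))
    return refs


def build_reverse_index(depends_on_tags: dict[ModelRef, str]) -> dict[ModelRef, set[ModelRef]]:
    """Map each upstream version to the versions that declare a dependency on it."""
    reverse: dict[ModelRef, set[ModelRef]] = defaultdict(set)
    for consumer, tag_value in depends_on_tags.items():
        for upstream in parse_depends_on(tag_value):
            reverse[upstream].add(consumer)
    return reverse


def impacted_by(target: ModelRef, reverse: dict[ModelRef, set[ModelRef]]) -> set[ModelRef]:
    """All direct and transitive dependents of `target` (iterative DFS)."""
    impacted: set[ModelRef] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for consumer in reverse.get(node, ()):
            if consumer not in impacted:
                impacted.add(consumer)
                stack.append(consumer)
    return impacted
```

If `impacted_by(("feature-encoder", "3"), index)` returns a non-empty set, archiving that encoder version should be blocked or escalated.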

Building a Custom Registry

When MLflow’s model does not fit — for example, you need multi-tenant isolation or custom approval workflows — a lightweight custom registry can work:

from datetime import datetime, timezone
from typing import Optional

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

# Allowed stage transitions; anything not listed is rejected
VALID_TRANSITIONS = {
    "none": ["staging"],
    "staging": ["production", "archived"],
    "production": ["archived"],
}

class ModelVersion(BaseModel):
    name: str
    version: int
    artifact_uri: str
    artifact_hash: str  # e.g. SHA-256 of the serialized weights, computed by the client
    metrics: dict
    schema_input: dict
    schema_output: dict
    created_by: str
    stage: str = "none"
    created_at: Optional[datetime] = None

# In-memory for illustration; use PostgreSQL in production
registry: dict[str, list[ModelVersion]] = {}

@app.post("/models/register")
def register_model(mv: ModelVersion):
    versions = registry.setdefault(mv.name, [])

    # Reject reused version numbers and re-uploads of an identical artifact
    if any(v.version == mv.version for v in versions):
        raise HTTPException(409, f"Version {mv.version} already exists")
    if any(v.artifact_hash == mv.artifact_hash for v in versions):
        raise HTTPException(409, "Duplicate artifact already registered")

    # New versions always start in "none"; stages change only via /promote
    mv.stage = "none"
    mv.created_at = datetime.now(timezone.utc)
    versions.append(mv)
    return {"name": mv.name, "version": mv.version, "stage": mv.stage}

@app.post("/models/{name}/versions/{version}/promote")
def promote(name: str, version: int, target_stage: str):
    if name not in registry:
        raise HTTPException(404, "Model not found")

    for mv in registry[name]:
        if mv.version == version:
            if target_stage not in VALID_TRANSITIONS.get(mv.stage, []):
                raise HTTPException(
                    400,
                    f"Cannot transition from {mv.stage} to {target_stage}"
                )
            # Archive current production if promoting to production
            if target_stage == "production":
                for other in registry[name]:
                    if other.stage == "production":
                        other.stage = "archived"
            mv.stage = target_stage
            return {"name": name, "version": version, "stage": target_stage}

    raise HTTPException(404, "Version not found")

This skeleton enforces valid stage transitions and prevents duplicate registrations. A production version would add PostgreSQL persistence, JWT authentication, and event emission for downstream automation.
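Event emission can start as a small publish/subscribe layer that the register and promote handlers call after each successful change. The sketch below is an in-process stand-in for a message broker such as Kafka or SNS; the event kinds and field names are hypothetical.

```python
import logging
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

logger = logging.getLogger("registry.events")


@dataclass
class RegistryEvent:
    kind: str      # hypothetical names, e.g. "version_registered", "stage_changed"
    model: str
    version: int
    detail: dict = field(default_factory=dict)
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class EventBus:
    """In-process stand-in for a message broker."""

    def __init__(self) -> None:
        self._handlers: list[Callable[[RegistryEvent], None]] = []

    def subscribe(self, handler: Callable[[RegistryEvent], None]) -> None:
        self._handlers.append(handler)

    def emit(self, event: RegistryEvent) -> int:
        """Deliver to every handler; return how many handlers failed."""
        failures = 0
        for handler in self._handlers:
            try:
                handler(event)
            except Exception:
                # One broken consumer must not block the others
                logger.exception("handler failed for %s", event.kind)
                failures += 1
        return failures
```

Inside `promote`, a call such as `bus.emit(RegistryEvent("stage_changed", name, version, {"to": target_stage}))` would let a deployment job, a notifier, and an audit writer react independently.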

Schema Contracts

Serving failures often trace back to schema mismatches. Registries should store and enforce input/output schemas:

import pandas as pd
from pydantic import BaseModel
from typing import Optional

class ModelSchema(BaseModel):
    input_columns: list[dict]  # [{"name": "age", "type": "float64"}, ...]
    output_columns: list[dict]
    feature_ranges: Optional[dict] = None  # expected min/max per feature

def validate_serving_request(request_df: pd.DataFrame, schema: ModelSchema) -> list[str]:
    """Return schema violations; an empty list means the request is valid."""
    errors = []
    expected_cols = {col["name"] for col in schema.input_columns}
    actual_cols = set(request_df.columns)

    missing = expected_cols - actual_cols
    if missing:
        errors.append(f"Missing columns: {missing}")

    extra = actual_cols - expected_cols
    if extra:
        errors.append(f"Unexpected columns: {extra}")

    for col_spec in schema.input_columns:
        col_name = col_spec["name"]
        if col_name in request_df.columns:
            # dtype names must match exactly, so int64 vs float64 is flagged
            actual_type = str(request_df[col_name].dtype)
            if actual_type != col_spec["type"]:
                errors.append(
                    f"Column {col_name}: expected {col_spec['type']}, "
                    f"got {actual_type}"
                )
    return errors
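`ModelSchema` declares `feature_ranges`, but `validate_serving_request` never checks it. A range check might look like the sketch below, which assumes `feature_ranges` maps each feature name to a `(min, max)` tuple and operates on the JSON records of a request body before they become a DataFrame; both are assumptions, not part of the schema above.

```python
def check_feature_ranges(
    records: list[dict],
    feature_ranges: dict[str, tuple[float, float]],
) -> list[str]:
    """Flag values outside [min, max] for each feature with a declared range."""
    errors = []
    for i, row in enumerate(records):
        for feature, (low, high) in feature_ranges.items():
            value = row.get(feature)
            if value is None:
                continue  # missing columns are reported by the column check
            if not low <= value <= high:
                errors.append(f"row {i}: {feature}={value} outside [{low}, {high}]")
    return errors
```

Out-of-range values more often signal data drift than malformed requests, so it can make sense to log and alert on these errors rather than reject the request outright.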

Governance Patterns

Approval Workflows

In regulated industries (finance, healthcare), models typically cannot reach production without recorded human approval. The registry tracks approval state:

  • Pending — awaiting review
  • Approved — human sign-off recorded
  • Rejected — blocked with reason

Each approval records the reviewer’s identity and timestamp, creating an audit trail for compliance.
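The three states and the audit trail can be modeled as a small state machine. This is a sketch: the class names are hypothetical, and the rule that a reviewer cannot approve their own submission is an assumed separation-of-duties policy, not something every framework requires.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ApprovalState(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class AuditEntry:
    state: ApprovalState
    actor: str
    at: datetime
    reason: Optional[str] = None


@dataclass
class Approval:
    model: str
    version: int
    submitted_by: str
    state: ApprovalState = ApprovalState.PENDING
    history: list[AuditEntry] = field(default_factory=list)

    def _decide(self, state: ApprovalState, reviewer: str, reason: Optional[str]) -> None:
        if self.state is not ApprovalState.PENDING:
            raise ValueError(f"already {self.state.value}")
        # Assumed policy: nobody approves or rejects their own submission
        if reviewer == self.submitted_by:
            raise PermissionError("reviewer must differ from submitter")
        self.state = state
        # Append-only record of who decided what, and when
        self.history.append(AuditEntry(state, reviewer, datetime.now(timezone.utc), reason))

    def approve(self, reviewer: str) -> None:
        self._decide(ApprovalState.APPROVED, reviewer, None)

    def reject(self, reviewer: str, reason: str) -> None:
        if not reason.strip():
            raise ValueError("rejections require a reason")
        self._decide(ApprovalState.REJECTED, reviewer, reason)
```

The promotion endpoint would then refuse a transition to production unless the version's `Approval.state` is `APPROVED`.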

Model Cards

Attaching a model card (inspired by Google’s Model Cards for Model Reporting) to each registered version documents intended use, known limitations, fairness evaluations, and training data descriptions. Many regulatory and model-risk frameworks expect this kind of documentation, so in those settings it is not optional.

import json
from mlflow.tracking import MlflowClient

client = MlflowClient()

model_card = {
    "intended_use": "Credit scoring for consumer loans",
    "limitations": "Trained on US data only; may not generalize internationally",
    "fairness_evaluation": {
        "demographic_parity": 0.92,
        "equalized_odds": 0.89
    },
    "training_data": "2024 Q1-Q3 loan applications, 2.1M records"
}

# Store the card as a JSON string tag on the registered version
client.set_model_version_tag(
    name="credit-scorer", version="12",
    key="model_card", value=json.dumps(model_card)
)
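To keep cards from being skipped, registration can refuse versions whose card is incomplete. This sketch treats the four items listed above as mandatory fields; the field names mirror the example card, and making them required is an assumed policy.

```python
# Assumed policy: every card must fill in these four sections
REQUIRED_CARD_FIELDS = ("intended_use", "limitations", "fairness_evaluation", "training_data")


def missing_card_fields(card: dict) -> list[str]:
    """Return required model-card fields that are absent or empty."""
    return [key for key in REQUIRED_CARD_FIELDS if not card.get(key)]
```

Calling this before `set_model_version_tag` and failing the registration when it returns a non-empty list turns the documentation requirement into an enforced gate.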

Comparison of Registry Tools

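Feature            | MLflow     | W&B              | Vertex AI      | SageMaker       | Custom
-------------------|------------|------------------|----------------|-----------------|-----------------
Stage management   | Built-in   | Tags             | Automatic      | Approval groups | Custom
Schema validation  | Signature  | Artifact types   | Schema spec    | Inference spec  | Custom
Multi-tenant       | Limited    | Workspace-level  | Project-level  | Account-level   | Full control
Approval workflows | Webhooks   | Manual           | Built-in       | Built-in        | Custom
Cost               | Free (OSS) | Free tier + paid | Per-prediction | Per-endpoint    | Engineering time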

Cleanup and Retention

Registered models accumulate. A retention policy should:

  1. Never delete a production model until its replacement has run stably in production for N days
  2. Archive staging models that were not promoted within 30 days
  3. Retain artifacts for archived models for regulatory hold periods (often 7 years in finance)
  4. Compact metadata — delete individual run metrics but keep registered version summaries
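Rules 1 to 3 can be expressed as a pure function that a scheduled cleanup job evaluates per version. This is a sketch under stated assumptions: `VersionInfo` and its `replacement_live_since` field are hypothetical, the stage names follow the custom registry above (where a replaced production model becomes "archived"), and 7 * 365 days only approximates a seven-year hold. Rule 4 is left out because metadata compaction touches run records rather than registered versions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class VersionInfo:
    name: str
    version: int
    stage: str                 # "none", "staging", "production", "archived"
    stage_entered_at: datetime  # when the version entered its current stage
    replacement_live_since: Optional[datetime] = None  # hypothetical: when its successor went live


def retention_action(v: VersionInfo, now: datetime, stable_days: int = 14,
                     staging_ttl_days: int = 30, hold_days: int = 7 * 365) -> str:
    """Return "keep", "archive", or "delete_artifacts" for one version."""
    if v.stage == "production":
        return "keep"  # live models are never touched
    if v.stage == "staging":
        if now - v.stage_entered_at > timedelta(days=staging_ttl_days):
            return "archive"  # rule 2: stale staging candidates
        return "keep"
    if v.stage == "archived":
        # Rule 1: the successor must have been stable for `stable_days`
        if (v.replacement_live_since is not None
                and now - v.replacement_live_since < timedelta(days=stable_days)):
            return "keep"
        # Rule 3: artifacts stay until the regulatory hold has elapsed
        if now - v.stage_entered_at >= timedelta(days=hold_days):
            return "delete_artifacts"
    return "keep"
```

Returning an action instead of deleting directly lets the job run in dry-run mode first, which is useful when the cost of a wrong deletion is a compliance finding.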

One thing to remember: A model registry is governance infrastructure — it enforces who can deploy what, tracks where every model came from, and enables instant rollback when production goes wrong.
