Scikit-Learn Model Persistence — Deep Dive
Technical foundation
Model serialization converts a fitted estimator’s in-memory state into a byte stream. Python’s pickle protocol traverses the object graph (following __dict__, __getstate__, and __reduce__) and records enough information to reconstruct the object. joblib builds on pickle with optimizations for NumPy arrays: it stores large arrays as raw buffers, can compress them, and can memory-map uncompressed arrays on load.
The key constraint: pickle stores classes by reference, so deserialization must import the same classes from the same module paths. If a class moves or changes between library versions, loading fails with AttributeError or ModuleNotFoundError, or produces silently incorrect objects.
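To make the module-path constraint concrete, the sketch below pickles a standard-library object and checks that the byte stream names the class by module and class name rather than embedding its code. `Fraction` is only a convenient stand-in for an estimator class; the variable names are illustrative.

```python
import pickle
from fractions import Fraction

value = Fraction(1, 3)

# __reduce__ returns a callable plus the arguments needed to rebuild the object.
rebuild_callable, rebuild_args = value.__reduce__()[:2]

payload = pickle.dumps(value)

# The class is stored by reference: its module path and name appear in the
# byte stream, so loading must be able to import fractions.Fraction again.
stored_by_reference = b"fractions" in payload and b"Fraction" in payload

restored = pickle.loads(payload)
print(rebuild_callable.__name__, stored_by_reference, restored == value)
```

If `fractions.Fraction` were renamed or moved, `pickle.loads` would have nothing to import, which is the same failure mode a scikit-learn upgrade can trigger for estimator classes.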
joblib: compression and performance
import joblib
from sklearn.ensemble import RandomForestClassifier
import os
model = RandomForestClassifier(n_estimators=500, max_depth=20, random_state=42)
model.fit(X_train, y_train)
# Default: no compression, fastest save/load
joblib.dump(model, 'model.joblib')
print(f"Uncompressed: {os.path.getsize('model.joblib') / 1e6:.1f} MB")
# Compressed: smaller file, slower save/load
joblib.dump(model, 'model_compressed.joblib', compress=3)
print(f"Compressed (zlib-3): {os.path.getsize('model_compressed.joblib') / 1e6:.1f} MB")
# Specific algorithm
joblib.dump(model, 'model_lzma.joblib', compress=('lzma', 3))
print(f"Compressed (lzma-3): {os.path.getsize('model_lzma.joblib') / 1e6:.1f} MB")
Compression benchmarks for a 500-tree Random Forest trained on 50K samples:
| Method | File Size | Save Time | Load Time |
|---|---|---|---|
| No compression | ~180 MB | 0.4s | 0.3s |
| zlib level 3 | ~45 MB | 1.2s | 0.6s |
| lzma level 3 | ~25 MB | 8.0s | 1.5s |
Rule of thumb: Use compress=3 (zlib) for production — good size reduction with acceptable speed. Use lzma only for archival where load speed doesn’t matter.
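The size/speed tradeoff behind this rule of thumb can be reproduced with the standard library alone. The sketch below compresses a pickled list of floats with zlib and lzma directly; it is a stand-in for a fitted model and does not go through joblib's own code path, so absolute numbers will differ from the table above.

```python
import lzma
import pickle
import time
import zlib

# Stand-in for a fitted model: a large, repetitive list of floats.
payload = pickle.dumps([(i % 100) * 0.5 for i in range(200_000)])

def measure(name, compress):
    """Return (name, compressed size in bytes, seconds taken)."""
    start = time.perf_counter()
    size = len(compress(payload))
    return name, size, time.perf_counter() - start

results = [
    measure("none", lambda data: data),
    measure("zlib-3", lambda data: zlib.compress(data, 3)),
    measure("lzma-3", lambda data: lzma.compress(data, preset=3)),
]
for name, size, seconds in results:
    print(f"{name:8s} {size / 1e6:6.2f} MB  {seconds:.3f}s")
```

Running the same measurement on your own model and hardware is the only reliable way to pick a setting, since tree ensembles and dense coefficient arrays compress very differently.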
Complete model packaging
Save everything needed to reproduce and serve:
import json
import os
import sys
from datetime import datetime, timezone
import joblib
import numpy as np
import sklearn
def save_model_package(pipeline, X_train, y_train, metrics, path_prefix):
    """Save model with complete metadata for reproducible deployment."""
    # Save the fitted pipeline
    model_path = f"{path_prefix}_model.joblib"
    joblib.dump(pipeline, model_path, compress=3)
    # Save metadata
    classes, counts = np.unique(y_train, return_counts=True)
    metadata = {
        'created_at': datetime.now(timezone.utc).isoformat(),
        'sklearn_version': sklearn.__version__,
        'python_version': sys.version,
        'numpy_version': np.__version__,
        'n_training_samples': len(y_train),
        'n_features': X_train.shape[1],
        'feature_names': list(X_train.columns) if hasattr(X_train, 'columns') else None,
        # JSON keys must be plain Python types, so convert NumPy scalars explicitly
        'class_distribution': {str(c): int(n) for c, n in zip(classes, counts)},
        'metrics': metrics,
        'model_type': type(pipeline).__name__,
        'model_params': pipeline.get_params(),
        'file_size_bytes': os.path.getsize(model_path),
    }
    meta_path = f"{path_prefix}_metadata.json"
    with open(meta_path, 'w') as f:
        # default=str stringifies values JSON cannot encode (e.g. nested estimators)
        json.dump(metadata, f, indent=2, default=str)
    # Save a small input/output sample so a loaded model can be checked
    test_input = X_train.iloc[:5] if hasattr(X_train, 'iloc') else X_train[:5]
    test_output = pipeline.predict(test_input)
    validation = {
        'input_shape': list(test_input.shape),
        'input': np.asarray(test_input).tolist(),
        'expected_output': test_output.tolist(),
    }
    val_path = f"{path_prefix}_validation.json"
    with open(val_path, 'w') as f:
        json.dump(validation, f, indent=2, default=str)
    return model_path, meta_path, val_path
def load_and_validate(path_prefix):
    """Load model and verify it produces expected outputs."""
    model = joblib.load(f"{path_prefix}_model.joblib")
    with open(f"{path_prefix}_metadata.json") as f:
        metadata = json.load(f)
    with open(f"{path_prefix}_validation.json") as f:
        validation = json.load(f)
    # Version check
    if metadata['sklearn_version'] != sklearn.__version__:
        print(f"WARNING: Model trained with sklearn {metadata['sklearn_version']}, "
              f"current version is {sklearn.__version__}")
    # Output check: replay the saved sample and compare predictions
    test_input = np.asarray(validation['input'])
    if metadata.get('feature_names'):
        # Pipelines fitted on a DataFrame may select columns by name
        import pandas as pd
        test_input = pd.DataFrame(validation['input'], columns=metadata['feature_names'])
    actual = model.predict(test_input)
    # Exact match suits class labels; use np.allclose for regression outputs
    if not np.array_equal(actual, np.asarray(validation['expected_output'])):
        raise ValueError("Loaded model does not reproduce the saved validation predictions")
    return model, metadata
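The metadata above records the file size, which catches truncated files but not corrupted or swapped ones. A content hash is a stricter check. The sketch below is one way to add it, assuming a hypothetical `sha256` field in the metadata JSON; `file_sha256` and `verify_file` are illustrative helper names, not part of joblib or scikit-learn.

```python
import hashlib
import tempfile
from pathlib import Path

def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_file(path, expected_sha256):
    """Raise if the file on disk does not match the recorded hash."""
    actual = file_sha256(path)
    if actual != expected_sha256:
        raise ValueError(f"Checksum mismatch for {path}: {actual} != {expected_sha256}")

# Demo with a throwaway file standing in for *_model.joblib
with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "demo_model.joblib"
    model_file.write_bytes(b"not a real model")
    recorded = file_sha256(model_file)   # would be stored in the metadata JSON
    verify_file(model_file, recorded)    # passes silently
    model_file.write_bytes(b"tampered")
    try:
        verify_file(model_file, recorded)
        tamper_detected = False
    except ValueError:
        tamper_detected = True
```

The check belongs before `joblib.load`, since unpickling is the step that executes code. It guards against corruption and mismatched artifacts; it does not stop an attacker who can rewrite the metadata file as well.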
ONNX export for cross-platform serving
For production inference outside Python (C++, Java, JavaScript, Rust), export to ONNX:
# pip install skl2onnx onnxruntime
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import onnxruntime as rt
# Convert sklearn model to ONNX
initial_type = [('float_input', FloatTensorType([None, X_train.shape[1]]))]
onnx_model = convert_sklearn(pipeline, initial_types=initial_type)
# Save ONNX model
with open('model.onnx', 'wb') as f:
    f.write(onnx_model.SerializeToString())
# Inference with ONNX Runtime (no sklearn dependency needed)
session = rt.InferenceSession('model.onnx', providers=['CPUExecutionProvider'])
input_name = session.get_inputs()[0].name
# np.asarray also accepts DataFrames; for classifiers, output 0 is the predicted label
predictions = session.run(None, {input_name: np.asarray(X_test, dtype=np.float32)})[0]
ONNX advantages:
- No Python dependency at inference: Serve from C++, Go, Rust, or edge devices
- Optimized runtime: ONNX Runtime applies graph optimizations (operator fusion, memory planning)
- Decoupled from scikit-learn versions: The serving environment needs no matching scikit-learn install; compatibility depends on the ONNX opset supported by the runtime instead
Limitation: Not all scikit-learn transformers have ONNX converters. Complex custom transformers may need manual ONNX operator implementations.
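A second caveat follows from the `FloatTensorType` input used above: inputs are cast to float32, while scikit-learn typically computes in float64. The sketch below uses made-up numbers to show how a value just above a float64 threshold can land exactly on it after the cast, which is enough to flip a comparison such as a tree split.

```python
import numpy as np

threshold = 0.3        # e.g. a split threshold learned in float64
x = 0.1 + 0.2          # slightly above 0.3 in float64

goes_right_float64 = x > threshold
# After casting, both values round to the same float32 number
goes_right_float32 = np.float32(x) > np.float32(threshold)

print(goes_right_float64, goes_right_float32)
```

Because of this, it is worth comparing ONNX Runtime outputs against `pipeline.predict` on a held-out sample before switching serving over, and treating a small disagreement rate as something to measure rather than assume away.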
skops.io: secure serialization
The scikit-learn documentation points to skops.io as a more secure alternative to pickle for security-conscious workflows:
# pip install skops
import skops.io as sio
# Save (produces a .skops file)
sio.dump(pipeline, 'model.skops')
# Load with type validation
# List the types in the file that skops does not trust by default
unknown_types = sio.get_untrusted_types(file='model.skops')
print(f"Untrusted types in file: {unknown_types}")
# Review the list first, then pass only the types you recognize as trusted
loaded = sio.load('model.skops', trusted=unknown_types)
Unlike pickle, skops.io inspects the serialized types before instantiating them. You explicitly approve which types can be loaded, preventing arbitrary code execution from malicious files.
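The approval step is only as good as the review behind it; passing every reported type straight back as trusted defeats the purpose. One way to make the review explicit is an allow-list, sketched below. `split_by_approval`, `APPROVED`, and the type names are hypothetical; the `found` list stands in for whatever `sio.get_untrusted_types` returns for a real file.

```python
def split_by_approval(untrusted_types, approved_types):
    """Partition type names into (approved, rejected) using an explicit allow-list."""
    approved = [t for t in untrusted_types if t in approved_types]
    rejected = [t for t in untrusted_types if t not in approved_types]
    return approved, rejected

# Hypothetical allow-list maintained by the team
APPROVED = {"my_project.transformers.LogScaler"}

# Illustrative stand-in for the list returned by sio.get_untrusted_types(...)
found = ["my_project.transformers.LogScaler", "os.system"]

approved, rejected = split_by_approval(found, APPROVED)
if rejected:
    print(f"Refusing to load; unapproved types: {rejected}")
# Otherwise: loaded = sio.load('model.skops', trusted=approved)
```

Keeping the allow-list in version control means any new type a model file needs shows up as a reviewable change rather than a silent approval.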
Version compatibility strategies
Strategy 1: Pin versions in requirements
# requirements-model.txt
scikit-learn==1.4.2
numpy==1.26.4
joblib==1.3.2
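Pins only help if the serving environment actually matches them. The sketch below checks installed package versions against `name==version` pins at startup using the standard library's `importlib.metadata`. `parse_pins` and `find_mismatches` are illustrative helper names, and only exact `==` pins are handled.

```python
from importlib import metadata

def parse_pins(requirements_text):
    """Parse 'name==version' lines, skipping blanks and comments."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins

def find_mismatches(pins):
    """Return {package: (pinned, installed)} for every pin that is not satisfied."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches

pins = parse_pins("""
# requirements-model.txt
scikit-learn==1.4.2
numpy==1.26.4
joblib==1.3.2
""")
print(find_mismatches(pins) or "environment matches pins")
```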
Strategy 2: Version-aware loading
import warnings
def safe_load(model_path, metadata_path):
    with open(metadata_path) as f:
        meta = json.load(f)
    # Compare major.minor only; this also tolerates suffixes such as '1.5.dev0'
    trained_version = tuple(int(x) for x in meta['sklearn_version'].split('.')[:2])
    current_version = tuple(int(x) for x in sklearn.__version__.split('.')[:2])
    if trained_version[0] != current_version[0]:
        raise RuntimeError(
            f"Major version mismatch: trained on {meta['sklearn_version']}, "
            f"running {sklearn.__version__}. Retrain required."
        )
    if trained_version != current_version:
        warnings.warn(
            f"Minor version mismatch: {meta['sklearn_version']} vs {sklearn.__version__}. "
            f"Validate predictions before deploying."
        )
    return joblib.load(model_path)
Strategy 3: Export model parameters
For simple models, export learned parameters as JSON (version-independent):
def export_linear_model(model):
    """Export a fitted linear classifier as portable JSON."""
    return {
        'coefficients': model.coef_.tolist(),
        'intercept': model.intercept_.tolist(),
        'classes': model.classes_.tolist(),
    }
def predict_from_params(params, X):
    """Reconstruct predictions without sklearn."""
    X = np.asarray(X)
    coef = np.array(params['coefficients'])
    intercept = np.array(params['intercept'])
    scores = X @ coef.T + intercept
    if scores.shape[1] == 1:
        # Binary classifiers store one coefficient row: a positive score means classes[1]
        class_indices = (scores[:, 0] > 0).astype(int)
    else:
        class_indices = np.argmax(scores, axis=1)
    return np.array(params['classes'])[class_indices]
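The same exported parameters can also recover class probabilities. The sketch below assumes the parameters came from a logistic regression (binary, or multiclass fitted with the multinomial loss); other linear classifiers do not define probabilities this way. `predict_proba_from_params` is a hypothetical helper, and the parameter values are hand-written for illustration.

```python
import json
import numpy as np

def predict_proba_from_params(params, X):
    """Rebuild class probabilities from exported coefficients (logistic models only)."""
    X = np.asarray(X, dtype=float)
    coef = np.asarray(params["coefficients"], dtype=float)
    intercept = np.asarray(params["intercept"], dtype=float)
    scores = X @ coef.T + intercept
    if scores.shape[1] == 1:
        # Binary case: one score column, sigmoid gives P(classes[1])
        p1 = 1.0 / (1.0 + np.exp(-scores[:, 0]))
        return np.column_stack([1.0 - p1, p1])
    # Multinomial case: softmax, shifted by the row max for numerical stability
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hand-written parameters standing in for exported model output,
# round-tripped through JSON as they would be in a real deployment.
params = json.loads(json.dumps({
    "coefficients": [[2.0, -1.0]],
    "intercept": [0.0],
    "classes": [0, 1],
}))
proba = predict_proba_from_params(params, [[1.0, 2.0], [3.0, 0.0]])
labels = np.asarray(params["classes"])[proba.argmax(axis=1)]
print(proba, labels)
```

Before relying on a reconstruction like this, compare it against the original model's `predict_proba` on a sample of real inputs.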
Model registry pattern
import hashlib
from pathlib import Path
class ModelRegistry:
    """Simple file-based model registry with versioning."""
    def __init__(self, registry_dir='models'):
        self.dir = Path(registry_dir)
        self.dir.mkdir(parents=True, exist_ok=True)
    def register(self, model, name, metrics, X_train, y_train):
        # Generate version hash from model params + data shape
        param_str = str(sorted(model.get_params().items())) + str(tuple(X_train.shape))
        version = hashlib.sha256(param_str.encode()).hexdigest()[:8]
        model_dir = self.dir / name / version
        model_dir.mkdir(parents=True, exist_ok=True)
        save_model_package(model, X_train, y_train, metrics, str(model_dir / 'model'))
        # Update 'latest' symlink (relative target, so the registry can be moved)
        latest = self.dir / name / 'latest'
        if latest.is_symlink():
            latest.unlink()
        latest.symlink_to(version, target_is_directory=True)
        return f"{name}/{version}"
    def load(self, name, version='latest'):
        model_dir = self.dir / name / version
        if model_dir.is_symlink():
            model_dir = model_dir.resolve()
        return load_and_validate(str(model_dir / 'model'))
registry = ModelRegistry()
model_id = registry.register(pipeline, 'fraud-detector', {'f1': 0.89}, X_train, y_train)
model, metadata = registry.load('fraud-detector')
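The version identifier in `register` is described as a hash of the model parameters plus the data shape. The standalone sketch below shows that idea in isolation and checks the two properties that matter: the ID is stable across dictionary ordering, and it changes when the data shape changes. `version_id` is a hypothetical helper, not part of the registry above.

```python
import hashlib

def version_id(params, data_shape, length=8):
    """Short, deterministic ID derived from hyperparameters and training-data shape."""
    # sorted() makes the ID independent of dict insertion order
    fingerprint = repr(sorted(params.items())) + repr(tuple(data_shape))
    return hashlib.sha256(fingerprint.encode()).hexdigest()[:length]

params = {"n_estimators": 500, "max_depth": 20}
same_params_reordered = {"max_depth": 20, "n_estimators": 500}

v1 = version_id(params, (50_000, 30))
v2 = version_id(same_params_reordered, (50_000, 30))
v3 = version_id(params, (60_000, 30))
print(v1, v2, v3)
```

Shape alone does not distinguish two different datasets of the same size, so retraining on fresh data with unchanged hyperparameters would reuse the ID. Hashing the data itself or appending a timestamp are stricter alternatives.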
Memory-mapped loading for large models
For models that approach or exceed available RAM (uncommon for most scikit-learn estimators, but possible with very large ensembles):
# Save uncompressed: mmap_mode has no effect on compressed files
joblib.dump(large_model, 'large_model.joblib')
# Load with memory mapping — arrays stay on disk, accessed on demand
loaded = joblib.load('large_model.joblib', mmap_mode='r')
# Predictions work normally but array access reads from disk
predictions = loaded.predict(X_test)
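With mmap_mode='r' the mapped arrays are read-only, so the loaded model’s array data cannot be modified in place. This suits inference servers: multiple processes can map the same file and share its pages through the operating system’s page cache instead of each holding a private copy.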
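joblib's `mmap_mode` relies on NumPy memory mapping, so the read-only behavior can be observed with NumPy alone. The sketch below saves a small array, maps it with `mmap_mode='r'`, and confirms that reads work while writes are rejected. The file name and array contents are placeholders.

```python
import tempfile
from pathlib import Path
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "weights.npy"
    np.save(path, np.arange(10, dtype=np.float64))

    # mmap_mode='r' maps the file read-only instead of copying it into RAM
    weights = np.load(path, mmap_mode="r")
    total = float(weights.sum())   # reads work as usual

    try:
        weights[0] = 99.0          # writes are rejected
        write_rejected = False
    except ValueError:
        write_rejected = True

    del weights                    # release the mapping before cleanup
```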
Tradeoffs
| Method | Security | Portability | Speed | Size |
|---|---|---|---|---|
| joblib | Low (arbitrary exec) | Python only | Fast | Medium |
| pickle | Low (arbitrary exec) | Python only | Fast | Large |
| skops.io | High (type validation) | Python only | Fast | Medium |
| ONNX | High (no code exec) | Cross-platform | Often fastest at inference | Small |
| JSON params | High | Universal | Manual reconstruction | Tiny |
One thing to remember: In production, model persistence is not just joblib.dump — it’s metadata, version tracking, security validation, and reproducibility guarantees. The model file is the smallest part of the deployment problem.