MLflow Experiment Tracking in Python — Core Concepts

What Is MLflow?

MLflow is an open-source platform for managing the machine learning lifecycle. Its most widely used component is MLflow Tracking, which records experiments so you can compare, reproduce, and share results.

The Core Abstractions

Experiments

An experiment is a named collection of related runs. Think of it as a project folder. You might have one experiment for “customer churn prediction” and another for “image classification.”

Runs

A run is a single execution of your training code. Each run records:

  • Parameters: The inputs to your experiment (learning rate, batch size, number of trees).
  • Metrics: The outputs you are measuring (accuracy, loss, F1 score). Metrics can be logged at multiple steps to track training progress over time.
  • Artifacts: Files produced during the run (trained models, plots, data samples).
  • Tags: Metadata like the author name, dataset version, or Git commit hash.

The Tracking Server

MLflow stores everything either locally (in an mlruns/ folder) or on a remote tracking server. In both cases a web UI (started locally with the mlflow ui command, or hosted by the tracking server) lets you browse, filter, and compare runs visually.

How It Works

The typical workflow is:

  1. Start an experiment (or use a default one).
  2. Begin a run.
  3. Log parameters, metrics, and artifacts during training.
  4. End the run.
  5. Use the UI to compare results across runs.

You can log anything: hyperparameters, data file paths, environment details, evaluation charts, even the trained model itself.

Why Teams Need It

Without experiment tracking, common problems include:

  • “Which model version is in production?” — nobody knows.
  • “What hyperparameters gave us 92 percent accuracy last month?” — lost in a notebook somewhere.
  • “Can you reproduce that result?” — not without the exact code, data, and settings.

MLflow addresses all three by keeping a searchable record of every run: its parameters, metrics, artifacts, and tags.

Key Features

Comparison UI

The web interface lets you select multiple runs and compare their parameters and metrics side by side. You can create scatter plots (accuracy vs. learning rate) and parallel coordinate plots to spot patterns.

Metric History

Log a metric at multiple steps to track training curves:

  • Loss decreasing over epochs signals the model is learning.
  • Validation loss increasing while training loss decreases signals overfitting.

Model Registry

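Beyond tracking, MLflow provides a model registry where you can version models, move them through stages (Staging → Production), and serve them. Recent MLflow releases deprecate fixed stages in favor of model version aliases and tags, so check which mechanism your version uses. The registry bridges the gap between experimentation and deployment.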

Common Misconception

“Experiment tracking is only useful for big teams.” Even solo data scientists benefit enormously. After a week of trying different approaches, it is nearly impossible to remember what you tried, what worked, and why. MLflow makes past-you a reliable collaborator for future-you.

Practical Tips

  • Log everything, even parameters you think do not matter. You can always filter later, but you cannot recover what was never recorded.
  • Use tags to mark important runs (“best_so_far”, “baseline”, “experiment_v2”).
  • Set up a shared tracking server early in team projects so everyone’s runs are in one place.
  • Commit your training script and record the Git commit hash on each run (for example as a tag), so a run ID always maps back to the exact code that produced it.

One thing to remember: Experiment tracking is not overhead — it is the foundation that makes ML work reproducible, comparable, and trustworthy.

