Scikit-Learn Learning Curves — Core Concepts

Why learning curves matter

Every machine learning project faces a fork: should you gather more data or try a different model? Learning curves answer that question empirically by plotting model performance against the number of training samples.

Without this diagnostic, teams routinely spend weeks collecting data that doesn’t improve results, or swap models when the real bottleneck is sample size.

How learning curves work

A learning curve trains the same model on progressively larger subsets of data — say 10%, 20%, 40%, 60%, 80%, and 100% of the training set. At each size, it records two scores:

  • Training score — how well the model fits the data it learned from
  • Validation score — how well the model generalizes to unseen data

These two lines, plotted together, create diagnostic patterns.
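To make the mechanics concrete, here is a minimal hand-rolled sketch of that loop. It assumes a synthetic dataset from make_classification, a LogisticRegression model, and a single held-out validation split; all three are illustrative choices rather than anything this guide prescribes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

fractions = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0]
curve = []  # one (n_samples, train_score, validation_score) tuple per size

for frac in fractions:
    n = int(frac * len(X_train))
    # Fit a fresh model on the first n training samples only.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[:n], y_train[:n])
    train_score = model.score(X_train[:n], y_train[:n])
    val_score = model.score(X_val, y_val)
    curve.append((n, train_score, val_score))

for n, train_score, val_score in curve:
    print(f"n={n:4d}  train={train_score:.3f}  validation={val_score:.3f}")
```

A single validation split keeps the sketch short, but the scores it produces are noisy. Repeating the loop across cross-validation folds gives a steadier picture.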

Three patterns to recognize

High bias (underfitting): Both training and validation scores are low and converge early. The model is too simple to capture the underlying signal. More data won’t help — you need a more expressive model, additional features, or less aggressive regularization.

High variance (overfitting): The training score is near-perfect but the validation score lags far behind, leaving a wide gap between the curves. More training data typically narrows this gap; alternatively, simplify the model or add regularization.

Good fit: Both scores are reasonably high, the gap between them is small, and the validation curve has plateaued. This is the sweet spot where additional data yields diminishing returns and the model generalizes well.
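The three patterns can be approximated with a rough rule of thumb applied to the scores at the largest training size. The function name diagnose and the thresholds target and max_gap below are hypothetical; sensible values depend on the metric and the problem, so treat this as a sketch rather than a definitive test.

```python
def diagnose(train_score, val_score, target=0.85, max_gap=0.05):
    """Label the final point of a learning curve.

    target and max_gap are hypothetical thresholds chosen for illustration.
    """
    gap = train_score - val_score
    if gap > max_gap:
        # Training score far above validation score: overfitting.
        return "high variance"
    if val_score < target:
        # Curves agree with each other, but both sit below the target.
        return "high bias"
    return "good fit"


print(diagnose(0.62, 0.60))  # high bias
print(diagnose(0.99, 0.80))  # high variance
print(diagnose(0.91, 0.89))  # good fit
```

The gap is checked first, so a model with both a wide gap and a low validation score is reported as high variance. Looking at the full curve is still the more reliable way to judge such mixed cases.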

Using scikit-learn’s learning_curve function

Scikit-learn provides sklearn.model_selection.learning_curve, which handles the mechanics: splitting data into subsets, cross-validating at each size, and returning arrays of scores.

Key parameters include:

  • estimator — any scikit-learn model or pipeline
  • train_sizes — fractions or absolute numbers of training examples to evaluate
  • cv — the cross-validation strategy (e.g., 5-fold)
  • scoring — the metric used to score each fit (accuracy, F1, R², etc.)

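The function returns three arrays: the absolute training sizes used, the training scores, and the test (validation) scores. Each score array has one row per training size and one column per cross-validation fold, so average across folds before plotting.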
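A minimal call might look like the sketch below. The synthetic dataset, the LogisticRegression estimator, and the five evenly spaced training sizes are assumptions made for illustration; substitute your own data and pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

train_sizes, train_scores, test_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # fractions of each training fold
    cv=5,
    scoring="accuracy",
)

# Score arrays have shape (n_train_sizes, n_cv_folds); average over folds.
train_mean = train_scores.mean(axis=1)
val_mean = test_scores.mean(axis=1)
val_std = test_scores.std(axis=1)

for n, tr, va, sd in zip(train_sizes, train_mean, val_mean, val_std):
    print(f"n={n:3d}  train={tr:.3f}  validation={va:.3f} (+/- {sd:.3f})")
```

The per-size means can be passed straight to a plotting library, with the standard deviation drawn as a shaded band to show how much the folds disagree.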

Common misconception

Many practitioners assume a flat validation curve always means “model is perfect.” In reality, it can also mean the model plateaued at a mediocre score and needs architectural changes. Always check where the curve plateaued, not just that it stopped moving.
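One way to avoid this trap is to check the level of the plateau as well as its flatness. The helper below is a hypothetical sketch: the name plateau_report, the tolerance tol, and the target score are all assumptions, and comparing only the last two points is a deliberately crude flatness test.

```python
def plateau_report(val_means, target, tol=0.005):
    """Report whether a validation curve has flattened, and at what level.

    val_means: mean validation scores ordered by increasing training size.
    target and tol are hypothetical, problem-specific thresholds.
    """
    if len(val_means) < 2:
        raise ValueError("need at least two points to judge a plateau")
    last_gain = val_means[-1] - val_means[-2]
    flat = abs(last_gain) <= tol
    if flat and val_means[-1] >= target:
        return "plateaued at an acceptable score"
    if flat:
        return "plateaued below target"
    return "still changing"


print(plateau_report([0.60, 0.64, 0.65, 0.651], target=0.85))  # plateaued below target
print(plateau_report([0.80, 0.88, 0.90, 0.902], target=0.85))  # plateaued at an acceptable score
print(plateau_report([0.60, 0.70, 0.80], target=0.85))         # still changing
```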

When to use learning curves

  • Before collecting expensive new data — check if more samples will actually help
  • During model selection — compare how different models respond to data volume
  • After feature engineering — verify that new features raised the validation score without widening its gap to the training score
  • In production monitoring — detect when retraining with fresh data stops improving performance

One thing to remember: The gap between training and validation curves is the story. A gap that shrinks as samples are added means more data is working. A gap that stays put as samples are added means the model needs structural change.

