Confusion Matrix in Python — Core Concepts

What Is a Confusion Matrix?

A confusion matrix is a table that shows how a classification model’s predictions compare to the actual labels. For a binary problem (two classes), it is a 2×2 grid. For a problem with N classes, it is an N×N grid.

The Four Quadrants (Binary Case)

                     Predicted Positive     Predicted Negative
Actually Positive    True Positive (TP)     False Negative (FN)
Actually Negative    False Positive (FP)    True Negative (TN)

  • True Positive: The model said “yes” and was right.
  • True Negative: The model said “no” and was right.
  • False Positive: The model said “yes” but was wrong (false alarm).
  • False Negative: The model said “no” but was wrong (missed case).
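
The four definitions above can be sketched as a small counting routine. This is a minimal illustration in plain Python, not a library implementation: the function name binary_counts and the label lists are hypothetical, and the positive class is assumed to be encoded as 1.

```python
def binary_counts(y_true, y_pred, positive=1):
    """Count TP, FN, FP, TN for a binary problem (hypothetical helper)."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1  # said "yes" and was right
        elif actual == positive:
            fn += 1  # said "no" but was wrong (missed case)
        elif predicted == positive:
            fp += 1  # said "yes" but was wrong (false alarm)
        else:
            tn += 1  # said "no" and was right
    return tp, fn, fp, tn


# Made-up labels for illustration: 1 = positive, 0 = negative
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fn, fp, tn = binary_counts(y_true, y_pred)

# Same layout as the table above: rows are actual, columns are predicted
matrix = [[tp, fn],
          [fp, tn]]
print(matrix)  # [[3, 1], [1, 3]]
```

If you use scikit-learn instead, note that sklearn.metrics.confusion_matrix sorts the labels, so for 0/1 labels the negative class comes first and the result is laid out as [[TN, FP], [FN, TP]].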

Reading the Matrix

Suppose a medical test for a disease produces this confusion matrix on 1,000 patients:

                    Predicted Sick    Predicted Healthy
Actually Sick                   45                    5
Actually Healthy                50                  900

From this single table you can calculate:

  • Accuracy: (45 + 900) / 1,000 = 94.5%
  • Precision: 45 / (45 + 50) = 47.4% — when the test says “sick,” it is wrong more than half the time.
  • Recall: 45 / (45 + 5) = 90% — it catches 90 percent of sick patients.

Accuracy looks great at 94.5 percent, but it is dominated by the 900 healthy patients who were correctly cleared. Precision reveals that 50 of the 95 positive predictions are false alarms. The confusion matrix exposed a problem that accuracy alone would hide.
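
The arithmetic for the medical test can be reproduced in a few lines. This sketch uses only the four counts from the table. The variable names are chosen here for illustration.

```python
# Counts from the medical test example (1,000 patients)
tp, fn = 45, 5     # actually sick: caught, missed
fp, tn = 50, 900   # actually healthy: false alarm, correctly cleared

total = tp + fn + fp + tn

accuracy = (tp + tn) / total   # share of all predictions that are correct
precision = tp / (tp + fp)     # share of "sick" predictions that are right
recall = tp / (tp + fn)        # share of sick patients that are caught

print(f"accuracy  = {accuracy:.1%}")   # 94.5%
print(f"precision = {precision:.1%}")  # 47.4%
print(f"recall    = {recall:.1%}")     # 90.0%
```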

How It Works for Multiple Classes

With three classes (dog, cat, bird), the matrix becomes 3×3. Each cell (i, j) shows how many samples from class i were predicted as class j. The diagonal shows correct predictions. Off-diagonal cells show specific confusions.

               Predicted Dog    Predicted Cat    Predicted Bird
Actual Dog                40                8                 2
Actual Cat                 5               42                 3
Actual Bird                1                4                45

This instantly reveals that dogs and cats are confused with each other (8 + 5 = 13 errors) more often than either is confused with birds (3 dog/bird errors and 7 cat/bird errors).
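
The cell (i, j) rule can be sketched in plain Python. The helper name confusion_matrix_from_labels is hypothetical. The label lists are rebuilt from the counts in the table above, only so that the example has something to count.

```python
from collections import Counter

labels = ["dog", "cat", "bird"]


def confusion_matrix_from_labels(y_true, y_pred, labels):
    """Return matrix[i][j] = samples of class i predicted as class j."""
    pair_counts = Counter(zip(y_true, y_pred))
    return [[pair_counts[(actual, predicted)] for predicted in labels]
            for actual in labels]


# Counts taken from the 3x3 table above: table[actual][predicted]
table = {
    "dog":  {"dog": 40, "cat": 8,  "bird": 2},
    "cat":  {"dog": 5,  "cat": 42, "bird": 3},
    "bird": {"dog": 1,  "cat": 4,  "bird": 45},
}

# Expand the counts into one (actual, predicted) pair per sample
y_true, y_pred = [], []
for actual, row in table.items():
    for predicted, count in row.items():
        y_true += [actual] * count
        y_pred += [predicted] * count

matrix = confusion_matrix_from_labels(y_true, y_pred, labels)

# Diagonal cells are the correct predictions
correct = sum(matrix[k][k] for k in range(len(labels)))

# Errors between each unordered pair of classes, in both directions
pair_errors = {}
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        pair_errors[(labels[i], labels[j])] = matrix[i][j] + matrix[j][i]

print(matrix)       # [[40, 8, 2], [5, 42, 3], [1, 4, 45]]
print(correct)      # 127
print(pair_errors)  # {('dog', 'cat'): 13, ('dog', 'bird'): 3, ('cat', 'bird'): 7}
```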

Normalized Confusion Matrix

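Raw counts can be misleading when classes have different sizes. Normalizing by row (dividing each row by its total) gives the share of each actual class that was assigned to each predicted class. Using the 3×3 example, where every class happens to have 50 samples:

  • Actual Dog row: 40/50 = 80% correct, 8/50 = 16% confused with cat, 2/50 = 4% confused with bird.
  • Actual Cat row: 42/50 = 84% correct, 5/50 = 10% confused with dog, 3/50 = 6% confused with bird.
  • Actual Bird row: 45/50 = 90% correct, 4/50 = 8% confused with cat, 1/50 = 2% confused with dog.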

Normalized matrices make comparison easier across imbalanced classes.
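
Row normalization is a one-line division per row. The sketch below applies it to the dog/cat/bird counts. The function name normalize_rows is hypothetical, and the guard for empty rows is an added assumption for classes with no samples.

```python
# Rows are actual classes, columns are predicted classes (dog, cat, bird)
matrix = [
    [40, 8, 2],
    [5, 42, 3],
    [1, 4, 45],
]


def normalize_rows(matrix):
    """Divide each row by its total so that every row sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        # Assumption: a class with no samples gets a row of zeros
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


normalized = normalize_rows(matrix)
for label, row in zip(["dog", "cat", "bird"], normalized):
    print(label, [f"{share:.0%}" for share in row])
```

In a row-normalized matrix the diagonal holds each class's recall. scikit-learn's confusion_matrix offers the same row normalization through its normalize="true" argument.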

Common Misconception

“A confusion matrix only works for two classes.” It works for any number of classes. In fact, it becomes even more valuable with many classes because it pinpoints exactly which pairs of classes are being confused — something a single F1 score cannot show.

When to Use It

  • After training any classifier, before reporting results.
  • When stakeholders ask “where is the model failing?”
  • When you need to decide whether to tune for fewer false positives or fewer false negatives.
  • When debugging a model that has high accuracy but poor performance on a specific class.

Practical Tips

  • Always inspect the confusion matrix visually, not just the summary scores.
  • Look at the off-diagonal cells to identify problematic class pairs.
  • Use a heatmap for matrices larger than 3×3 — patterns jump out more quickly.
  • Combine the confusion matrix with a classification report (precision, recall, F1 per class) for a complete picture.
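
The per-class numbers in a classification report can be derived from the matrix itself: the row sum is the number of actual samples of a class, and the column sum is the number of times that class was predicted. The sketch below is a minimal illustration with a hypothetical helper name, per_class_report, applied to the dog/cat/bird counts. It is not a substitute for a library report.

```python
labels = ["dog", "cat", "bird"]
matrix = [
    [40, 8, 2],
    [5, 42, 3],
    [1, 4, 45],
]


def per_class_report(matrix, labels):
    """Compute precision, recall and F1 for each class from the matrix."""
    report = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        actual_total = sum(matrix[k])                    # row sum
        predicted_total = sum(row[k] for row in matrix)  # column sum
        precision = tp / predicted_total if predicted_total else 0.0
        recall = tp / actual_total if actual_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


report = per_class_report(matrix, labels)
for label, scores in report.items():
    print(f"{label:5s} "
          f"precision={scores['precision']:.3f} "
          f"recall={scores['recall']:.3f} "
          f"f1={scores['f1']:.3f}")
```

Here the cat class has the lowest precision (42 of 54 cat predictions are correct), which matches the off-diagonal cells: most of the wrong cat predictions are actually dogs.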

One thing to remember: The confusion matrix is one of the most honest report cards in machine learning. It shows every correct and incorrect prediction by class, so you can see exactly where your model succeeds and where it struggles.

See Also

  • Python Cross Validation: Find out why testing a computer's homework on different practice sets keeps it from cheating.
  • Python Model Evaluation Metrics: Discover why asking 'how good is my model?' needs more than one number to get an honest answer.
  • Python Roc Auc Curves: Understand how one picture and one number tell you whether a computer's predictions are trustworthy or just lucky guesses.
  • Python Sklearn Learning Curves: Why your machine learning model might need more data, or a simpler brain, explained with zero jargon.
  • Activation Functions: Why neural networks need these tiny mathematical functions, and how ReLU's simplicity accidentally made deep learning possible.