Anomaly Detection with Python — Core Concepts

What makes anomaly detection different from classification

In classification, you have labeled examples of each category — spam vs. not-spam, cat vs. dog. In anomaly detection, you typically have mountains of normal data and very few (or zero) examples of anomalies. This asymmetry changes the entire approach. You model what normal looks like and flag deviations.

Types of anomalies

  • Point anomalies: a single data point is abnormal (a $50,000 transaction on a card that usually sees $50 purchases).
  • Contextual anomalies: normal in one context, abnormal in another (80°F in July is fine; 80°F in January in Chicago is not).
  • Collective anomalies: a sequence of points that is abnormal as a group, even if each individual point seems fine (a server making 100 requests per second is normal, but making exactly 100 per second for 24 hours straight is not).
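
To make the contextual case concrete, here is a minimal sketch that scores each temperature reading against other readings from the same month instead of against the whole dataset. The synthetic temperatures, the `contextual_zscores` helper, and the 2.5 cutoff are illustrative assumptions, not part of any library.

```python
import numpy as np

def contextual_zscores(values: np.ndarray, contexts: np.ndarray) -> np.ndarray:
    """Z-score each value against the values that share its context label."""
    scores = np.zeros(len(values), dtype=float)
    for label in np.unique(contexts):
        mask = contexts == label
        group = values[mask]
        std = group.std()
        # A constant group has no spread, so nothing in it can stand out.
        scores[mask] = 0.0 if std == 0 else np.abs(group - group.mean()) / std
    return scores

# Hypothetical daily highs in °F for a cold-winter city.
rng = np.random.default_rng(0)
january = rng.normal(30, 5, size=30)
july = rng.normal(84, 5, size=30)
january[10] = 80.0  # unremarkable in July, very unusual in January

temps = np.concatenate([january, july])
months = np.array(["jan"] * 30 + ["jul"] * 30)

global_z = np.abs(temps - temps.mean()) / temps.std()
context_z = contextual_zscores(temps, months)

print(bool(global_z[10] > 2.5))   # False: 80°F sits inside the year-round range
print(bool(context_z[10] > 2.5))  # True: 80°F is far from the other January readings
```

The same reading is invisible to a global score and obvious to a per-context one, which is why the choice of context (month, hour, customer segment) matters as much as the choice of algorithm.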

The main algorithms

Statistical methods — Z-score and IQR

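The simplest approach: measure how far each point lies from the mean, in units of standard deviation (the z-score). Points beyond a threshold (typically 3 standard deviations) are flagged. The IQR variant instead flags points that fall more than 1.5 times the interquartile range below the first quartile or above the third.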

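import numpy as np

def zscore_anomalies(data: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Return a boolean mask that is True where |z-score| exceeds the threshold."""
    mean = np.mean(data)
    std = np.std(data)
    if std == 0:
        # Constant data has no spread; avoid dividing by zero and flag nothing.
        return np.zeros_like(data, dtype=bool)
    z_scores = np.abs((data - mean) / std)
    return z_scores > threshold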

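Z-scores work well for simple, roughly normally distributed data. They break down with skewed distributions, multiple modes, or high-dimensional data, and in small samples a single extreme outlier can inflate the mean and standard deviation enough to hide itself.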
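
The IQR rule named in the heading can be sketched in the same style. This is a minimal version assuming the conventional multiplier of 1.5; the `iqr_anomalies` name and the sample purchases are made up for illustration.

```python
import numpy as np

def iqr_anomalies(data: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Flag points more than k * IQR below the first or above the third quartile."""
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (data < lower) | (data > upper)

# Hypothetical card history: small purchases plus one very large one.
purchases = np.array([42.0, 55.0, 48.0, 51.0, 47.0, 53.0, 49.0, 50_000.0])

flags = iqr_anomalies(purchases)
z = np.abs(purchases - purchases.mean()) / purchases.std()

print(np.flatnonzero(flags))  # [7]: only the 50,000 purchase is outside the fences
print(bool(z.max() < 3.0))    # True: the same point stays under a z-score cutoff of 3
```

With only eight points, the large purchase drags the mean and standard deviation so far that its own z-score stays below 3. Quartiles barely move when one value is extreme, so the IQR fences still catch it.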

Isolation Forest

Instead of modeling normal data, Isolation Forest isolates anomalies directly. The logic: anomalies are rare and different, so they require fewer random splits to isolate in a tree structure.
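
The "fewer random splits" idea can be seen in a toy one-dimensional sketch. This is not how scikit-learn implements it; `isolation_depth` and `mean_depth` are hypothetical helpers written only to illustrate the principle.

```python
import numpy as np

def isolation_depth(point: float, data: np.ndarray, rng: np.random.Generator) -> int:
    """Count random splits until `point` (a member of `data`) is alone in its partition."""
    depth = 0
    subset = data
    while len(subset) > 1:
        lo, hi = subset.min(), subset.max()
        if lo == hi:
            break  # remaining values are identical and cannot be separated
        split = rng.uniform(lo, hi)
        # Keep only the side of the split that contains the point.
        subset = subset[subset < split] if point < split else subset[subset >= split]
        depth += 1
    return depth

rng = np.random.default_rng(0)
data = np.append(rng.normal(size=200), 10.0)   # 200 typical values plus one far-off point
typical = data[np.argmin(np.abs(data[:200]))]  # the typical value closest to zero

def mean_depth(point: float, trials: int = 200) -> float:
    """Average the isolation depth over many independent random splittings."""
    return float(np.mean([isolation_depth(point, data, rng) for _ in range(trials)]))

print(mean_depth(10.0), mean_depth(typical))  # the far-off point needs far fewer splits
```

scikit-learn's implementation builds on the same principle, averaging path lengths over many trees that each use a random subsample and randomly chosen features.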

from sklearn.ensemble import IsolationForest

# data: array-like of shape (n_samples, n_features)
model = IsolationForest(contamination=0.02, random_state=42)  # assumes about 2% anomalies
model.fit(data)
predictions = model.predict(data)  # -1 = anomaly, 1 = normal
scores = model.decision_function(data)  # lower = more anomalous; negative = flagged

Isolation Forest handles high-dimensional data well and does not assume any particular distribution, which makes it a common first choice for production systems.
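
For a version that runs end to end, the sketch below fits Isolation Forest on synthetic data with three planted outliers. The data, the 1% contamination guess, and the variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))       # one tight cluster
outliers = np.array([[8.0, 8.0], [-9.0, 7.5], [10.0, -8.0]])  # planted far from it
X = np.vstack([normal, outliers])                             # outliers sit at rows 500-502

model = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
labels = model.fit_predict(X)        # -1 = anomaly, 1 = normal
scores = model.decision_function(X)  # lower = more anomalous

flagged = np.flatnonzero(labels == -1)
print(flagged)                       # row indices the model flagged
print(np.argsort(scores)[:3])        # the three most anomalous rows
```

Because `contamination` fixes the fraction of points that get flagged, a few ordinary points on the fringe of the cluster may be flagged alongside the planted ones.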

Local Outlier Factor (LOF)

LOF compares the density of points around each observation to the density around its neighbors. Points in sparse regions surrounded by dense neighborhoods are flagged as outliers.

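from sklearn.neighbors import LocalOutlierFactor

# data: array-like of shape (n_samples, n_features)
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.02)
predictions = lof.fit_predict(data)  # -1 = anomaly, 1 = normal
# In this default mode LOF only labels the data it was fit on;
# pass novelty=True to score unseen data with predict().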

LOF excels at detecting local anomalies — points that are normal globally but unusual in their local neighborhood. It struggles with very high-dimensional data due to the curse of dimensionality.
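
A small synthetic example shows what "local" means here. The two clusters and the planted point below are invented for illustration: the point sits a short distance from a very tight cluster, a gap that is small compared with the spread of the second, much looser cluster.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(5)
dense = rng.normal(loc=0.0, scale=0.1, size=(200, 2))    # tight cluster
sparse = rng.normal(loc=20.0, scale=3.0, size=(200, 2))  # loose cluster
local_outlier = np.array([[0.0, 1.5]])                   # just outside the tight cluster
X = np.vstack([dense, sparse, local_outlier])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)  # -1 = anomaly, 1 = normal
# scikit-learn stores the negated LOF; flip it so larger = more outlying.
lof_scores = -lof.negative_outlier_factor_

print(labels[-1], round(float(lof_scores[-1]), 1))  # label and LOF of the planted point
print(int(np.argmax(lof_scores)) == len(X) - 1)     # is it the most outlying point?
```

A LOF near 1 means a point is about as densely surrounded as its neighbors. The planted point scores far higher because its neighbors all come from the tight cluster, where points are packed much more closely than it is.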

Autoencoders (deep learning)

Train a neural network to compress and reconstruct normal data. Anomalies produce high reconstruction error because the model never learned to handle them:

import numpy as np

# After training an autoencoder on normal data
# (`autoencoder` is the trained model; `threshold` is chosen from errors on normal data):
reconstructed = autoencoder.predict(test_data)
reconstruction_error = np.mean((test_data - reconstructed) ** 2, axis=1)  # per-sample MSE
anomalies = reconstruction_error > threshold

Autoencoders are powerful for complex, high-dimensional data (network traffic, sensor readings) but require more data and tuning than simpler methods.
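
The snippet above leaves the model itself undefined. As a runnable stand-in, the sketch below trains scikit-learn's `MLPRegressor` to reproduce its own input through a two-unit bottleneck. That is a small autoencoder in spirit, not a substitute for a deep learning framework, and the synthetic data, layer sizes, and 99th-percentile threshold are all assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
mixing = rng.normal(size=(2, 8))  # 8 observed features driven by 2 hidden factors

def make_normal(n: int) -> np.ndarray:
    """Synthetic 'normal' data: low-dimensional structure plus small noise."""
    return rng.normal(size=(n, 2)) @ mixing + 0.05 * rng.normal(size=(n, 8))

train = make_normal(2000)
test_normal = make_normal(200)
test_anomalous = rng.normal(scale=3.0, size=(20, 8))  # ignores the factor structure

scaler = StandardScaler().fit(train)
train_scaled = scaler.transform(train)

# Input and target are the same array; the 2-unit middle layer forces compression.
autoencoder = MLPRegressor(hidden_layer_sizes=(16, 2, 16), activation="tanh",
                           max_iter=500, random_state=0)
autoencoder.fit(train_scaled, train_scaled)

def reconstruction_error(x: np.ndarray) -> np.ndarray:
    """Per-sample mean squared error between scaled input and its reconstruction."""
    z = scaler.transform(x)
    return np.mean((z - autoencoder.predict(z)) ** 2, axis=1)

# Assumed rule: flag anything worse than 99% of the training reconstructions.
threshold = np.percentile(reconstruction_error(train), 99)
err_normal = reconstruction_error(test_normal)
err_anomalous = reconstruction_error(test_anomalous)

print((err_normal > threshold).mean(), (err_anomalous > threshold).mean())
```

Setting the threshold from errors on data believed to be normal ties the false positive rate to a number you choose, here roughly 1%.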

Choosing the right method

Scenario                           | Recommended approach                | Why
Low-dimensional, clean data        | Z-score or IQR                      | Simple, interpretable
Tabular data, unknown distribution | Isolation Forest                    | No distribution assumptions, fast
Density varies across regions      | Local Outlier Factor                | Captures local context
High-dimensional, complex patterns | Autoencoder                         | Learns nonlinear representations
Time series data                   | Statistical process control or LSTM | Respects temporal ordering

The contamination problem

Many scikit-learn detectors, including Isolation Forest and LOF, take a contamination parameter: your estimate of what fraction of the data is anomalous. If you guess 1% but the true rate is 5%, you will miss many anomalies. If you guess 10% when the true rate is 0.1%, you will drown in false positives.
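
The effect is easy to see by refitting the same model with different guesses. In this sketch the data is synthetic with a true anomaly rate of 1%, and the three contamination values are arbitrary choices for comparison.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
X = np.vstack([
    rng.normal(size=(990, 2)),
    rng.uniform(6.0, 9.0, size=(10, 2)),  # 10 planted anomalies, so the true rate is 1%
])

flagged = {}
for contamination in (0.001, 0.01, 0.10):
    model = IsolationForest(contamination=contamination, random_state=3)
    flagged[contamination] = int((model.fit_predict(X) == -1).sum())

print(flagged)  # number of flagged points for each guess
```

The number of flagged points tracks the guess, not the data: the parameter only decides where the cutoff falls on the score distribution.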

When you do not know the contamination rate, start from the algorithm's raw anomaly scores and set the threshold interactively by inspecting the most anomalous points first (in scikit-learn, the ones with the lowest scores).
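
One way to work from scores is sketched below: rank every point, review the top of the list, then place the cutoff at the last point judged to be a real anomaly. The data is synthetic, and `k = 10` stands in for a decision a human reviewer would make.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)
normal = rng.normal(size=(1000, 3))
# Planted anomalies: far from zero in every coordinate, with random signs.
planted = rng.choice([-1.0, 1.0], size=(10, 3)) * rng.uniform(6.0, 9.0, size=(10, 3))
X = np.vstack([normal, planted])  # planted points sit at rows 1000-1009

model = IsolationForest(random_state=7).fit(X)
anomaly_score = -model.score_samples(X)  # flip sign so that higher = more anomalous

ranked = np.argsort(anomaly_score)[::-1]  # most anomalous first
for i in ranked[:5]:
    print(i, round(float(anomaly_score[i]), 3), np.round(X[i], 2))

k = 10  # hypothetical: how many top-ranked points the reviewer accepted as anomalies
threshold = anomaly_score[ranked[k - 1]]
flags = anomaly_score >= threshold
print(int(flags.sum()))  # number of points at or above the chosen cutoff
```

The resulting `threshold` can be stored and applied to scores for new data, which separates the modeling step from the decision about how many alerts are worth reviewing.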

Common misconception

People assume anomaly detection is fully automated — deploy it and forget it. Real systems require constant tuning. What counts as “normal” drifts over time (concept drift). A system deployed in January may produce false positives by March because user behavior changed. Regular retraining and threshold adjustment are essential.
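
Drift can be simulated in a few lines. The sketch below compares a z-score baseline fitted once on the first 30 days with one re-estimated every day from the latest 30 days. The metric, drift rate, and window length are assumptions, and re-estimating a mean and standard deviation stands in for retraining a real model.

```python
import numpy as np

rng = np.random.default_rng(1)
n_days = 90
# Hypothetical daily metric whose normal level drifts steadily upward.
level = np.linspace(100.0, 140.0, n_days)
values = level + rng.normal(scale=5.0, size=n_days)

def is_flagged(value: float, history: np.ndarray, threshold: float = 3.0) -> bool:
    """Z-score test of one value against a window of past values."""
    mean, std = history.mean(), history.std()
    return bool(std > 0 and abs(value - mean) / std > threshold)

window = 30
frozen = values[:window]  # baseline fitted once and never updated
static_flags, rolling_flags = [], []
for t in range(window, n_days):
    static_flags.append(is_flagged(values[t], frozen))
    rolling_flags.append(is_flagged(values[t], values[t - window:t]))  # refreshed daily

print(sum(static_flags), sum(rolling_flags))  # alerts from frozen vs. refreshed baseline
```

The frozen baseline starts treating the new normal as anomalous once the level has drifted far enough. The trade-off is that a baseline refreshed too eagerly can also absorb a genuine problem that develops slowly, so the window length needs tuning too.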

The one thing to remember: Anomaly detection algorithms model “normal” in different ways — statistical, density-based, isolation-based, or reconstruction-based — and the right choice depends on your data’s dimensionality, distribution, and whether anomalies are global or local.

