Fraud Detection Patterns with Python — Core Concepts

Why fraud detection is a unique ML problem

Fraud detection differs from typical classification in three critical ways:

  1. Extreme class imbalance — legitimate transactions outnumber fraudulent ones by 500:1 or more. Standard accuracy metrics are meaningless (a model that labels everything “not fraud” achieves 99.8% accuracy).
  2. Adversarial environment — fraudsters adapt to detection systems. A model that works today may be useless in six months.
  3. Cost asymmetry — missing a $10,000 fraud costs far more than inconveniencing a customer with a false positive.
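
The arithmetic behind points 1 and 3 can be checked in a few lines of plain Python. This is a minimal sketch: the 500:1 ratio and the $10,000 figure come from the text above, while `should_flag` and its `false_positive_cost` default of $5 are hypothetical, chosen only for illustration.

```python
# Point 1: a model that labels everything "not fraud" at a 500:1 ratio.
n_legit, n_fraud = 500_000, 1_000          # 500:1, as in the text
labels = [0] * n_legit + [1] * n_fraud     # 1 = fraud
predictions = [0] * len(labels)            # always predict "not fraud"

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)
caught = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = caught / n_fraud

print(f"accuracy = {accuracy:.3f}, recall = {recall:.1f}")
# prints: accuracy = 0.998, recall = 0.0


# Point 3: flag when the expected fraud loss exceeds the expected cost of a false alarm.
def should_flag(p_fraud: float, amount: float, false_positive_cost: float = 5.0) -> bool:
    """Compare the expected loss of letting a transaction through with the
    expected cost of wrongly challenging a legitimate customer.

    false_positive_cost is a hypothetical per-alert cost, not a figure from the text.
    """
    expected_fraud_loss = p_fraud * amount
    expected_friction_cost = (1.0 - p_fraud) * false_positive_cost
    return expected_fraud_loss > expected_friction_cost
```

Under these assumed costs, a 1% fraud probability is enough to flag a $10,000 transaction (`should_flag(0.01, 10_000)` is `True`) but not a $5 one.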

The two detection approaches

Rule-based systems

Hard-coded rules catch known patterns: “flag any transaction over $5,000 from a new device” or “block purchases from countries on the risk list within 24 hours of a password change.”

Rules are fast, interpretable, and easy to audit (regulators love them). But they cannot catch novel fraud patterns and create maintenance nightmares as the rule count grows into the thousands.
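
The two example rules quoted above can be written as plain predicate functions, which is part of why rule systems are easy to audit. The sketch below is one possible shape, not a reference implementation: the `Transaction` fields, the `RISK_COUNTRIES` set, and the rule names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Transaction:
    amount: float
    country: str
    device_is_new: bool
    timestamp: datetime
    last_password_change: Optional[datetime] = None


# Hypothetical risk list, for illustration only
RISK_COUNTRIES = {"XX", "YY"}


def rule_large_amount_new_device(txn: Transaction) -> bool:
    """Flag any transaction over $5,000 from a new device."""
    return txn.amount > 5_000 and txn.device_is_new


def rule_risky_country_after_password_change(txn: Transaction) -> bool:
    """Block risk-list countries within 24 hours of a password change."""
    if txn.last_password_change is None:
        return False
    elapsed = txn.timestamp - txn.last_password_change
    return (
        txn.country in RISK_COUNTRIES
        and timedelta(0) <= elapsed <= timedelta(hours=24)
    )


RULES = {
    "large_amount_new_device": rule_large_amount_new_device,
    "risky_country_after_password_change": rule_risky_country_after_password_change,
}


def triggered_rules(txn: Transaction) -> List[str]:
    """Return the names of every rule the transaction trips, so each decision can be audited."""
    return [name for name, rule in RULES.items() if rule(txn)]
```

Returning rule names rather than a bare boolean keeps the audit trail explicit. It also hints at the maintenance problem: every new pattern is another entry in `RULES` that someone has to own.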

Machine learning models

ML models learn patterns from historical data and generalize to new fraud variants. They complement rules by catching the patterns humans have not codified yet.

Most production systems use both: rules for known, high-confidence patterns and ML for everything else.
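
One way to picture the combination is a decision function that lets rules short-circuit the model. This is a sketch under assumptions: the three outcome labels, the `review_threshold` default, and the stand-in `toy_score` function are hypothetical, and a real system would call a trained model instead.

```python
from typing import Callable, Dict, Sequence

Txn = Dict[str, float]


def decide(
    txn: Txn,
    rules: Sequence[Callable[[Txn], bool]],
    score_fn: Callable[[Txn], float],
    review_threshold: float = 0.5,
) -> str:
    """Return "block" if any rule fires, "review" if the model score reaches
    the threshold, and "allow" otherwise."""
    # Known, high-confidence patterns are handled by rules first
    if any(rule(txn) for rule in rules):
        return "block"
    # Everything else is left to the learned score
    if score_fn(txn) >= review_threshold:
        return "review"
    return "allow"


# Hypothetical rule and stand-in scorer, for illustration only
rules = [lambda t: t["amount"] > 5_000 and t["new_device"] == 1]
toy_score = lambda t: min(1.0, t["amount"] / 1_000)

print(decide({"amount": 6_000, "new_device": 1}, rules, toy_score))  # block
print(decide({"amount": 800, "new_device": 0}, rules, toy_score))    # review
print(decide({"amount": 20, "new_device": 0}, rules, toy_score))     # allow
```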

Feature engineering — the real competitive advantage

The algorithm matters less than the features you feed it. Fraud-specific features fall into four categories:

Transaction features: amount, merchant category, time of day, payment method.

Velocity features: how many transactions in the last hour, total spending in the last 24 hours, number of unique merchants in the last week.

Behavioral deviation: how far is this transaction from the user’s historical average amount, typical merchant category, usual location?

Network features: is this card linked to a device that was used with a previously flagged card? (Graph-based features are among the most powerful in fraud detection.)
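
A network feature of the kind described above can be sketched without a graph library. The function below counts, for each card, how many flagged cards share a device with it. The function name and the ID formats are hypothetical, and it only looks one hop out; production graph features often go further.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def cards_linked_to_flagged(
    card_device_pairs: Iterable[Tuple[str, str]],
    flagged_cards: Set[str],
) -> Dict[str, int]:
    """For each card, count distinct flagged cards that share at least one device with it.

    card_device_pairs holds (card_id, device_id) observations.
    """
    cards_by_device: Dict[str, Set[str]] = defaultdict(set)
    devices_by_card: Dict[str, Set[str]] = defaultdict(set)
    for card, device in card_device_pairs:
        cards_by_device[device].add(card)
        devices_by_card[card].add(device)

    linked_counts: Dict[str, int] = {}
    for card, devices in devices_by_card.items():
        neighbours: Set[str] = set()
        for device in devices:
            neighbours |= cards_by_device[device]
        neighbours.discard(card)  # a card is not its own neighbour
        linked_counts[card] = len(neighbours & flagged_cards)
    return linked_counts


pairs = [
    ("card_A", "dev_1"),
    ("card_B", "dev_1"),
    ("card_B", "dev_2"),
    ("card_C", "dev_2"),
]
print(cards_linked_to_flagged(pairs, flagged_cards={"card_A"}))
# {'card_A': 0, 'card_B': 1, 'card_C': 0}
```

Here `card_B` picks up a link because it shares `dev_1` with the flagged `card_A`, while `card_C` is two hops away and stays at zero.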

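import pandas as pd

def compute_velocity_features(df: pd.DataFrame, user_col: str = "user_id") -> pd.DataFrame:
    """Compute rolling transaction counts and amounts per user.

    Expects a datetime64 "timestamp" column and a numeric, non-null "amount" column.
    """
    # Sort by the configured user column, then by time within each user
    df = df.sort_values([user_col, "timestamp"])

    # Offset windows such as "1h" need a DatetimeIndex, so roll over a timestamp-indexed view
    amounts = df.set_index("timestamp").groupby(user_col)["amount"]

    # Transactions in the last hour (per user), counting the current one
    df["txn_count_1h"] = amounts.transform(lambda x: x.rolling("1h").count()).to_numpy()

    # Total amount in the last 24 hours (per user), including the current one
    df["amount_sum_24h"] = amounts.transform(lambda x: x.rolling("24h").sum()).to_numpy()

    # Deviation from the user's average amount; the denominator is floored at 1
    # to avoid dividing by near-zero averages. This mean covers every row for the
    # user in df, including later ones, so for training data a past-only
    # (expanding) mean avoids look-ahead leakage.
    user_avg = df.groupby(user_col)["amount"].transform("mean")
    df["amount_deviation"] = (df["amount"] - user_avg) / user_avg.clip(lower=1)

    return df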
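
The function above works on a full DataFrame in batch. At scoring time only earlier transactions are available, so the same ideas are often maintained incrementally. The sketch below is one way to do that with the standard library; `VelocityCounter`, `amount_zscore`, and the `min_sigma` floor are hypothetical names and choices, not part of any library.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from statistics import mean, stdev
from typing import Deque, Dict, Sequence


class VelocityCounter:
    """Per-user count of transactions inside a sliding time window."""

    def __init__(self, window: timedelta = timedelta(hours=1)) -> None:
        self.window = window
        self._events: Dict[str, Deque[datetime]] = defaultdict(deque)

    def add(self, user_id: str, ts: datetime) -> int:
        """Record a transaction and return the user's count in the window ending at ts.

        Assumes each user's transactions arrive in timestamp order.
        """
        events = self._events[user_id]
        events.append(ts)
        # Keep only events in (ts - window, ts]; older ones fall out of the window
        cutoff = ts - self.window
        while events and events[0] <= cutoff:
            events.popleft()
        return len(events)


def amount_zscore(
    past_amounts: Sequence[float], current_amount: float, min_sigma: float = 1.0
) -> float:
    """Standard deviations between the current amount and the user's past mean.

    Uses only earlier transactions. Returns 0.0 with fewer than two past
    amounts; min_sigma floors the spread so identical histories do not divide by zero.
    """
    if len(past_amounts) < 2:
        return 0.0
    sigma = max(stdev(past_amounts), min_sigma)
    return (current_amount - mean(past_amounts)) / sigma
```

For example, a user whose past amounts are 20, 25, 30, and 25 gets a z-score of 0.0 for a $25 purchase and a very large one for a $500 purchase.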

Handling class imbalance

Technique         | How it works                               | Tradeoff
------------------|--------------------------------------------|--------------------------------------
SMOTE             | Generate synthetic minority samples        | Can create unrealistic fraud patterns
Undersampling     | Remove majority class samples              | Loses legitimate transaction patterns
Class weights     | Penalize misclassifying fraud more heavily | Simple, no data manipulation needed
Anomaly detection | Model normal behavior, flag deviations     | Does not need labeled fraud examples

Class weighting is the simplest and often most effective starting point:

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# X_train and y_train are assumed to exist already; y_train is 1 for fraud, 0 otherwise
model = GradientBoostingClassifier(
    n_estimators=200,
    max_depth=6,
)
# Compute sample weights: fraud gets 100× weight
sample_weight = np.where(np.asarray(y_train) == 1, 100, 1)
model.fit(X_train, y_train, sample_weight=sample_weight)
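
The 100× weight above is a fixed choice. A common alternative is to derive the weight from the observed class ratio. The sketch below uses the inverse-frequency formula that scikit-learn documents for `class_weight="balanced"`; the helper names are hypothetical, and the result can be passed as `sample_weight` in the same way as the snippet above.

```python
from collections import Counter
from typing import Dict, List, Sequence


def inverse_frequency_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def per_sample_weights(labels: Sequence[int]) -> List[float]:
    """Expand class weights into one weight per training row."""
    class_weights = inverse_frequency_weights(labels)
    return [class_weights[y] for y in labels]


# Toy labels at a 500:1 ratio
toy_labels = [0] * 500 + [1]
weights = inverse_frequency_weights(toy_labels)
print(weights[1] / weights[0])  # fraud weight relative to legitimate weight
```

With a 500:1 ratio the fraud class ends up weighted 500 times more heavily than the legitimate class. Whether that or a smaller fixed multiplier works better is an empirical question for the validation set.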

Evaluation metrics that matter

Forget accuracy. Use:

  • Precision at high recall: if you need to catch 95% of fraud (recall = 0.95), what fraction of flagged transactions are actually fraudulent?
  • Average Precision (PR-AUC): area under the precision-recall curve, robust to imbalance.
  • Value-weighted detection rate: weight each detection by the fraud amount; catching a $10,000 fraud matters more than catching a $5 one.
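
The three metrics above can be computed by hand on a ranked list of scores, which makes their definitions concrete. This is a simplified sketch: the function names are hypothetical, tied scores are ranked in input order rather than grouped, and `precision_at_recall` reports precision at the first cutoff where the target recall is reached. For real evaluation, a tested library implementation is the safer choice.

```python
from typing import List, Sequence, Tuple


def _rank_by_score(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, int]]:
    """Pair each score with its label and sort highest score first."""
    return sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)


def precision_at_recall(scores: Sequence[float], labels: Sequence[int], target_recall: float) -> float:
    """Precision at the first score cutoff where recall reaches target_recall."""
    total_fraud = sum(labels)
    if total_fraud == 0:
        raise ValueError("labels contain no fraud cases")
    true_positives = 0
    for k, (_, label) in enumerate(_rank_by_score(scores, labels), start=1):
        true_positives += label
        if true_positives / total_fraud >= target_recall:
            return true_positives / k
    raise ValueError("target_recall must be at most 1.0")


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of the precision values measured at the rank of each fraud case."""
    total_fraud = sum(labels)
    if total_fraud == 0:
        raise ValueError("labels contain no fraud cases")
    true_positives = 0
    precision_sum = 0.0
    for k, (_, label) in enumerate(_rank_by_score(scores, labels), start=1):
        if label == 1:
            true_positives += 1
            precision_sum += true_positives / k
    return precision_sum / total_fraud


def value_weighted_detection_rate(
    flagged: Sequence[bool], labels: Sequence[int], amounts: Sequence[float]
) -> float:
    """Share of total fraud value (not fraud count) that was flagged."""
    fraud_value = sum(a for y, a in zip(labels, amounts) if y == 1)
    if fraud_value == 0:
        raise ValueError("no fraud value to detect")
    caught_value = sum(a for f, y, a in zip(flagged, labels, amounts) if f and y == 1)
    return caught_value / fraud_value
```

On a toy example where only one of three fraud cases is flagged but that case is the $10,000 one and the other two are $5 each, count-based recall is 1/3 while the value-weighted detection rate is above 0.99.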

Common misconception

Many teams assume fraud detection requires sophisticated deep learning models. In practice, gradient boosting (XGBoost, LightGBM) with well-engineered features usually matches or beats deep learning on tabular fraud data. Results on the Kaggle credit card fraud dataset and in the IEEE-CIS fraud detection competition, along with reports from industry practitioners, point the same way. Invest in features, not exotic architectures.

The feedback loop

Fraud detection is not a one-time model deployment. It is a continuous cycle:

  1. Model scores transactions in real time.
  2. Analysts review flagged transactions and provide labels.
  3. Labels flow back into the training data.
  4. Model is retrained on updated data (typically weekly or monthly).
  5. Fraudsters adapt to the new model’s behavior.
  6. Repeat.

Without this loop, model performance degrades within months as fraud patterns evolve.
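
One simple way to notice that degradation is to track how often analysts confirm the model's alerts and trigger retraining when that share drops. The sketch below assumes analyst labels of 1 (confirmed fraud) and 0 (false alarm). The function names and the 20% default are hypothetical. Because analysts only see flagged transactions, this monitors precision and says nothing about fraud the model missed.

```python
from typing import Sequence


def alert_precision(analyst_labels: Sequence[int]) -> float:
    """Share of reviewed alerts that analysts confirmed as fraud (1 = confirmed)."""
    if not analyst_labels:
        return 0.0
    return sum(analyst_labels) / len(analyst_labels)


def should_retrain(
    baseline_labels: Sequence[int],
    recent_labels: Sequence[int],
    max_relative_drop: float = 0.2,
) -> bool:
    """True if recent alert precision fell more than max_relative_drop below baseline.

    The 20% default is an arbitrary illustrative threshold.
    """
    baseline = alert_precision(baseline_labels)
    if baseline == 0.0:
        return False  # nothing to compare against yet
    recent = alert_precision(recent_labels)
    return (baseline - recent) / baseline > max_relative_drop
```

A baseline of three confirmed alerts out of four followed by a recent window of one out of four would trigger retraining under the default threshold.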

The one thing to remember: Effective fraud detection combines rule-based checks for known patterns with ML models fed by behavioral features — and treats the system as a living process that must continuously adapt to adversarial evolution.

