Fraud Detection Patterns with Python — Core Concepts
Why fraud detection is a unique ML problem
Fraud detection differs from typical classification in three critical ways:
- Extreme class imbalance — legitimate transactions outnumber fraudulent ones by 500:1 or more. Standard accuracy metrics are meaningless (a model that labels everything “not fraud” achieves 99.8% accuracy).
- Adversarial environment — fraudsters adapt to detection systems. A model that works today may be useless in six months.
- Cost asymmetry — missing a $10,000 fraud costs far more than inconveniencing a customer with a false positive.
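The first and third points can be made concrete with a few lines of arithmetic. This is a minimal sketch on made-up numbers: the transaction counts, the `should_flag` helper, and the `false_positive_cost` of $5 are illustrative assumptions, not figures from a real system.

```python
# Hypothetical data: 500 legitimate transactions for every fraudulent one.
n_legit, n_fraud = 500_000, 1_000
y_true = [0] * n_legit + [1] * n_fraud

# A "model" that labels every transaction as not fraud.
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
fraud_caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = fraud_caught / n_fraud

print(f"accuracy = {accuracy:.4f}")  # 0.9980
print(f"recall   = {recall:.4f}")    # 0.0000, no fraud is caught


def should_flag(p_fraud: float, amount: float, false_positive_cost: float = 5.0) -> bool:
    """Flag when the expected fraud loss exceeds the expected cost of a false alarm.

    false_positive_cost is an assumed dollar value for inconveniencing a customer.
    """
    return p_fraud * amount > (1 - p_fraud) * false_positive_cost


print(should_flag(0.01, 10_000))  # True: a 1% chance of losing $10,000 is worth a check
print(should_flag(0.01, 20))      # False: the same 1% on a $20 purchase is not
```

Under these assumptions the same fraud probability leads to different decisions depending on the amount at risk, which is why a single probability threshold is rarely the whole story.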
The two detection approaches
Rule-based systems
Hard-coded rules catch known patterns: “flag any transaction over $5,000 from a new device” or “block purchases from countries on the risk list within 24 hours of a password change.”
Rules are fast, interpretable, and easy to audit (regulators love them). But they cannot catch novel fraud patterns and create maintenance nightmares as the rule count grows into the thousands.
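As a rough illustration, the two example rules above could be written as named predicates over a transaction record. Everything here is hypothetical: the `Transaction` fields, the `Rule` structure, and the placeholder `RISK_COUNTRIES` set are assumptions made for the sketch, not a real rule engine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class Transaction:
    amount: float
    country: str
    is_new_device: bool
    timestamp: datetime
    last_password_change: Optional[datetime] = None


@dataclass
class Rule:
    name: str
    action: str  # "flag" or "block"
    predicate: Callable[[Transaction], bool]


RISK_COUNTRIES = {"XX", "YY"}  # placeholder codes, not a real risk list


def recent_password_change(txn: Transaction) -> bool:
    """True if the password was changed in the 24 hours before the transaction."""
    if txn.last_password_change is None:
        return False
    elapsed = txn.timestamp - txn.last_password_change
    return timedelta(0) <= elapsed <= timedelta(hours=24)


RULES = [
    Rule("large_amount_new_device", "flag",
         lambda t: t.amount > 5_000 and t.is_new_device),
    Rule("risk_country_after_password_change", "block",
         lambda t: t.country in RISK_COUNTRIES and recent_password_change(t)),
]


def evaluate(txn: Transaction) -> list[tuple[str, str]]:
    """Return (rule name, action) for every rule the transaction triggers."""
    return [(r.name, r.action) for r in RULES if r.predicate(txn)]


txn = Transaction(amount=7_500, country="XX", is_new_device=True,
                  timestamp=datetime(2024, 1, 2, 12, 0),
                  last_password_change=datetime(2024, 1, 2, 9, 0))
print(evaluate(txn))
# [('large_amount_new_device', 'flag'), ('risk_country_after_password_change', 'block')]
```

Keeping each rule as a named entry makes the audit trail simple (the output says exactly which rules fired), but it also shows where the maintenance burden comes from: every new pattern is another hand-written entry in `RULES`.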
Machine learning models
ML models learn patterns from historical data and generalize to new fraud variants. They complement rules by catching the patterns humans have not codified yet.
Most production systems use both: rules for known, high-confidence patterns and ML for everything else.
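One way to picture the combination is a small decision function that lets rules take precedence and falls back to the model score. This is only a sketch: the `decide` function, the action names, and the `review_threshold` of 0.5 are assumptions chosen for illustration.

```python
def decide(rule_actions: list[str], model_score: float,
           review_threshold: float = 0.5) -> str:
    """Combine rule outcomes with an ML fraud score into a single decision.

    rule_actions: actions ("flag" or "block") from whichever rules fired.
    model_score: estimated fraud probability from the ML model, in [0, 1].
    review_threshold: assumed cut-off above which a transaction goes to review.
    """
    if "block" in rule_actions:
        return "block"      # known, high-confidence pattern
    if "flag" in rule_actions or model_score >= review_threshold:
        return "review"     # send to an analyst queue
    return "approve"


print(decide(["block"], 0.10))  # block
print(decide([], 0.90))         # review
print(decide([], 0.10))         # approve
```

In practice the threshold would be tuned against review capacity and the cost of missed fraud rather than fixed at 0.5.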
Feature engineering — the real competitive advantage
The algorithm matters less than the features you feed it. Fraud-specific features fall into four categories:
Transaction features: amount, merchant category, time of day, payment method.
Velocity features: how many transactions in the last hour, total spending in the last 24 hours, number of unique merchants in the last week.
Behavioral deviation: how far is this transaction from the user’s historical average amount, typical merchant category, usual location?
Network features: is this card linked to a device that was used with a previously flagged card? (Graph-based features are among the most powerful in fraud detection.)
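```python
import pandas as pd


def compute_velocity_features(df: pd.DataFrame, user_col: str = "user_id") -> pd.DataFrame:
    """Compute rolling transaction counts and amounts per user.

    Expects a datetime64 "timestamp" column and a numeric "amount" column.
    """
    df = df.sort_values([user_col, "timestamp"]).reset_index(drop=True)

    # Time-based windows need a DatetimeIndex, so roll over a timestamp-indexed
    # view of each user's amounts. Each window includes the current transaction.
    rolled = df.set_index("timestamp").groupby(user_col)["amount"]

    # Transactions in the last hour (per user)
    df["txn_count_1h"] = rolled.rolling("1h").count().to_numpy()

    # Total amount in the last 24 hours (per user)
    df["amount_sum_24h"] = rolled.rolling("24h").sum().to_numpy()

    # Deviation from the user's average amount. This mean uses the whole frame;
    # for training data, prefer a past-only (expanding) mean to avoid leakage.
    user_avg = df.groupby(user_col)["amount"].transform("mean")
    df["amount_deviation"] = (df["amount"] - user_avg) / user_avg.clip(lower=1)
    return df
```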
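The network features described above can be sketched in the same style. The example below assumes a simple card-to-device usage log and a set of previously flagged cards; the column names, the sample data, and the `shares_device_with_flagged` helper are all hypothetical. A production system would more likely use a graph database or a dedicated graph library, so treat this as a one-hop approximation.

```python
import pandas as pd

# Hypothetical card-to-device usage log.
links = pd.DataFrame({
    "card_id":   ["c1", "c2", "c3", "c4"],
    "device_id": ["d1", "d1", "d2", "d3"],
})
flagged_cards = {"c1"}  # cards previously confirmed as fraudulent


def shares_device_with_flagged(links: pd.DataFrame, flagged: set) -> pd.Series:
    """For each card, True if it used a device also used by a different flagged card."""
    flagged_links = links[links["card_id"].isin(flagged)]

    # Join every card to the flagged cards seen on the same device.
    pairs = links.merge(flagged_links, on="device_id", suffixes=("", "_flagged"))

    # A flagged card sharing a device with itself is not a link.
    pairs = pairs[pairs["card_id"] != pairs["card_id_flagged"]]
    linked = set(pairs["card_id"])

    cards = pd.Index(links["card_id"].unique(), name="card_id")
    return pd.Series(cards.isin(linked), index=cards, name="linked_to_flagged_card")


print(shares_device_with_flagged(links, flagged_cards))
```

Here only `c2` comes out as linked, because it shares device `d1` with the flagged card `c1`.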
Handling class imbalance
| Technique | How it works | Tradeoff |
|---|---|---|
| SMOTE | Generate synthetic minority samples | Can create unrealistic fraud patterns |
| Undersampling | Remove majority class samples | Loses legitimate transaction patterns |
| Class weights | Penalize misclassifying fraud more heavily | Simple, no data manipulation needed |
| Anomaly detection | Model normal behavior, flag deviations | Does not need labeled fraud examples |
Class weighting is the simplest and often most effective starting point:
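```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(
    n_estimators=200,
    max_depth=6,
)

# GradientBoostingClassifier has no class_weight parameter, so pass per-sample
# weights to fit(): each fraud example (label 1) gets 100× the weight of a
# legitimate one. X_train and y_train are the training features and 0/1 labels.
sample_weight = np.where(y_train == 1, 100, 1)
model.fit(X_train, y_train, sample_weight=sample_weight)
```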
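The anomaly detection row in the table above takes a different route: it models normal behavior and needs no fraud labels at all. The sketch below uses scikit-learn's `IsolationForest` as one possible detector; the choice of algorithm, the two synthetic features, and all the numbers are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-in for legitimate transactions: (amount, hour of day).
X_normal = rng.normal(loc=[50.0, 12.0], scale=[10.0, 3.0], size=(1000, 2))

X_new = np.array([
    [52.0, 13.0],    # looks like everyday behavior
    [5000.0, 3.0],   # far from anything seen during fitting
])

detector = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
detector.fit(X_normal)  # no labels required

# score_samples: the lower the score, the more anomalous the point.
scores = detector.score_samples(X_new)
# predict: -1 marks a predicted outlier, 1 a predicted inlier.
labels = detector.predict(X_new)

print(scores, labels)
```

The tradeoff is that "unusual" is not the same as "fraudulent", so anomaly scores tend to work best as an extra feature or a triage signal rather than as the final decision.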
Evaluation metrics that matter
Forget accuracy. Use:
- Precision at high recall: if you need to catch 95% of fraud (recall = 0.95), what fraction of flagged transactions are actually fraudulent?
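- Average Precision (PR-AUC): summarizes the precision-recall curve in a single number. Unlike ROC-AUC, it is not inflated by the large number of true negatives, and a random classifier scores roughly the fraud rate rather than 0.5.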
- Value-weighted detection rate: weight each detection by the fraud amount; catching a $10,000 fraud matters more than catching a $5 one.
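The three metrics above can be computed along these lines. This is a sketch on a tiny hand-made dataset: `precision_at_recall` and `value_weighted_detection_rate` are hypothetical helper names, the scores and amounts are invented, and the 0.5 decision threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve


def precision_at_recall(y_true, y_score, min_recall=0.95):
    """Best precision achievable while keeping recall at or above min_recall."""
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    return float(precision[recall >= min_recall].max())


def value_weighted_detection_rate(y_true, y_pred, amounts):
    """Share of total fraud value (not fraud count) that was flagged."""
    y_true, y_pred, amounts = map(np.asarray, (y_true, y_pred, amounts))
    fraud_value = amounts[y_true == 1].sum()
    caught_value = amounts[(y_true == 1) & (y_pred == 1)].sum()
    return float(caught_value / fraud_value)


# Invented example: 8 transactions, 3 of them fraudulent.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1])
y_score = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.80, 0.90])
amounts = np.array([20, 35, 15, 60, 5, 40, 10_000, 250])
y_pred = (y_score >= 0.5).astype(int)  # assumed decision threshold

p_at_r = precision_at_recall(y_true, y_score, min_recall=0.95)
ap = average_precision_score(y_true, y_score)
vw_rate = value_weighted_detection_rate(y_true, y_pred, amounts)

print(f"precision at recall >= 0.95: {p_at_r:.4f}")   # 0.7500
print(f"average precision:           {ap:.4f}")       # 0.9167
print(f"value-weighted detection:    {vw_rate:.4f}")  # 0.9995
```

In this toy data the model misses one of three frauds (count-based recall of about 0.67), but the missed case is worth $5, so the value-weighted rate stays near 1.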
Common misconception
Many teams assume they need sophisticated deep learning models for fraud detection. In practice, gradient boosting (XGBoost, LightGBM) with well-engineered features typically matches or outperforms deep learning on tabular fraud data. Results on the Kaggle credit card fraud dataset, in the IEEE-CIS fraud competition, and from industry practitioners point the same way. Invest in features, not exotic architectures.
The feedback loop
Fraud detection is not a one-time model deployment. It is a continuous cycle:
- Model scores transactions in real time.
- Analysts review flagged transactions and provide labels.
- Labels flow back into the training data.
- Model is retrained on updated data (typically weekly or monthly).
- Fraudsters adapt to the new model’s behavior.
- Repeat.
Without this loop, model performance degrades within months as fraud patterns evolve.
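The label-merge and retrain steps of this cycle might look roughly like the following. It is a sketch only: the column names (`txn_id`, `is_fraud`), the `FEATURES` list, the helper functions, and the tiny invented dataset are assumptions, and a real pipeline would add validation on a held-out, time-ordered sample before promoting a retrained model.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

FEATURES = ["amount", "txn_count_1h"]  # hypothetical feature columns


def merge_labels(training: pd.DataFrame, reviewed: pd.DataFrame) -> pd.DataFrame:
    """Append analyst-reviewed transactions; a newer label replaces an older one."""
    combined = pd.concat([training, reviewed], ignore_index=True)
    return combined.drop_duplicates(subset="txn_id", keep="last").reset_index(drop=True)


def retrain(training: pd.DataFrame) -> GradientBoostingClassifier:
    """Fit a fresh model on the current labeled history."""
    model = GradientBoostingClassifier(n_estimators=50, random_state=0)
    model.fit(training[FEATURES], training["is_fraud"])
    return model


# Invented labeled history.
training = pd.DataFrame({
    "txn_id":       [1, 2, 3, 4, 5, 6],
    "amount":       [20.0, 35.0, 15.0, 60.0, 900.0, 1200.0],
    "txn_count_1h": [1, 1, 2, 1, 6, 8],
    "is_fraud":     [0, 0, 0, 0, 1, 1],
})

# Analyst feedback: transaction 6 is corrected to legitimate, transaction 7 is new fraud.
reviewed = pd.DataFrame({
    "txn_id":       [6, 7],
    "amount":       [1200.0, 750.0],
    "txn_count_1h": [8, 5],
    "is_fraud":     [0, 1],
})

training = merge_labels(training, reviewed)
model = retrain(training)
scores = model.predict_proba(training[FEATURES])[:, 1]
print(training[["txn_id", "is_fraud"]].assign(score=scores.round(3)))
```

Running `merge_labels` and `retrain` on a schedule (the weekly or monthly cadence mentioned above) is the simplest way to close the loop.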
The one thing to remember: Effective fraud detection combines rule-based checks for known patterns with ML models fed by behavioral features — and treats the system as a living process that must continuously adapt to adversarial evolution.