For a fraud detection model, would F1-score be a better choice than accuracy?

✅ Why F1-score Is Better:

The F1-score is the harmonic mean of precision and recall, giving a more balanced picture of performance on the rare fraud class:

$$ F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $$

Where:

  • Precision = TP / (TP + FP)
  • Recall = TP / (TP + FN)

Example: https://www.kaggle.com/code/mayuringle8890/fraud-detection-notebook/
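
A minimal sketch (scikit-learn, with made-up labels where 1 = fraud) showing how these quantities come out of predictions:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Toy labels: 1 = fraud, 0 = legitimate (illustrative values only)
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 0, 1]  # 3 TP, 1 FP, 1 FN

print("Precision:", precision_score(y_true, y_pred))  # TP/(TP+FP) = 3/4 = 0.75
print("Recall:   ", recall_score(y_true, y_pred))     # TP/(TP+FN) = 3/4 = 0.75
print("F1:       ", f1_score(y_true, y_pred))         # harmonic mean = 0.75
```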

📌 In fraud detection:

  • We want high recall → catch as many frauds as possible (reduce false negatives)
  • We also want high precision → avoid too many false alarms (reduce false positives)

⚖️ F1-Score Trade-off:

  • A high F1-score means your model is doing well on both fronts.

  • You may even consider the Fβ-score to prioritize recall or precision, depending on your business need (see the sketch after this list):

    • F2-score if catching more fraud matters more than avoiding false alarms.
    • F0.5-score if false positives are costlier than missing some fraud.
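
A minimal Fβ sketch with scikit-learn's fbeta_score, reusing the toy labels above; β > 1 weights recall more heavily, β < 1 weights precision more heavily:

```python
from sklearn.metrics import fbeta_score

y_true = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 0, 1]

# beta=2: recall counts roughly twice as much as precision
print("F2:  ", fbeta_score(y_true, y_pred, beta=2))
# beta=0.5: precision counts roughly twice as much as recall
print("F0.5:", fbeta_score(y_true, y_pred, beta=0.5))
```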

✅ Why Accuracy Fails in Imbalanced Problems:

If 99.9% of transactions are legitimate and the model always predicts "Not Fraud", then:

  • Accuracy = 99.9%
  • True Positives (fraud detected) = 0
  • False Negatives (fraud missed) = all actual frauds

👉 So the model looks good on paper (99.9% accurate) but is completely useless in practice: it catches no fraud.
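
A minimal sketch of this failure mode on synthetic data (the 0.1% fraud rate is the assumption from the example above):

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Simulate 100,000 transactions where ~0.1% are fraud (label 1)
rng = np.random.default_rng(42)
y_true = (rng.random(100_000) < 0.001).astype(int)

# A "model" that always predicts Not Fraud
y_pred = np.zeros_like(y_true)

print("Accuracy:", accuracy_score(y_true, y_pred))                 # ~0.999, looks great
print("Recall:  ", recall_score(y_true, y_pred, zero_division=0))  # 0.0, catches nothing
```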


πŸ” In Summary:

Metric Good for imbalanced data? What it tells you
Accuracy ❌ No Can be misleading if classes are imbalanced
Precision βœ… Yes How many predicted frauds were actually frauds
Recall βœ… Yes How many actual frauds you successfully detected
F1-score βœ…βœ… Best choice Balances precision and recall

For fraud detection, which is a highly imbalanced binary classification problem, you need a model that:

  1. Handles class imbalance well.
  2. Can capture complex patterns.
  3. Can be tuned for precision-recall trade-offs.

Here are some recommended models, categorized by complexity:


🔹 1. Baseline Models (Start Simple)

| Model | Notes |
| --- | --- |
| Logistic Regression | Simple, interpretable, good baseline. Add class weights or use SMOTE. |
| Decision Tree | Captures non-linear patterns, but can overfit. |

Use with:

  • class_weight='balanced'
  • feature scaling (for LR)
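
A minimal baseline sketch; the synthetic dataset stands in for real transaction features, and the ~1% fraud rate is an assumption:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a fraud dataset: ~1% positive class
X, y = make_classification(n_samples=20_000, n_features=20,
                           weights=[0.99], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Scale features (important for LR) and reweight the rare class
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight='balanced', max_iter=1000),
)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test), digits=3))
```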

🔹 2. Ensemble Models (Often Best for Tabular Fraud Data)

| Model | Why it's good for fraud detection |
| --- | --- |
| Random Forest | Robust, handles imbalance with class weights. |
| XGBoost | Handles imbalance via scale_pos_weight; high performance. |
| LightGBM | Fast, efficient, supports the is_unbalance=True flag. |
| CatBoost | Works well with categorical features and imbalance. |

✅ These are often top performers in Kaggle competitions and real-world systems.
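
A minimal sketch of those imbalance flags, assuming the xgboost and lightgbm packages are installed and reusing X_train/y_train from the baseline sketch above:

```python
import numpy as np
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# scale_pos_weight is conventionally set to (negatives / positives)
ratio = np.sum(y_train == 0) / np.sum(y_train == 1)

xgb_clf = XGBClassifier(n_estimators=300, scale_pos_weight=ratio)
xgb_clf.fit(X_train, y_train)

# LightGBM alternative: let the library reweight the classes itself
lgb_clf = LGBMClassifier(n_estimators=300, is_unbalance=True)
lgb_clf.fit(X_train, y_train)
```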


🔹 3. Anomaly Detection Models (When labels are rare or noisy)

If you have very few fraud samples, try:

| Model | Notes |
| --- | --- |
| Isolation Forest | Unsupervised, good for detecting rare patterns |
| One-Class SVM | Works when you only have "normal" data to learn from |
| Autoencoders (Deep Learning) | Learn normal patterns, flag large reconstruction errors as frauds |

🔍 Use these when you don't have labels for frauds or they are very sparse.


🧪 Experimental / Advanced

| Model | Notes |
| --- | --- |
| Graph Neural Networks | If fraud involves networks (users, devices, accounts) |
| Hybrid Models (Ensemble + Deep Learning) | Combine decision trees and autoencoders |

✅ Best Practice for Fraud Detection

  • Resampling: SMOTE, ADASYN, or undersampling the majority class.
  • Evaluation: Use F1-score and Precision-Recall AUC, not accuracy.
  • Threshold tuning: Tune the classification threshold to optimize F1 or minimize business cost (see the sketch after this list).
  • Explainability: Use SHAP/LIME for model interpretability, especially important in finance.
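
A minimal threshold-tuning sketch, assuming a fitted classifier clf with predict_proba (e.g., the baseline pipeline above) and a held-out test split:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Scores for the positive (fraud) class
scores = clf.predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, scores)

# Pick the threshold that maximizes F1 (epsilon avoids division by zero)
f1 = 2 * precision * recall / (precision + recall + 1e-12)
best = f1[:-1].argmax()  # the last PR point has no matching threshold
print(f"Best threshold: {thresholds[best]:.3f}, F1 there: {f1[best]:.3f}")

y_pred = (scores >= thresholds[best]).astype(int)
```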

📌 Recommendation for You (If You're Just Starting):

# Try this pipeline:
- Preprocessing: Scale features + encode categoricals
- Use: LightGBM or XGBoost
- Set: scale_pos_weight = (legit / fraud) ratio
- Evaluate: precision, recall, F1, PR-AUC

Example: https://www.kaggle.com/code/mayuringle8890/fraud-detection-notebook/

📌 Final Note:

In a real fraud detection system, you might also:

  • Use Precision-Recall curves
  • Optimize based on business cost (e.g., the cost of a false positive vs. a false negative)
  • Use a confusion matrix to interpret model performance (see the sketch below)
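
A minimal sketch of these diagnostics, again reusing clf, X_test, and y_test from the earlier examples (the PR plot needs matplotlib):

```python
from sklearn.metrics import (PrecisionRecallDisplay, average_precision_score,
                             confusion_matrix)

scores = clf.predict_proba(X_test)[:, 1]

# Area under the Precision-Recall curve (average precision)
print("PR-AUC:", average_precision_score(y_test, scores))

# Confusion matrix at the default 0.5 threshold
print(confusion_matrix(y_test, clf.predict(X_test)))

# Full Precision-Recall curve
PrecisionRecallDisplay.from_predictions(y_test, scores)
```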