Architecture Overview

MLOps Fraud Pipeline

Credit Card Fraud Detection · End-to-End · Nanthan Srikumar · 2026
Legend: Complete · In Progress · Next · Planned
1 Stage 1 · Complete
Kaggle Dataset — Credit Card Fraud
284,807 transactions · 492 fraud cases · 0.17% positive rate · V1–V28 PCA features + Amount. Downloaded via Kaggle CLI with API credentials stored as GitHub Secrets.
kaggle CLI · pandas · 284k rows · 577:1 imbalance
StandardScaler on Amount · drop Time
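The preprocessing step named here (standardise Amount, drop Time) can be sketched without any dependencies. The pipeline itself uses pandas and scikit-learn's StandardScaler; the pure-Python version below only mirrors the arithmetic. The function name `preprocess` and the toy rows are illustrative assumptions, not code from the repository.

```python
from statistics import mean, pstdev


def preprocess(rows):
    """Drop the Time column and standardise Amount to zero mean, unit variance.

    Mirrors StandardScaler's arithmetic, which divides by the population
    standard deviation (ddof=0). `rows` is a list of dicts, one per transaction.
    """
    amounts = [row["Amount"] for row in rows]
    mu = mean(amounts)
    sigma = pstdev(amounts)

    cleaned = []
    for row in rows:
        out = {key: value for key, value in row.items() if key != "Time"}
        # A constant column has sigma == 0; map it to 0.0 instead of dividing.
        out["Amount"] = (row["Amount"] - mu) / sigma if sigma else 0.0
        cleaned.append(out)
    return cleaned


# Toy rows (made-up values) with the same column names as the Kaggle CSV.
sample = [
    {"Time": 0.0, "V1": -1.36, "Amount": 10.0, "Class": 0},
    {"Time": 1.0, "V1": 1.19, "Amount": 20.0, "Class": 0},
    {"Time": 2.0, "V1": -0.97, "Amount": 30.0, "Class": 1},
]
scaled = preprocess(sample)
```

In a real training run the scaler should be fitted on the training split only and reused at inference time, so that the API sees the same scaling as the model did.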
2 Stage 1 · Complete
Model Training — XGBoost Classifier
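scale_pos_weight=577 (the negative-to-positive ratio) handles class imbalance natively, so no SMOTE is needed. PR-AUC is the primary metric rather than ROC-AUC: at a 0.17% positive rate, accuracy is misleading and ROC-AUC can look strong even when precision on the fraud class is poor. ROC-AUC: 0.9747 · PR-AUC: 0.8510. Model saved to models/model.joblib.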
XGBoost 2.0.3 · scale_pos_weight=577 · PR-AUC primary · scikit-learn pipeline · joblib
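The value 577 follows directly from the dataset counts: XGBoost's documentation suggests setting scale_pos_weight to the ratio of negative to positive examples. The sketch below reproduces that arithmetic. `build_classifier` is a hypothetical helper, and passing eval_metric="aucpr" is an assumption rather than a confirmed setting of this pipeline.

```python
N_TOTAL = 284_807  # transactions in the dataset
N_FRAUD = 492      # positive (fraud) cases


def imbalance_stats(n_total, n_pos):
    """Return the positive rate and the negative-to-positive ratio."""
    n_neg = n_total - n_pos
    return {"positive_rate": n_pos / n_total, "neg_to_pos": n_neg / n_pos}


stats = imbalance_stats(N_TOTAL, N_FRAUD)

# The exact ratio is about 577.88; truncating it gives the 577 used above.
scale_pos_weight = int(stats["neg_to_pos"])


def build_classifier():
    """Hypothetical helper: an XGBClassifier weighted for the imbalance."""
    from xgboost import XGBClassifier  # imported lazily; needs xgboost installed

    return XGBClassifier(
        scale_pos_weight=scale_pos_weight,
        eval_metric="aucpr",  # assumption: track PR-AUC during training
    )
```

Weighting the positive class this way leaves the training data untouched, which is the practical difference from resampling approaches such as SMOTE.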
mlflow.xgboost.log_model()
3 Stage 3 · Complete
MLflow — Experiment Tracking + Model Registry
Logs params (n_estimators, max_depth, learning_rate, scale_pos_weight) and metrics (roc_auc, avg_precision) per run. Model artifacts are stored in S3 rather than on local disk so that CI can fetch them. The registry holds versioned models; a data scientist (DS) promotes the best run to the @champion alias. The backend store is local for now (Stage 2b will move it to EC2).
MLflow 3.10.1 · S3 artifact store · fraud-detector@champion · localhost:5001 · Docker Compose service
Artifact Storage: s3://mlops-fraud-pipeline-artifacts-nanthan/mlflow-artifacts · us-east-2 · IAM scoped to GetObject, PutObject, ListBucket only
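Fetching the promoted model by alias might look like the sketch below. It assumes the tracking server is reachable over plain HTTP at localhost:5001 and that the caller has AWS credentials able to read the artifact bucket. The helper names `champion_uri` and `load_champion` are hypothetical.

```python
TRACKING_URI = "http://localhost:5001"  # assumption: http scheme on the local server
MODEL_NAME = "fraud-detector"
ALIAS = "champion"


def champion_uri(name=MODEL_NAME, alias=ALIAS):
    """Build the registry URI that resolves an alias to a model version."""
    return f"models:/{name}@{alias}"


def load_champion():
    """Load whichever model version currently carries the @champion alias.

    Requires mlflow to be installed, the tracking server to be running,
    and read access to the S3 artifact store.
    """
    import mlflow
    import mlflow.pyfunc

    mlflow.set_tracking_uri(TRACKING_URI)
    return mlflow.pyfunc.load_model(champion_uri())
```

Because consumers resolve the alias at load time, promoting a different run to @champion changes what gets fetched without any change to the calling code.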
served via Docker Compose
4 Stage 1 · Complete
FastAPI Prediction Service — /predict
POST /predict accepts V1–V28 + Amount and returns fraud_probability plus an is_fraud boolean. The model is loaded once at startup via a lifespan context manager. /health and /metrics endpoints are exposed on port 8000; /metrics is generated automatically by prometheus-fastapi-instrumentator.
FastAPI · Uvicorn · Pydantic v2 · prometheus-fastapi-instrumentator · port 8000
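A client call against the prediction service could be sketched as follows, using only the standard library. The flat JSON body with keys "V1" through "V28" and "Amount" is an assumption about the request schema, and `build_predict_request` and `predict` are hypothetical helper names.

```python
import json
from urllib import request

PREDICT_URL = "http://localhost:8000/predict"
FEATURES = [f"V{i}" for i in range(1, 29)] + ["Amount"]  # 28 PCA features + Amount


def build_predict_request(features, url=PREDICT_URL):
    """Build (but do not send) a POST request carrying one transaction."""
    missing = [name for name in FEATURES if name not in features]
    if missing:
        raise ValueError(f"missing features: {missing}")
    body = json.dumps({name: float(features[name]) for name in FEATURES})
    return request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features):
    """Send the request; the service is described as returning
    fraud_probability and is_fraud."""
    with request.urlopen(build_predict_request(features), timeout=5) as resp:
        return json.load(resp)
```

With the stack running, `predict({name: 0.0 for name in FEATURES})` would return the parsed JSON response. Building the request separately from sending it keeps the payload logic testable without a live server.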
docker compose up --build
5 Stage 1 · Complete
Docker — Container Build
python:3.11-slim base. App + model baked in at build time. Docker Compose orchestrates 4 services: app, mlflow, prometheus, grafana. Cross-platform — Mac and Windows identical.
Docker · Docker Compose · 4 services · layer caching
6 Stage 2a · Complete
GitHub Actions CI — Test + Build
Two jobs: test (pytest 5/5) → build (kaggle download → train → docker build). OIDC auth — no stored AWS keys. GitHub issues short-lived JWT per job; AWS STS verifies and issues temp credentials. permissions: id-token: write required.
GitHub Actions · OIDC / STS · role-to-assume · pytest 5/5
Auth Pattern: No IAM access keys. GitHub OIDC → AWS STS → temp credentials per job. IAM role scoped to repo:nanthansr/mlops-fraud-pipeline:* only.
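The repository scoping works by matching the `sub` claim of the GitHub-issued token against a wildcard pattern in the role's trust policy. The sketch below only illustrates that matching locally. It uses Python's fnmatch as a stand-in for IAM's StringLike operator, which is an approximation (fnmatch also interprets `[...]`, IAM does not), and the helper name `sub_is_allowed` is hypothetical.

```python
from fnmatch import fnmatchcase

# Pattern the IAM role is scoped to, as stated above.
ALLOWED_SUB = "repo:nanthansr/mlops-fraud-pipeline:*"


def sub_is_allowed(sub, pattern=ALLOWED_SUB):
    """Return True if a token's `sub` claim falls within the allowed repo."""
    return fnmatchcase(sub, pattern)


# GitHub formats the claim as repo:OWNER/REPO:<context>, for example a branch ref.
main_branch = "repo:nanthansr/mlops-fraud-pipeline:ref:refs/heads/main"
other_repo = "repo:someone-else/mlops-fraud-pipeline:ref:refs/heads/main"

allowed = sub_is_allowed(main_branch)  # same repo, any ref
denied = sub_is_allowed(other_repo)    # different owner
```

The trailing `*` admits every branch, tag, and pull request in the repository; a tighter pattern such as one ending in `ref:refs/heads/main` would restrict role assumption to a single branch.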
fetch fraud-detector@champion → docker build → push
scrape /metrics every 15s
9 Stage 4 · Infra Ready
Prometheus — Metrics Collection
Scrapes /metrics from FastAPI every 15s. Collects HTTP request counts, latency, error rates. Running via Docker Compose on port 9090. prometheus.yml targets app:8000.
Prometheus · port 9090 · 15s scrape interval · HTTP + model metrics
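Request counts are exposed as monotonically increasing counters, so a request rate is derived from the difference between consecutive scrapes. The sketch below shows that idea for the 15s interval. It is a simplification, not Prometheus's own rate() algorithm (which also extrapolates to window boundaries), and `per_second_rate` is a hypothetical name.

```python
SCRAPE_INTERVAL_S = 15  # matches the scrape interval described above


def per_second_rate(samples, interval_s=SCRAPE_INTERVAL_S):
    """Average per-second increase of a counter across consecutive scrapes.

    `samples` holds the counter value seen at each scrape, oldest first.
    A drop in value is treated as a counter reset (for example an app
    restart), in which case the new value counts as the increase.
    """
    if len(samples) < 2:
        raise ValueError("need at least two scrapes to compute a rate")

    increase = 0.0
    for prev, curr in zip(samples, samples[1:]):
        increase += (curr - prev) if curr >= prev else curr
    elapsed = interval_s * (len(samples) - 1)
    return increase / elapsed


steady = per_second_rate([0, 30, 60])        # 60 requests over 30 s
after_reset = per_second_rate([100, 130, 15])  # restart between scrapes 2 and 3
```

Handling resets matters here because the API container is rebuilt on each deploy, which restarts its counters at zero.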
10 Stage 4 · Infra Ready
Grafana — Dashboard + Alerting
Visualises Prometheus metrics. Model health dashboard planned: fraud prediction rate over time, class distribution drift, anomaly alert thresholds. Running on port 3000. admin/admin local. Alerting rules trigger AIOps layer.
Grafana · port 3000 · Prometheus datasource · model drift panel
drift detected → alert fired
11 Stage 5 · Planned
Evidently AI — Data Drift Detection
Monitors input feature distribution shift vs training baseline. Generates HTML reports or exports metrics to Prometheus. Replaces custom numpy drift code with a named, production-grade tool. One-weekend integration.
Evidently AI · feature drift · prediction drift · Prometheus export
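To make "distribution shift vs training baseline" concrete, the sketch below computes a two-sample Kolmogorov-Smirnov statistic for a single feature in plain Python. This is not Evidently's API and not necessarily the test Evidently would select; it is a stand-in illustration of the kind of comparison a drift tool performs. The function name and sample values are made up.

```python
from bisect import bisect_right


def ks_statistic(baseline, current):
    """Largest gap between the empirical CDFs of two samples.

    0.0 means the samples have identical empirical distributions;
    1.0 means their value ranges do not overlap at all.
    """
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")

    a = sorted(baseline)
    b = sorted(current)
    gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)  # share of baseline values <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of current values <= x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


train_amount = [1.0, 2.0, 3.0, 4.0]   # toy baseline values
live_same = [1.0, 2.0, 3.0, 4.0]      # no drift
live_shifted = [5.0, 6.0, 7.0, 8.0]   # fully shifted

no_drift = ks_statistic(train_amount, live_same)
full_drift = ks_statistic(train_amount, live_shifted)
```

A per-feature statistic like this is also the sort of scalar that could be exported as a Prometheus gauge and alerted on once it crosses a chosen threshold.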
12 Stage 5 · Planned
AIOps Anomaly Detection + Alerting
Anomaly detection on prediction distributions and infra metrics. Grafana alerting rules trigger on fraud rate spikes or drift events. Alert routed to Slack or email. Incident simulation write-up documents the full detection-to-response loop.
Grafana alerting · Slack / email · incident simulation · anomaly detection
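A fraud-rate spike check of the kind described could be as simple as a z-score against recent history. The sketch below is one possible rule under that assumption; the threshold of 3 standard deviations, the helper name `is_spike`, and the sample rates are all illustrative and not taken from the project.

```python
from statistics import mean, pstdev


def is_spike(history, current, threshold=3.0):
    """Flag `current` if it sits more than `threshold` standard deviations
    above the mean of `history` (recent per-window fraud prediction rates)."""
    if len(history) < 2:
        raise ValueError("need at least two historical windows")

    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Flat history: any increase at all is treated as anomalous.
        return current > mu
    return (current - mu) / sigma > threshold


# Toy per-window fraud rates hovering around the 0.17% base rate.
recent = [0.0017, 0.0018, 0.0016, 0.0017, 0.0018]

normal_window = is_spike(recent, 0.0018)  # within usual variation
spike_window = is_spike(recent, 0.0100)   # roughly 6x the base rate
```

In the planned setup the equivalent rule would live in a Grafana alert expression; a standalone function like this is mainly useful for the incident simulation, where alerts can be replayed against recorded rates.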

Dataset

284,807 transactions · 492 fraud · 0.17% positive rate · 577:1 class imbalance

Model Performance

PR-AUC 0.8510 · ROC-AUC 0.9747 · XGBoost · scale_pos_weight=577
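The gap between the two scores is expected on data this imbalanced. The toy example below (made-up scores, not the pipeline's model) shows how a single fraud case ranked 10th out of 1,000 transactions yields a ROC-AUC above 0.99 while average precision, the usual PR-AUC estimate, is only 0.1. Both functions are simplified reference implementations that assume tie-free scores and at least one positive and one negative.

```python
def average_precision(labels, scores):
    """Mean of precision@k taken at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 1,000 transactions, scores strictly decreasing, one fraud case at rank 10.
labels = [0] * 1000
labels[9] = 1
scores = [1000 - i for i in range(1000)]

ap = average_precision(labels, scores)  # 1 hit at rank 10 -> 1/10
auc = roc_auc(labels, scores)           # 990 of 999 negatives rank lower
```

Nine false alarms ahead of the one true fraud barely move ROC-AUC because they are a tiny share of all negatives, but they dominate precision, which is why PR-AUC is the more demanding metric here.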

AWS Infrastructure

S3 + ECR + ECS · OIDC auth · no stored keys · least-privilege IAM · us-east-2

Progress

Stages 1, 2a, 3 done · Stage 2b next · 6 stages total · graduating April 2026