MyNameIsTatiBond committed on
Commit
25a706b
·
1 Parent(s): 1cdb0eb

Upload complete fraud API project

Files changed (8)
  1. DEPLOYMENT.md +179 -0
  2. Dockerfile +22 -0
  3. README.md +233 -12
  4. app.py +201 -0
  5. example_claim.json +13 -0
  6. index.html +417 -0
  7. requirements.txt +8 -0
  8. test_api.sh +56 -0
DEPLOYMENT.md ADDED
@@ -0,0 +1,179 @@
1
+ # Deployment Guide - HuggingFace Spaces
2
+
3
+ ## Prerequisites
4
+
5
+ - HuggingFace account
6
+ - Git installed locally
7
+ - Trained model files
8
+
9
+ ## Step-by-Step Deployment
10
+
11
+ ### 1. Create a New Space
12
+
13
+ 1. Go to https://huggingface.co/new-space
14
+ 2. Choose a name for your space (e.g., `fraud-detection-api`)
15
+ 3. Select **Docker** as the SDK
16
+ 4. Choose visibility (Public or Private)
17
+ 5. Click "Create Space"
18
+
19
+ ### 2. Initialize Git Repository
20
+
21
+ ```bash
22
+ cd fraud_api
23
+ git init -b main  # requires Git 2.28+; names the branch that `git push -u origin main` expects
24
+ git add .
25
+ git commit -m "Initial commit: Fraud Detection API"
26
+ ```
27
+
28
+ ### 3. Add HuggingFace Remote
29
+
30
+ ```bash
31
+ # Replace YOUR_USERNAME and YOUR_SPACE with your details
32
+ git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
33
+ ```
34
+
35
+ ### 4. Push to HuggingFace
36
+
37
+ ```bash
38
+ git push -u origin main
39
+ ```
40
+
41
+ **Note:** You may be prompted for credentials:
42
+ - Username: Your HuggingFace username
43
+ - Password: Use a **HuggingFace Access Token** (not your password)
44
+ - Get token from: https://huggingface.co/settings/tokens
45
+
46
+ ### 5. Monitor Build
47
+
48
+ 1. Go to your Space URL: `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE`
49
+ 2. Click on "Logs" tab to monitor the Docker build
50
+ 3. Build typically takes 3-5 minutes
51
+
52
+ ### 6. Access Your API
53
+
54
+ Once deployed, your API will be available at:
55
+ ```
56
+ https://YOUR_USERNAME-YOUR_SPACE.hf.space
57
+ ```
58
+
59
+ Test it:
60
+ ```bash
61
+ curl https://YOUR_USERNAME-YOUR_SPACE.hf.space/health
62
+ ```
63
+
64
+ ## Troubleshooting
65
+
66
+ ### Build Fails - Missing Models
67
+
68
+ **Problem:** Models not found in `models/` directory
69
+
70
+ **Solution:**
71
+ 1. Ensure model files are committed to git
72
+ 2. Check `.gitignore` doesn't exclude `.joblib` files
73
+ 3. Verify models are in correct location
74
+
75
+ ### Out of Memory Error
76
+
77
+ **Problem:** Docker container runs out of memory
78
+
79
+ **Solution:**
80
+ 1. Reduce model size (use only necessary models)
81
+ 2. Implement lazy loading
82
+ 3. Request more resources from HuggingFace
83
+
84
+ ### Port Issues
85
+
86
+ **Problem:** Application not accessible
87
+
88
+ **Solution:**
89
+ Ensure Dockerfile uses port 7860 (HuggingFace standard):
90
+ ```dockerfile
91
+ EXPOSE 7860
92
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
93
+ ```
94
+
95
+ ## Updating Your Deployment
96
+
97
+ When you make changes:
98
+
99
+ ```bash
100
+ git add .
101
+ git commit -m "Update: description of changes"
102
+ git push origin main
103
+ ```
104
+
105
+ HuggingFace will automatically rebuild and redeploy.
106
+
107
+ ## Advanced Configuration
108
+
109
+ ### Environment Variables
110
+
111
+ Add secrets via HuggingFace Space settings:
112
+ 1. Go to Space Settings → Repository secrets
113
+ 2. Add key-value pairs
114
+ 3. Access in `app.py`:
115
+
116
+ ```python
117
+ import os
118
+ SECRET_KEY = os.getenv("SECRET_KEY")
119
+ ```
120
+
121
+ ### Custom Domain
122
+
123
+ For production, consider:
124
+ 1. Upgrading to HuggingFace Pro
125
+ 2. Setting up custom domain
126
+ 3. Adding CDN/caching layer
127
+
128
+ ## Monitoring
129
+
130
+ ### Check Logs
131
+
132
+ ```bash
133
+ # View real-time logs in HuggingFace UI
134
+ # Or use API:
135
+ curl https://huggingface.co/api/spaces/YOUR_USERNAME/YOUR_SPACE/logs
136
+ ```
137
+
138
+ ### Usage Analytics
139
+
140
+ HuggingFace provides basic analytics:
141
+ - Request count
142
+ - Response times
143
+ - Error rates
144
+
145
+ Access from Space settings dashboard.
146
+
147
+ ## Cost Considerations
148
+
149
+ **Free Tier:**
150
+ - Limited CPU/RAM
151
+ - May sleep after inactivity
152
+ - Suitable for demos/testing
153
+
154
+ **Paid Options:**
155
+ - Persistent compute
156
+ - GPU access
157
+ - Higher resource limits
158
+ - Custom containers
159
+
160
+ ## Security Checklist
161
+
162
+ Before going to production:
163
+
164
+ - [ ] Add authentication
165
+ - [ ] Implement rate limiting
166
+ - [ ] Set up CORS properly
167
+ - [ ] Use HTTPS only
168
+ - [ ] Monitor for abuse
169
+ - [ ] Set resource limits
170
+ - [ ] Add input validation
171
+ - [ ] Implement logging
172
+ - [ ] Regular security updates
173
+ - [ ] Model versioning strategy
174
+
175
+ ## Support
176
+
177
+ - Documentation: https://huggingface.co/docs/hub/spaces-overview
178
+ - Community: https://discuss.huggingface.co
179
+ - Issues: https://github.com/huggingface/hub-docs/issues
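Note on the "Environment Variables" step in DEPLOYMENT.md: `os.getenv("SECRET_KEY")` returns `None` when the secret is not configured, so a missing secret would only surface later. A minimal sketch of a fail-fast lookup follows; `require_env` is a hypothetical helper name and is not part of this commit.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise if it is unset or empty.

    Hypothetical helper for illustration; app.py in this commit does not define it.
    """
    value = os.getenv(name)  # None when the variable is not set
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

Calling `require_env("SECRET_KEY")` at import time in `app.py` would make a misconfigured Space fail during startup, where the error shows up in the build and run logs.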
Dockerfile ADDED
@@ -0,0 +1,22 @@
1
+ FROM python:3.10-slim
2
+
3
+ WORKDIR /code
4
+
5
+ # Install system dependencies
6
+ RUN apt-get update && apt-get install -y \
7
+ gcc \
8
+ g++ \
9
+ && rm -rf /var/lib/apt/lists/*
10
+
11
+ # Copy requirements and install Python dependencies
12
+ COPY requirements.txt .
13
+ RUN pip install --no-cache-dir -r requirements.txt
14
+
15
+ # Copy application code
16
+ COPY . .
17
+
18
+ # Expose port for HuggingFace Spaces
19
+ EXPOSE 7860
20
+
21
+ # Run the application
22
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,12 +1,233 @@
1
- ---
2
- title: Fraud Detector
3
- emoji: 📊
4
- colorFrom: blue
5
- colorTo: red
6
- sdk: docker
7
- pinned: false
8
- license: other
9
- short_description: 'Insurance Fraud Detection '
10
- ---
11
-
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
+ # Fraud Detection API
2
+
3
+ Inference API for insurance fraud detection using pre-trained ML models.
4
+
5
+ ## 🚀 Quick Start
6
+
7
+ ### Local Development
8
+
9
+ 1. **Copy your trained models:**
10
+ ```bash
11
+ cp ../models/best_tree_models_calibrated.joblib models/
12
+ cp ../models/best_tree_models_uncalibrated.joblib models/
13
+ ```
14
+
15
+ 2. **Install dependencies:**
16
+ ```bash
17
+ pip install -r requirements.txt
18
+ ```
19
+
20
+ 3. **Run the server:**
21
+ ```bash
22
+ uvicorn app:app --reload --port 7860
23
+ ```
24
+
25
+ 4. **Open the UI:**
26
+ Visit `http://localhost:7860` in your browser
27
+
28
+ ### Example API Request
29
+
30
+ ```bash
31
+ curl -X POST "http://localhost:7860/predict?model=xgb&scenario=dashboard" \
32
+ -H "Content-Type: application/json" \
33
+ -d '{
34
+ "policy_annual_premium": 1200.0,
35
+ "total_claim_amount": 15000.0,
36
+ "vehicle_age": 5,
37
+ "days_since_bind": 300,
38
+ "months_as_customer": 24,
39
+ "capital-gains": 0,
40
+ "capital-loss": 0,
41
+ "injury_share": 0.4,
42
+ "property_share": 0.6,
43
+ "umbrella_limit": 0,
44
+ "incident_hour_of_the_day": 14
45
+ }'
46
+ ```
47
+
48
+ ### Example Response
49
+
50
+ ```json
51
+ {
52
+ "model": "XGBoost",
53
+ "calibrated": true,
54
+ "probability": 0.73,
55
+ "threshold_flag": null,
56
+ "scenario": "dashboard"
57
+ }
58
+ ```
59
+
60
+ ## 📋 API Reference
61
+
62
+ ### Endpoints
63
+
64
+ #### `POST /predict`
65
+
66
+ Make a fraud prediction for an insurance claim.
67
+
68
+ **Query Parameters:**
69
+ - `model` (string, default `rf`): Model type - `rf` (RandomForest), `et` (ExtraTrees), or `xgb` (XGBoost)
70
+ - `scenario` (string, default `dashboard`): `dashboard` (calibrated) or `auto_flagger` (uncalibrated + threshold)
71
+ - `calibrated` (boolean, default `true`): Accepted, but currently has no effect because `scenario` always selects the calibration
72
+
73
+ **Request Body:**
74
+ ```json
75
+ {
76
+ "policy_annual_premium": float,
77
+ "total_claim_amount": float,
78
+ "vehicle_age": int,
79
+ "days_since_bind": int,
80
+ "months_as_customer": int,
81
+ "capital-gains": float,
82
+ "capital-loss": float,
83
+ "injury_share": float,
84
+ "property_share": float,
85
+ "umbrella_limit": int,
86
+ "incident_hour_of_the_day": int (0-23)
87
+ }
88
+ ```
89
+
90
+ #### `GET /health`
91
+
92
+ Health check endpoint returning model status.
93
+
94
+ ## 🎯 Deployment Scenarios
95
+
96
+ ### Scenario A: Auto-Flagger
97
+ **Use Case:** Automated claim flagging system
98
+
99
+ - Uses **uncalibrated** models for maximum recall
100
+ - Returns decision flag: `AUTO_FLAG` or `AUTO_APPROVE`
101
+ - Threshold: 0.53 (adjust based on your F2 optimization)
102
+
103
+ ```bash
104
+ curl -X POST "http://localhost:7860/predict?model=xgb&scenario=auto_flagger" \
105
+ -H "Content-Type: application/json" \
106
+ -d @example_claim.json
107
+ ```
108
+
109
+ ### Scenario B: Investigator Dashboard
110
+ **Use Case:** Human-in-the-loop prioritization
111
+
112
+ - Uses **calibrated** models for accurate probabilities
113
+ - Returns probability score for ranking claims
114
+ - No hard threshold decision
115
+
116
+ ```bash
117
+ curl -X POST "http://localhost:7860/predict?model=xgb&scenario=dashboard" \
118
+ -H "Content-Type: application/json" \
119
+ -d @example_claim.json
120
+ ```
121
+
122
+ ## 🐳 Docker Deployment
123
+
124
+ ### Build and Run Locally
125
+
126
+ ```bash
127
+ docker build -t fraud-api .
128
+ docker run -p 7860:7860 fraud-api
129
+ ```
130
+
131
+ ### Deploy to HuggingFace Spaces
132
+
133
+ 1. Create a new Space on HuggingFace
134
+ 2. Select **Docker** as SDK
135
+ 3. Push this folder to your Space repository:
136
+
137
+ ```bash
138
+ git init -b main  # requires Git 2.28+; names the branch that `git push -u origin main` expects
139
+ git add .
140
+ git commit -m "Initial commit"
141
+ git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
142
+ git push -u origin main
143
+ ```
144
+
145
+ 4. HuggingFace will automatically build and deploy your Docker container
146
+ 5. Your API will be available at: `https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space`
147
+
148
+ ## 📁 Project Structure
149
+
150
+ ```
151
+ fraud_api/
152
+ ├── app.py              # FastAPI backend
153
+ ├── index.html          # Web UI
154
+ ├── requirements.txt    # Python dependencies
155
+ ├── Dockerfile          # Container configuration
156
+ ├── README.md           # This file
157
+ └── models/             # Model files (add your .joblib files here)
158
+     ├── best_tree_models_calibrated.joblib
159
+     └── best_tree_models_uncalibrated.joblib
160
+ ```
161
+
162
+ ## ⚙️ Configuration
163
+
164
+ ### Adjust Auto-Flag Threshold
165
+
166
+ Edit `THRESHOLD_AUTO_FLAG` in `app.py` (line 26):
167
+ ```python
168
+ THRESHOLD_AUTO_FLAG = 0.53 # Adjust based on your requirements
169
+ ```
170
+
171
+ ### Model Loading
172
+
173
+ Models are loaded on startup from `models/` directory. Expected format:
174
+ ```python
175
+ {
176
+ 'Trees': {
177
+ 'RandomForest': <model_pipeline>,
178
+ 'ExtraTrees': <model_pipeline>,
179
+ 'XGBoost': <model_pipeline>
180
+ }
181
+ }
182
+ ```
183
+
184
+ ## 🛠️ Testing
185
+
186
+ Test the API with sample data:
187
+
188
+ ```bash
189
+ # High-risk claim
190
+ curl -X POST "http://localhost:7860/predict?model=xgb&scenario=auto_flagger" \
191
+ -H "Content-Type: application/json" \
192
+ -d '{
193
+ "policy_annual_premium": 500,
194
+ "total_claim_amount": 50000,
195
+ "vehicle_age": 1,
196
+ "days_since_bind": 10,
197
+ "months_as_customer": 2,
198
+ "capital-gains": 10000,
199
+ "capital-loss": 0,
200
+ "injury_share": 0.8,
201
+ "property_share": 0.2,
202
+ "umbrella_limit": 0,
203
+ "incident_hour_of_the_day": 3
204
+ }'
205
+ ```
206
+
207
+ ## 📊 Model Information
208
+
209
+ This API serves predictions from models trained on insurance claim data with F2-score optimization for fraud detection. The models were calibrated using Platt scaling to ensure probability quality.
210
+
211
+ **Available Models:**
212
+ - **RandomForest**: Ensemble of decision trees
213
+ - **ExtraTrees**: Extra randomized trees
214
+ - **XGBoost**: Gradient boosted decision trees
215
+
216
+ **Calibration:**
217
+ - Uncalibrated: Optimized for maximum recall (catching fraud)
218
+ - Calibrated: Optimized for probability accuracy (ranking)
219
+
220
+ ## 🔒 Security Notes
221
+
222
+ - This is a minimal inference API for demonstration
223
+ - For production deployment, add:
224
+ - Authentication (API keys, OAuth)
225
+ - Rate limiting
226
+ - Input sanitization
227
+ - HTTPS/TLS
228
+ - Monitoring and logging
229
+ - Model versioning
230
+
231
+ ## 📝 License
232
+
233
+ MIT License - See project root for details
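Note on the README's "Example API Request": the `curl` call can be mirrored from Python with only the standard library. The sketch below builds the request object without sending it; `build_predict_request` is a hypothetical helper name, and the query parameters and JSON body follow the README's description of `POST /predict`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_predict_request(base_url: str, claim: dict,
                          model: str = "xgb",
                          scenario: str = "dashboard") -> Request:
    """Build (but do not send) a POST /predict request.

    Hypothetical helper for illustration; it is not part of this commit.
    """
    # model and scenario travel as query parameters, the claim as a JSON body
    query = urlencode({"model": model, "scenario": scenario})
    url = f"{base_url.rstrip('/')}/predict?{query}"
    body = json.dumps(claim).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it; keeping construction separate makes the URL and payload easy to inspect before any network call.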
app.py ADDED
@@ -0,0 +1,201 @@
1
+ """
2
+ Fraud Detection API - FastAPI Backend
3
+ Serves predictions from pre-trained ML models (RandomForest, ExtraTrees, XGBoost)
4
+ Supports both calibrated and uncalibrated versions with two deployment scenarios.
5
+ """
6
+
7
+ from fastapi import FastAPI, HTTPException, Query
8
+ from fastapi.staticfiles import StaticFiles
9
+ from fastapi.responses import FileResponse
10
+ from pydantic import BaseModel, Field
11
+ from typing import Optional, Literal
12
+ import joblib
13
+ import numpy as np
14
+ from pathlib import Path
15
+ import logging
16
+
17
+ # Configure logging
18
+ logging.basicConfig(level=logging.INFO)
19
+ logger = logging.getLogger(__name__)
20
+
21
+ # Initialize FastAPI app
22
+ app = FastAPI(title="Fraud Detection API", version="1.0.0")
23
+
24
+ # Model configuration
25
+ MODELS_DIR = Path("models")
26
+ THRESHOLD_AUTO_FLAG = 0.53 # Placeholder - adjust based on your F2 optimization
27
+
28
+ # Model registry
29
+ MODELS = {}
30
+
31
+ class ClaimInput(BaseModel):
32
+     """Input schema for claim predictions"""
33
+     policy_annual_premium: float = Field(..., description="Annual policy premium")
34
+     total_claim_amount: float = Field(..., description="Total claim amount")
35
+     vehicle_age: int = Field(..., description="Age of vehicle in years")
36
+     days_since_bind: int = Field(..., description="Days since policy binding")
37
+     months_as_customer: int = Field(..., description="Months as customer")
38
+     capital_gains: float = Field(0.0, alias="capital-gains")
39
+     capital_loss: float = Field(0.0, alias="capital-loss")
40
+     injury_share: float = Field(..., description="Share of injury damage")
41
+     property_share: float = Field(..., description="Share of property damage")
42
+     umbrella_limit: int = Field(..., description="Umbrella policy limit")
43
+     incident_hour_of_the_day: int = Field(..., ge=0, le=23)
44
+     hour_sin: Optional[float] = None
45
+     hour_cos: Optional[float] = None
46
+
47
+     class Config:
48
+         populate_by_name = True
49
+
50
+ class PredictionResponse(BaseModel):
51
+     """Response schema for predictions"""
52
+     model: str
53
+     calibrated: bool
54
+     probability: float
55
+     threshold_flag: Optional[str] = None
56
+     scenario: str
57
+
58
+ def load_models():
59
+     """Load all available models on startup"""
60
+     model_types = ["RandomForest", "ExtraTrees", "XGBoost"]
61
+     calibration_types = ["calibrated", "uncalibrated"]
62
+
63
+     for model_type in model_types:
64
+         for cal_type in calibration_types:
65
+             # Expected filename format: best_tree_models_calibrated.joblib or best_tree_models_uncalibrated.joblib
66
+             filename = f"best_tree_models_{cal_type}.joblib"
67
+             filepath = MODELS_DIR / filename
68
+
69
+             if filepath.exists():
70
+                 try:
71
+                     models_dict = joblib.load(filepath)
72
+                     # Models are stored in dict structure: {'Trees': {'RandomForest': model, 'XGBoost': model, ...}}
73
+                     if 'Trees' in models_dict and model_type in models_dict['Trees']:
74
+                         key = f"{model_type}_{cal_type}"
75
+                         MODELS[key] = models_dict['Trees'][model_type]
76
+                         logger.info(f"Loaded model: {key}")
77
+                 except Exception as e:
78
+                     logger.error(f"Error loading {filepath}: {e}")
79
+
80
+     logger.info(f"Total models loaded: {len(MODELS)}")
81
+     if not MODELS:
82
+         logger.warning("No models loaded! Check models directory.")
83
+
84
+ @app.on_event("startup")
85
+ async def startup_event():
86
+     """Load models on application startup"""
87
+     load_models()
88
+
89
+ @app.get("/")
90
+ async def root():
91
+     """Serve the frontend HTML"""
92
+     return FileResponse("index.html")
93
+
94
+ @app.get("/health")
95
+ async def health_check():
96
+     """Health check endpoint"""
97
+     return {
98
+         "status": "healthy",
99
+         "models_loaded": len(MODELS),
100
+         "available_models": list(MODELS.keys())
101
+     }
102
+
103
+ @app.post("/predict", response_model=PredictionResponse)
104
+ async def predict(
105
+     claim_data: ClaimInput,
106
+     model: Literal["rf", "et", "xgb"] = Query("rf", description="Model type: rf=RandomForest, et=ExtraTrees, xgb=XGBoost"),
107
+     calibrated: bool = Query(True, description="Use calibrated model"),
108
+     scenario: Literal["auto_flagger", "dashboard"] = Query("dashboard", description="Prediction scenario")
109
+ ):
110
+     """
111
+     Predict fraud probability for an insurance claim.
112
+
113
+     - **Scenario A (auto_flagger)**: Uses uncalibrated model + threshold for auto-flagging
114
+     - **Scenario B (dashboard)**: Uses calibrated model for ranking/prioritization
115
+     """
116
+
117
+     # Map shorthand to full model names
118
+     model_map = {"rf": "RandomForest", "et": "ExtraTrees", "xgb": "XGBoost"}
119
+     model_name = model_map[model]
120
+
121
+     # Determine calibration type
122
+     cal_type = "calibrated" if calibrated else "uncalibrated"
123
+     model_key = f"{model_name}_{cal_type}"
124
+
125
+     # Override calibration based on scenario
126
+     if scenario == "auto_flagger":
127
+         cal_type = "uncalibrated"
128
+         model_key = f"{model_name}_uncalibrated"
129
+     elif scenario == "dashboard":
130
+         cal_type = "calibrated"
131
+         model_key = f"{model_name}_calibrated"
132
+
133
+     # Get model
134
+     if model_key not in MODELS:
135
+         raise HTTPException(
136
+             status_code=404,
137
+             detail=f"Model {model_key} not found. Available: {list(MODELS.keys())}"
138
+         )
139
+
140
+     loaded_model = MODELS[model_key]
141
+
142
+     # Prepare input data
143
+     # Calculate hour_sin and hour_cos if not provided
144
+     if claim_data.hour_sin is None or claim_data.hour_cos is None:
145
+         hour_rad = (claim_data.incident_hour_of_the_day / 24) * 2 * np.pi
146
+         claim_data.hour_sin = np.sin(hour_rad)
147
+         claim_data.hour_cos = np.cos(hour_rad)
148
+
149
+     # Convert to dict and create feature array
150
+     # Note: The model expects the preprocessor to handle feature engineering
151
+     # We'll pass raw features as a dict
152
+     features_dict = claim_data.dict(by_alias=True)
153
+
154
+     # For deployment, you would typically have a preprocessor that was saved with the model
155
+     # Here we assume the model is already wrapped in a pipeline that handles preprocessing
156
+     try:
157
+         # Create input array - order must match training
158
+         # The pipeline should handle the transformation
159
+         input_data = {
160
+             'policy_annual_premium': features_dict['policy_annual_premium'],
161
+             'total_claim_amount': features_dict['total_claim_amount'],
162
+             'vehicle_age': features_dict['vehicle_age'],
163
+             'days_since_bind': features_dict['days_since_bind'],
164
+             'months_as_customer': features_dict['months_as_customer'],
165
+             'capital-gains': features_dict['capital-gains'],
166
+             'capital-loss': features_dict['capital-loss'],
167
+             'injury_share': features_dict['injury_share'],
168
+             'property_share': features_dict['property_share'],
169
+             'umbrella_limit': features_dict['umbrella_limit'],
170
+             'incident_hour_of_the_day': features_dict['incident_hour_of_the_day'],
171
+             'hour_sin': features_dict['hour_sin'],
172
+             'hour_cos': features_dict['hour_cos']
173
+         }
174
+
175
+         # If model is a pipeline, it expects a DataFrame
176
+         import pandas as pd
177
+         input_df = pd.DataFrame([input_data])
178
+
179
+         # Get prediction probability
180
+         proba = loaded_model.predict_proba(input_df)[0, 1]  # Probability of fraud (class 1)
181
+
182
+     except Exception as e:
183
+         logger.error(f"Prediction error: {e}")
184
+         raise HTTPException(status_code=500, detail=f"Prediction failed: {str(e)}")
185
+
186
+     # Determine threshold flag for auto_flagger scenario
187
+     threshold_flag = None
188
+     if scenario == "auto_flagger":
189
+         threshold_flag = "AUTO_FLAG" if proba >= THRESHOLD_AUTO_FLAG else "AUTO_APPROVE"
190
+
191
+     return PredictionResponse(
192
+         model=model_name,
193
+         calibrated=(cal_type == "calibrated"),
194
+         probability=float(proba),
195
+         threshold_flag=threshold_flag,
196
+         scenario=scenario
197
+     )
198
+
199
+ if __name__ == "__main__":
200
+     import uvicorn
201
+     uvicorn.run(app, host="0.0.0.0", port=7860)
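Note on `predict` in app.py: three small pieces of logic decide the outcome of a request: the cyclical encoding of the incident hour, the scenario-to-model-key mapping, and the auto-flag threshold. The sketch below restates them as pure functions so they can be checked without loading any model. The function names are hypothetical; the constants and formulas are copied from app.py, with `math` standing in for NumPy.

```python
import math
from typing import Optional

THRESHOLD_AUTO_FLAG = 0.53  # same placeholder value as in app.py
MODEL_NAMES = {"rf": "RandomForest", "et": "ExtraTrees", "xgb": "XGBoost"}


def encode_hour(hour: int) -> tuple[float, float]:
    """Map an hour in 0..23 onto the unit circle, as app.py does for hour_sin/hour_cos."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    angle = (hour / 24) * 2 * math.pi
    return math.sin(angle), math.cos(angle)


def resolve_model_key(model: str, scenario: str) -> str:
    """Return the MODELS registry key; the scenario alone decides the calibration."""
    cal_type = "uncalibrated" if scenario == "auto_flagger" else "calibrated"
    return f"{MODEL_NAMES[model]}_{cal_type}"


def threshold_flag(probability: float, scenario: str) -> Optional[str]:
    """Return AUTO_FLAG / AUTO_APPROVE for auto_flagger, and None for dashboard."""
    if scenario != "auto_flagger":
        return None
    return "AUTO_FLAG" if probability >= THRESHOLD_AUTO_FLAG else "AUTO_APPROVE"
```

The circular encoding places hour 23 as close to hour 0 as hour 1 is, which a raw integer feature would not do.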
example_claim.json ADDED
@@ -0,0 +1,13 @@
1
+ {
2
+ "policy_annual_premium": 1200.0,
3
+ "total_claim_amount": 15000.0,
4
+ "vehicle_age": 5,
5
+ "days_since_bind": 300,
6
+ "months_as_customer": 24,
7
+ "capital-gains": 0,
8
+ "capital-loss": 0,
9
+ "injury_share": 0.4,
10
+ "property_share": 0.6,
11
+ "umbrella_limit": 0,
12
+ "incident_hour_of_the_day": 14
13
+ }
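Note on example_claim.json: the file carries every field that `ClaimInput` marks as required, plus the two optional `capital-*` fields. A minimal sketch of a client-side pre-check is shown below, assuming the required set and the 0-23 hour range defined in app.py; `check_claim` and `REQUIRED_FIELDS` are hypothetical names, and the server's Pydantic validation remains the authority.

```python
# Fields declared with Field(...) in ClaimInput; capital-gains and capital-loss default to 0.0.
REQUIRED_FIELDS = {
    "policy_annual_premium",
    "total_claim_amount",
    "vehicle_age",
    "days_since_bind",
    "months_as_customer",
    "injury_share",
    "property_share",
    "umbrella_limit",
    "incident_hour_of_the_day",
}


def check_claim(claim: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the payload looks complete."""
    problems = [f"missing field: {name}" for name in sorted(REQUIRED_FIELDS - set(claim))]
    hour = claim.get("incident_hour_of_the_day")
    if isinstance(hour, int) and not 0 <= hour <= 23:
        problems.append("incident_hour_of_the_day must be between 0 and 23")
    return problems
```

Running this before a request catches a missing key locally; without it, the API would reject such a payload with a validation error.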
index.html ADDED
@@ -0,0 +1,417 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+
4
+ <head>
5
+ <meta charset="UTF-8">
6
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
7
+ <title>Fraud Detection API - Client</title>
8
+ <style>
9
+ * {
10
+ margin: 0;
11
+ padding: 0;
12
+ box-sizing: border-box;
13
+ }
14
+
15
+ body {
16
+ font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
17
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
18
+ min-height: 100vh;
19
+ padding: 20px;
20
+ display: flex;
21
+ justify-content: center;
22
+ align-items: center;
23
+ }
24
+
25
+ .container {
26
+ background: white;
27
+ border-radius: 16px;
28
+ box-shadow: 0 20px 60px rgba(0, 0, 0, 0.3);
29
+ max-width: 900px;
30
+ width: 100%;
31
+ padding: 40px;
32
+ }
33
+
34
+ h1 {
35
+ color: #333;
36
+ margin-bottom: 10px;
37
+ font-size: 28px;
38
+ }
39
+
40
+ .subtitle {
41
+ color: #666;
42
+ margin-bottom: 30px;
43
+ font-size: 14px;
44
+ }
45
+
46
+ .config-section {
47
+ background: #f8f9fa;
48
+ padding: 20px;
49
+ border-radius: 8px;
50
+ margin-bottom: 30px;
51
+ }
52
+
53
+ .config-title {
54
+ font-weight: 600;
55
+ color: #495057;
56
+ margin-bottom: 15px;
57
+ font-size: 16px;
58
+ }
59
+
60
+ .config-grid {
61
+ display: grid;
62
+ grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
63
+ gap: 15px;
64
+ }
65
+
66
+ .form-group {
67
+ margin-bottom: 20px;
68
+ }
69
+
70
+ label {
71
+ display: block;
72
+ font-weight: 500;
73
+ margin-bottom: 5px;
74
+ color: #495057;
75
+ font-size: 14px;
76
+ }
77
+
78
+ input,
79
+ select {
80
+ width: 100%;
81
+ padding: 12px;
82
+ border: 2px solid #e9ecef;
83
+ border-radius: 6px;
84
+ font-size: 14px;
85
+ transition: border-color 0.3s;
86
+ }
87
+
88
+ input:focus,
89
+ select:focus {
90
+ outline: none;
91
+ border-color: #667eea;
92
+ }
93
+
94
+ .input-grid {
95
+ display: grid;
96
+ grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
97
+ gap: 15px;
98
+ }
99
+
100
+ .predict-btn {
101
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
102
+ color: white;
103
+ border: none;
104
+ padding: 15px 40px;
105
+ border-radius: 8px;
106
+ font-size: 16px;
107
+ font-weight: 600;
108
+ cursor: pointer;
109
+ width: 100%;
110
+ margin-top: 20px;
111
+ transition: transform 0.2s, box-shadow 0.2s;
112
+ }
113
+
114
+ .predict-btn:hover {
115
+ transform: translateY(-2px);
116
+ box-shadow: 0 10px 25px rgba(102, 126, 234, 0.4);
117
+ }
118
+
119
+ .predict-btn:disabled {
120
+ opacity: 0.6;
121
+ cursor: not-allowed;
122
+ transform: none;
123
+ }
124
+
125
+ .result-section {
126
+ margin-top: 30px;
127
+ padding: 25px;
128
+ border-radius: 8px;
129
+ display: none;
130
+ }
131
+
132
+ .result-section.show {
133
+ display: block;
134
+ }
135
+
136
+ .result-section.fraud {
137
+ background: #fff5f5;
138
+ border: 2px solid #fc8181;
139
+ }
140
+
141
+ .result-section.legit {
142
+ background: #f0fff4;
143
+ border: 2px solid #68d391;
144
+ }
145
+
146
+ .result-title {
147
+ font-size: 20px;
148
+ font-weight: 600;
149
+ margin-bottom: 15px;
150
+ }
151
+
152
+ .result-grid {
153
+ display: grid;
154
+ grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
155
+ gap: 15px;
156
+ }
157
+
158
+ .result-item {
159
+ padding: 12px;
160
+ background: white;
161
+ border-radius: 6px;
162
+ }
163
+
164
+ .result-label {
165
+ font-size: 12px;
166
+ color: #718096;
167
+ text-transform: uppercase;
168
+ letter-spacing: 0.5px;
169
+ margin-bottom: 5px;
170
+ }
171
+
172
+ .result-value {
173
+ font-size: 18px;
174
+ font-weight: 600;
175
+ color: #2d3748;
176
+ }
177
+
178
+ .error-message {
179
+ background: #fff5f5;
180
+ border: 2px solid #fc8181;
181
+ color: #c53030;
182
+ padding: 15px;
183
+ border-radius: 8px;
184
+ margin-top: 20px;
185
+ display: none;
186
+ }
187
+
188
+ .error-message.show {
189
+ display: block;
190
+ }
191
+ </style>
192
+ </head>
193
+
194
+ <body>
195
+ <div class="container">
196
+ <h1>🛡️ Insurance Fraud Detection</h1>
197
+ <p class="subtitle">AI-powered fraud probability assessment for insurance claims</p>
198
+
199
+ <!-- Model Configuration -->
200
+ <div class="config-section">
201
+ <div class="config-title">Model Configuration</div>
202
+ <div class="config-grid">
203
+ <div class="form-group">
204
+ <label for="model">Model Type</label>
205
+ <select id="model">
206
+ <option value="rf">Random Forest</option>
207
+ <option value="et">Extra Trees</option>
208
+ <option value="xgb" selected>XGBoost</option>
209
+ </select>
210
+ </div>
211
+
212
+ <div class="form-group">
213
+ <label for="scenario">Deployment Scenario</label>
214
+ <select id="scenario">
215
+ <option value="dashboard" selected>Dashboard (Calibrated)</option>
216
+ <option value="auto_flagger">Auto-Flagger (Uncalibrated)</option>
217
+ </select>
218
+ </div>
219
+ </div>
220
+ </div>
221
+
222
+ <!-- Claim Input Form -->
223
+ <form id="claimForm">
224
+ <div class="input-grid">
225
+ <div class="form-group">
226
+ <label for="policy_annual_premium">Policy Annual Premium ($)</label>
227
+ <input type="number" id="policy_annual_premium" step="0.01" value="1200" required>
228
+ </div>
229
+
230
+ <div class="form-group">
231
+ <label for="total_claim_amount">Total Claim Amount ($)</label>
232
+ <input type="number" id="total_claim_amount" step="0.01" value="15000" required>
233
+ </div>
234
+
235
+ <div class="form-group">
236
+ <label for="vehicle_age">Vehicle Age (years)</label>
237
+ <input type="number" id="vehicle_age" min="0" value="5" required>
238
+ </div>
239
+
240
+ <div class="form-group">
241
+ <label for="days_since_bind">Days Since Policy Bind</label>
242
+ <input type="number" id="days_since_bind" min="0" value="300" required>
243
+ </div>
244
+
245
+ <div class="form-group">
246
+ <label for="months_as_customer">Months as Customer</label>
247
+ <input type="number" id="months_as_customer" min="0" value="24" required>
248
+ </div>
249
+
250
+ <div class="form-group">
251
+ <label for="injury_share">Injury Damage Share</label>
252
+ <input type="number" id="injury_share" step="0.01" min="0" max="1" value="0.4" required>
253
+ </div>
254
+
255
+ <div class="form-group">
256
+ <label for="property_share">Property Damage Share</label>
257
+ <input type="number" id="property_share" step="0.01" min="0" max="1" value="0.6" required>
258
+ </div>
259
+
260
+ <div class="form-group">
261
+ <label for="umbrella_limit">Umbrella Policy Limit</label>
262
+ <input type="number" id="umbrella_limit" min="0" value="0" required>
263
+ </div>
264
+
265
+ <div class="form-group">
266
+ <label for="incident_hour_of_the_day">Incident Hour (0-23)</label>
267
+ <input type="number" id="incident_hour_of_the_day" min="0" max="23" value="14" required>
268
+ </div>
269
+
270
+ <div class="form-group">
271
+ <label for="capital_gains">Capital Gains ($)</label>
272
+ <input type="number" id="capital_gains" step="0.01" value="0">
273
+ </div>
274
+
275
+ <div class="form-group">
276
+ <label for="capital_loss">Capital Loss ($)</label>
277
+ <input type="number" id="capital_loss" step="0.01" value="0">
278
+ </div>
279
+ </div>
280
+
281
+ <button type="submit" class="predict-btn" id="predictBtn">
282
+ 🔍 Analyze Claim
283
+ </button>
284
+ </form>
285
+
286
+ <!-- Result Display -->
287
+ <div id="resultSection" class="result-section">
288
+ <div class="result-title" id="resultTitle">Analysis Result</div>
289
+ <div class="result-grid">
290
+ <div class="result-item">
291
+ <div class="result-label">Model Used</div>
292
+ <div class="result-value" id="resultModel">-</div>
293
+ </div>
294
+ <div class="result-item">
295
+ <div class="result-label">Fraud Probability</div>
296
+ <div class="result-value" id="resultProbability">-</div>
297
+ </div>
298
+ <div class="result-item">
299
+ <div class="result-label">Decision</div>
300
+ <div class="result-value" id="resultDecision">-</div>
301
+ </div>
302
+ <div class="result-item">
303
+ <div class="result-label">Scenario</div>
304
+ <div class="result-value" id="resultScenario">-</div>
305
+ </div>
306
+ </div>
307
+ </div>
308
+
309
+ <!-- Error Display -->
310
+ <div id="errorMessage" class="error-message"></div>
311
+ </div>
312
+
313
+ <script>
314
+ const form = document.getElementById('claimForm');
315
+ const predictBtn = document.getElementById('predictBtn');
316
+ const resultSection = document.getElementById('resultSection');
317
+ const errorMessage = document.getElementById('errorMessage');
318
+
319
+ form.addEventListener('submit', async (e) => {
320
+ e.preventDefault();
321
+
322
+ // Hide previous results/errors
323
+ resultSection.classList.remove('show', 'fraud', 'legit');
324
+ errorMessage.classList.remove('show');
325
+
326
+ // Disable button
327
+ predictBtn.disabled = true;
328
+ predictBtn.textContent = '⏳ Analyzing...';
329
+
330
+ try {
331
+ // Gather form data
332
+ const formData = {
333
+ policy_annual_premium: parseFloat(document.getElementById('policy_annual_premium').value),
334
+ total_claim_amount: parseFloat(document.getElementById('total_claim_amount').value),
335
+ vehicle_age: parseInt(document.getElementById('vehicle_age').value),
336
+ days_since_bind: parseInt(document.getElementById('days_since_bind').value),
337
+ months_as_customer: parseInt(document.getElementById('months_as_customer').value),
338
+ 'capital-gains': parseFloat(document.getElementById('capital_gains').value || 0),
339
+ 'capital-loss': parseFloat(document.getElementById('capital_loss').value || 0),
340
+ injury_share: parseFloat(document.getElementById('injury_share').value),
341
+ property_share: parseFloat(document.getElementById('property_share').value),
342
+ umbrella_limit: parseInt(document.getElementById('umbrella_limit').value),
343
+ incident_hour_of_the_day: parseInt(document.getElementById('incident_hour_of_the_day').value)
344
+ };
345
+
346
+ // Get model configuration
347
+ const model = document.getElementById('model').value;
348
+ const scenario = document.getElementById('scenario').value;
349
+
350
+ // Make API request
351
+ const response = await fetch(`/predict?model=${model}&scenario=${scenario}`, {
352
+ method: 'POST',
353
+ headers: {
354
+ 'Content-Type': 'application/json',
355
+ },
356
+ body: JSON.stringify(formData)
357
+ });
358
+
359
+ if (!response.ok) {
360
+ const errorData = await response.json();
361
+ throw new Error(errorData.detail || 'Prediction failed');
362
+ }
363
+
364
+ const result = await response.json();
365
+
366
+ // Display results
367
+ displayResults(result);
368
+
369
+ } catch (error) {
370
+ console.error('Error:', error);
371
+ errorMessage.textContent = `Error: ${error.message}`;
372
+ errorMessage.classList.add('show');
373
+ } finally {
374
+ // Re-enable button
375
+ predictBtn.disabled = false;
376
+ predictBtn.textContent = 'πŸ” Analyze Claim';
377
+ }
378
+ });
+
+         function displayResults(result) {
+             // Update result values
+             document.getElementById('resultModel').textContent =
+                 `${result.model} ${result.calibrated ? '(Calibrated)' : '(Uncalibrated)'}`;
+
+             const probability = (result.probability * 100).toFixed(1);
+             document.getElementById('resultProbability').textContent = `${probability}%`;
+
+             // Determine decision text
+             let decision = '-';
+             if (result.threshold_flag) {
+                 decision = result.threshold_flag === 'AUTO_FLAG' ?
+                     '🚨 FLAG FOR REVIEW' : '✅ AUTO APPROVE';
+             } else {
+                 // For dashboard mode
+                 if (result.probability >= 0.7) decision = '🔴 High Risk';
+                 else if (result.probability >= 0.5) decision = '🟡 Medium Risk';
+                 else decision = '🟢 Low Risk';
+             }
+             document.getElementById('resultDecision').textContent = decision;
+
+             document.getElementById('resultScenario').textContent =
+                 result.scenario === 'auto_flagger' ? 'Auto-Flagger' : 'Dashboard';
+
+             // Style result section
+             resultSection.classList.add('show');
+             if (result.probability >= 0.5) {
+                 resultSection.classList.add('fraud');
+                 document.getElementById('resultTitle').textContent = '⚠️ High Fraud Risk Detected';
+             } else {
+                 resultSection.classList.add('legit');
+                 document.getElementById('resultTitle').textContent = '✓ Low Fraud Risk';
+             }
+         }
+     </script>
+ </body>
+
+ </html>
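Review note on `index.html`: the decision text in `displayResults` depends only on `threshold_flag` and `probability`, so that branching could be pulled into a pure function and checked without a browser. The sketch below is one way to do it; `decisionLabel` is a hypothetical name that does not appear in the commit, the emoji prefixes are left out for brevity, and `'SOME_OTHER_FLAG'` is a placeholder for any flag value other than `'AUTO_FLAG'`.

```javascript
// Hypothetical helper (not part of the commit): same branching as
// displayResults(), minus the DOM writes and emoji prefixes.
function decisionLabel(result) {
    if (result.threshold_flag) {
        // Auto-flagger mode: any flag other than 'AUTO_FLAG' is treated as approval,
        // mirroring the ternary in displayResults().
        return result.threshold_flag === 'AUTO_FLAG' ? 'FLAG FOR REVIEW' : 'AUTO APPROVE';
    }
    // Dashboard mode: bucket the probability at the 0.7 and 0.5 cut-offs.
    if (result.probability >= 0.7) return 'High Risk';
    if (result.probability >= 0.5) return 'Medium Risk';
    return 'Low Risk';
}

// Usage with made-up inputs
console.log(decisionLabel({ probability: 0.62 }));
console.log(decisionLabel({ probability: 0.91, threshold_flag: 'AUTO_FLAG' }));
console.log(decisionLabel({ probability: 0.10, threshold_flag: 'SOME_OTHER_FLAG' }));
```

With a helper like this, `displayResults` would only need to write the returned label into `resultDecision`, and the thresholds could be covered by plain unit tests.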
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ fastapi==0.104.1
+ uvicorn[standard]==0.24.0
+ pydantic==2.5.0
+ joblib==1.3.2
+ numpy==1.24.3
+ pandas==2.0.3
+ scikit-learn==1.3.2
+ xgboost==2.0.3
test_api.sh ADDED
@@ -0,0 +1,56 @@
+ #!/bin/bash
+
+ # Fraud Detection API - Example curl Commands
+
+ BASE_URL="http://localhost:7860"
+
+ echo "========================================="
+ echo "Fraud Detection API - Example Requests"
+ echo "========================================="
+ echo ""
+
+ # Test 1: Dashboard Scenario (Calibrated) with XGBoost
+ echo "1. Dashboard Scenario (Calibrated XGBoost):"
+ echo "-----------------------------------------"
+ curl -X POST "${BASE_URL}/predict?model=xgb&scenario=dashboard" \
+   -H "Content-Type: application/json" \
+   -d @example_claim.json
+ echo -e "\n\n"
+
+ # Test 2: Auto-Flagger Scenario (Uncalibrated) with RandomForest
+ echo "2. Auto-Flagger Scenario (Uncalibrated RandomForest):"
+ echo "-----------------------------------------------------"
+ curl -X POST "${BASE_URL}/predict?model=rf&scenario=auto_flagger" \
+   -H "Content-Type: application/json" \
+   -d @example_claim.json
+ echo -e "\n\n"
+
+ # Test 3: High-Risk Claim Example
+ echo "3. High-Risk Claim (Auto-Flagger with ExtraTrees):"
+ echo "---------------------------------------------------"
+ curl -X POST "${BASE_URL}/predict?model=et&scenario=auto_flagger" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "policy_annual_premium": 500,
+     "total_claim_amount": 50000,
+     "vehicle_age": 1,
+     "days_since_bind": 10,
+     "months_as_customer": 2,
+     "capital-gains": 10000,
+     "capital-loss": 0,
+     "injury_share": 0.8,
+     "property_share": 0.2,
+     "umbrella_limit": 0,
+     "incident_hour_of_the_day": 3
+   }'
+ echo -e "\n\n"
+
+ # Test 4: Health Check
+ echo "4. Health Check:"
+ echo "----------------"
+ curl -X GET "${BASE_URL}/health"
+ echo -e "\n\n"
+
+ echo "========================================="
+ echo "All tests completed!"
+ echo "========================================="