# Deepfake Detection Models: Multi-View Anomaly Ensemble
Five complementary detectors trained on FaceForensics++ C23 for binary deepfake classification, framed as multi-view anomaly detection across three classes:
- Spatial anomalies: ResNet-18, EfficientNet-B4
- Global-consistency anomalies: ViT-B/16
- Temporal-motion anomalies: R3D-18, R3D-18+RAFT (optical-flow interpolated)
A soft-vote ensemble of all five tightens the FF++ → Celeb-DF generalization gap from 0.1216 (best single model) to 0.0944, a 22% relative reduction.
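Soft voting here means averaging the per-class probabilities emitted by each detector before taking the argmax. A minimal sketch, assuming each detector already produces video-level softmax probabilities; the helper name, tensor shapes, and class ordering below are illustrative, not part of the released code:

```python
import torch

def soft_vote(prob_list):
    """Soft-vote ensemble: average per-model class probabilities, then argmax.

    prob_list: list of [num_videos, 2] tensors of softmax probabilities,
    one per detector (real = index 0, fake = index 1 is an assumption).
    """
    stacked = torch.stack(prob_list)      # [num_models, num_videos, 2]
    mean_probs = stacked.mean(dim=0)      # [num_videos, 2]
    return mean_probs.argmax(dim=1), mean_probs
```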
## Performance
### In-Dataset (FaceForensics++ C23 test, 900 videos)
| Model | Accuracy | F1 | AUC |
|---|---|---|---|
| ResNet-18 | 0.9989 | 0.9993 | 0.9999 |
| EfficientNet-B4 | 0.9944 | 0.9967 | 1.0000 |
| R3D-18 | 0.9756 | 0.9852 | 0.9991 |
| ViT-B/16 | 0.9700 | 0.9817 | 0.9992 |
| R3D-18+RAFT | 0.9833 | 0.9899 | 0.9993 |
| Ensemble (soft-vote) | 0.9989 | 0.9993 | 1.0000 |
### Cross-Dataset (Celeb-DF v2, zero-shot, 6,528 videos)
| Model | AUC | Generalization Gap (FF++ AUC − Celeb-DF AUC) |
|---|---|---|
| EfficientNet-B4 | 0.8173 | 0.1827 |
| ResNet-18 | 0.8209 | 0.1790 |
| R3D-18 | 0.8413 | 0.1577 |
| R3D-18+RAFT | 0.8744 | 0.1249 |
| ViT-B/16 | 0.8777 | 0.1216 |
| Ensemble (soft-vote) | 0.9056 | 0.0944 |
## Files
| File | Description |
|---|---|
| `resnet18_best.pth` | ResNet-18 baseline (single-stage AdamW + ReduceLROnPlateau) |
| `efficientnet_b4_best.pth` | EfficientNet-B4 (two-stage: head warmup → full fine-tune) |
| `r3d18_best.pth` | R3D-18 3D-conv classifier (two-stage) |
| `vit_base_patch16_224_best.pth` | ViT-B/16 via timm (two-stage) |
| `r3d18_raft_best.pth` | R3D-18 on RAFT optical-flow interpolated frames |
| `metadata.json` | Per-model run-id + metrics in machine-readable form |
## Usage
```python
from huggingface_hub import hf_hub_download
import torch

# Download a checkpoint from the Hub and load its state dict on CPU
path = hf_hub_download(
    repo_id="abraraltaf92/deepfake-detection-models",
    filename="resnet18_best.pth",
)
state_dict = torch.load(path, map_location="cpu")
```
The model classes that produced these checkpoints live in the companion code repo: github.com/abraraltaf92/deepfake-detection (see `src/models.py` and `src/training.py`).
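If you want a runnable model without cloning the companion repo, a sketch along these lines usually works, assuming the checkpoint was trained from the stock torchvision ResNet-18 with its final layer swapped for a 2-class head; the authoritative architecture is defined in `src/models.py`, so treat the head shape here as an assumption:

```python
import torch
from torchvision.models import resnet18

# Assumption: torchvision ResNet-18 backbone with a 2-class (real/fake) head;
# check src/models.py in the companion repo for the exact definition.
model = resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 2)

# strict=False surfaces any key mismatches instead of raising immediately
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)

model.eval()
```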
## Training Recipe
- Preprocessing: 16 frames per video, MTCNN face crop @ 224×224 (R3D models @ 112×112)
- Class-weighted cross-entropy: w_c = N_train / (2·N_c) (see the sketch after this list)
- Identity-component split: 4200/900/900 (70/15/15)
- Two-stage training (advanced models): 3 epochs head-only warmup + 10 epochs full fine-tune
- Single-stage A2 recipe (ResNet-18 baseline)
- Mixed-precision (fp16) training on a Colab Pro CUDA GPU
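The class weights compensate for the real/fake imbalance in the FF++ training split, where fake clips heavily outnumber real ones. A minimal sketch of the weighting rule, using placeholder counts rather than the actual split sizes:

```python
import torch
import torch.nn as nn

# Placeholder counts for illustration only; the real values come from the
# 4200-video FF++ training split (real clips are the minority class).
n_real, n_fake = 700, 3500
n_train = n_real + n_fake

# w_c = N_train / (2 * N_c): the rarer class receives the larger weight,
# so both classes contribute equally to the expected loss.
weights = torch.tensor(
    [n_train / (2 * n_real), n_train / (2 * n_fake)],
    dtype=torch.float32,
)
criterion = nn.CrossEntropyLoss(weight=weights)
```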
## Project Context
CS 668 Analytics Capstone, Pace University.