# Deepfake Detection Models: Multi-View Anomaly Ensemble
Five complementary detectors trained on FaceForensics++ C23 for binary deepfake classification, framed as multi-view anomaly detection across three classes:
- Spatial anomalies: ResNet-18, EfficientNet-B4
- Global-consistency anomalies: ViT-B/16
- Temporal-motion anomalies: R3D-18, R3D-18+RAFT (optical-flow interpolated)
A soft-vote ensemble of all five tightens the FF++ → Celeb-DF generalization gap from 0.1216 (best single model) to 0.0944, a 22% relative reduction.
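Soft voting here means averaging the per-class probabilities emitted by each detector before taking the argmax. A minimal sketch, assuming each detector already produces video-level softmax probabilities; the helper name, tensor shapes, and class ordering below are illustrative, not part of the released code:

```python
import torch

def soft_vote(prob_list):
    """Soft-vote ensemble: average per-model class probabilities, then argmax.

    prob_list: list of [num_videos, 2] tensors of softmax probabilities,
    one per detector (real = index 0, fake = index 1 is an assumption).
    """
    stacked = torch.stack(prob_list)      # [num_models, num_videos, 2]
    mean_probs = stacked.mean(dim=0)      # [num_videos, 2]
    return mean_probs.argmax(dim=1), mean_probs
```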
## Performance
### In-Dataset (FaceForensics++ C23 test, 900 videos)
| Model | Accuracy | F1 | AUC |
|---|---|---|---|
| ResNet-18 | 0.9989 | 0.9993 | 0.9999 |
| EfficientNet-B4 | 0.9944 | 0.9967 | 1.0000 |
| R3D-18 | 0.9756 | 0.9852 | 0.9991 |
| ViT-B/16 | 0.9700 | 0.9817 | 0.9992 |
| R3D-18+RAFT | 0.9833 | 0.9899 | 0.9993 |
| Ensemble (soft-vote) | 0.9989 | 0.9993 | 1.0000 |
### Cross-Dataset (Celeb-DF v2, zero-shot, 6,528 videos)
| Model | AUC | Generalization Gap (FF++ AUC − Celeb-DF AUC) |
|---|---|---|
| EfficientNet-B4 | 0.8173 | 0.1827 |
| ResNet-18 | 0.8209 | 0.1790 |
| R3D-18 | 0.8413 | 0.1577 |
| R3D-18+RAFT | 0.8744 | 0.1249 |
| ViT-B/16 | 0.8777 | 0.1216 |
| Ensemble (soft-vote) | 0.9056 | 0.0944 |
## Files
| File | Description |
|---|---|
| `resnet18_best.pth` | ResNet-18 baseline (single-stage AdamW + ReduceLROnPlateau) |
| `efficientnet_b4_best.pth` | EfficientNet-B4 (two-stage: head warmup → full fine-tune) |
| `r3d18_best.pth` | R3D-18 3D-conv classifier (two-stage) |
| `vit_base_patch16_224_best.pth` | ViT-B/16 via timm (two-stage) |
| `r3d18_raft_best.pth` | R3D-18 on RAFT optical-flow interpolated frames |
| `metadata.json` | Per-model run-id + metrics in machine-readable form |
## Usage
```python
from huggingface_hub import hf_hub_download
import torch

# Download a checkpoint from the Hub and load its state dict on CPU
path = hf_hub_download(
    repo_id="abraraltaf92/deepfake-detection-models",
    filename="resnet18_best.pth",
)
state_dict = torch.load(path, map_location="cpu")
```
The model classes that produced these checkpoints live in the companion code repo: github.com/abraraltaf92/deepfake-detection (see `src/models.py` and `src/training.py`).
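If you want a runnable model without cloning the companion repo, a sketch along these lines usually works, assuming the checkpoint was trained from the stock torchvision ResNet-18 with its final layer swapped for a 2-class head; the authoritative architecture is defined in `src/models.py`, so treat the head shape here as an assumption:

```python
import torch
from torchvision.models import resnet18

# Assumption: torchvision ResNet-18 backbone with a 2-class (real/fake) head;
# check src/models.py in the companion repo for the exact definition.
model = resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 2)

# strict=False surfaces any key mismatches instead of raising immediately
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)

model.eval()
```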
## Training Recipe
- Preprocessing: 16 frames per video, MTCNN face crop @ 224×224 (R3D models @ 112×112)
- Class-weighted cross-entropy: w_c = N_train / (2·N_c) (see the sketch after this list)
- Identity-component split: 4200/900/900 (70/15/15)
- Two-stage training (advanced models): 3 epochs head-only warmup + 10 epochs full fine-tune
- Single-stage A2 recipe (ResNet-18 baseline)
- Mixed-precision (fp16) training on a Colab Pro CUDA GPU
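The class weights compensate for the real/fake imbalance in the FF++ training split, where fake clips heavily outnumber real ones. A minimal sketch of the weighting rule, using placeholder counts rather than the actual split sizes:

```python
import torch
import torch.nn as nn

# Placeholder counts for illustration only; the real values come from the
# 4200-video FF++ training split (real clips are the minority class).
n_real, n_fake = 700, 3500
n_train = n_real + n_fake

# w_c = N_train / (2 * N_c): the rarer class receives the larger weight,
# so both classes contribute equally to the expected loss.
weights = torch.tensor(
    [n_train / (2 * n_real), n_train / (2 * n_fake)],
    dtype=torch.float32,
)
criterion = nn.CrossEntropyLoss(weight=weights)
```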
## Project Context
CS 668 Analytics Capstone, Pace University.