Deepfake Detection Models: Multi-View Anomaly Ensemble

Five complementary detectors trained on FaceForensics++ C23 for binary deepfake classification, framed as multi-view anomaly detection across three views:

  • Spatial anomalies: ResNet-18, EfficientNet-B4
  • Global-consistency anomalies: ViT-B/16
  • Temporal-motion anomalies: R3D-18, R3D-18+RAFT (optical-flow interpolated)

A soft-vote ensemble of all five tightens the FF++ → Celeb-DF generalization gap from 0.1216 (best single model, ViT-B/16) to 0.0944, a 22% reduction.
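Soft voting means averaging each model's class probabilities and deciding on the mean, rather than counting hard per-model votes. A minimal sketch in plain Python (the probability values are illustrative, not actual checkpoint outputs):

```python
def soft_vote(prob_lists):
    """prob_lists: one [p_real, p_fake] pair per model.
    Returns the mean probabilities and the ensemble label (1 = fake)."""
    n = len(prob_lists)
    mean = [sum(p[i] for p in prob_lists) / n for i in range(2)]
    return mean, int(mean[1] > mean[0])

# Five per-model probabilities for one video (made-up numbers)
probs = [
    [0.10, 0.90],  # ResNet-18
    [0.20, 0.80],  # EfficientNet-B4
    [0.40, 0.60],  # ViT-B/16
    [0.30, 0.70],  # R3D-18
    [0.25, 0.75],  # R3D-18+RAFT
]
mean_probs, label = soft_vote(probs)  # mean fake prob 0.75 -> label 1
```

Averaging probabilities (rather than majority-voting labels) lets a confident model outvote two lukewarm ones, which is what lets the ensemble beat its best member on Celeb-DF.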

Performance

In-Dataset (FaceForensics++ C23 test, 900 videos)

Model                  Accuracy   F1       AUC
ResNet-18              0.9989     0.9993   0.9999
EfficientNet-B4        0.9944     0.9967   1.0000
R3D-18                 0.9756     0.9852   0.9991
ViT-B/16               0.9700     0.9817   0.9992
R3D-18+RAFT            0.9833     0.9899   0.9993
Ensemble (soft-vote)   0.9989     0.9993   1.0000

Cross-Dataset (Celeb-DF v2, zero-shot, 6528 videos)

Model                  AUC      Generalization Gap (Δ)
EfficientNet-B4        0.8173   0.1827
ResNet-18              0.8209   0.1790
R3D-18                 0.8413   0.1577
R3D-18+RAFT            0.8744   0.1249
ViT-B/16               0.8777   0.1216
Ensemble (soft-vote)   0.9056   0.0944
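The gap column is simply in-dataset AUC minus cross-dataset AUC (consistent with the two tables, up to rounding), and the headline 22% figure follows directly:

```python
# Generalization gap = in-dataset AUC - cross-dataset AUC,
# using values taken from the two tables above.
in_auc    = {"ViT-B/16": 0.9992, "Ensemble": 1.0000}
cross_auc = {"ViT-B/16": 0.8777, "Ensemble": 0.9056}

gaps = {m: round(in_auc[m] - cross_auc[m], 4) for m in in_auc}
# ViT-B/16 -> 0.1215 (table reports 0.1216), Ensemble -> 0.0944

reduction = (0.1216 - 0.0944) / 0.1216  # ~0.224, i.e. the "22% reduction"
```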

Files

File                             Description
resnet18_best.pth                ResNet-18 baseline (single-stage AdamW + ReduceLROnPlateau)
efficientnet_b4_best.pth         EfficientNet-B4 (two-stage: head warmup → full fine-tune)
r3d18_best.pth                   R3D-18 3D-conv classifier (two-stage)
vit_base_patch16_224_best.pth    ViT-B/16 via timm (two-stage)
r3d18_raft_best.pth              R3D-18 on RAFT optical-flow interpolated frames
metadata.json                    Per-model run-id + metrics in machine-readable form

Usage

from huggingface_hub import hf_hub_download
import torch

path = hf_hub_download(
    repo_id="abraraltaf92/deepfake-detection-models",
    filename="resnet18_best.pth",
)
state_dict = torch.load(path, map_location="cpu")

The model classes that produced these checkpoints live in the companion code repo: github.com/abraraltaf92/deepfake-detection (see src/models.py and src/training.py).

Training Recipe

  • Preprocessing: 16 frames per video, MTCNN face crop @ 224×224 (R3D models @ 112×112)
  • Class-weighted cross-entropy: w_c = N_train / (2·N_c)
  • Identity-component split: 4200/900/900 (70/15/15)
  • Two-stage training (advanced models): 3 epochs head-only warmup + 10 epochs full fine-tune
  • Single-stage A2 recipe (ResNet-18 baseline)
  • Mixed precision (fp16) on Colab Pro CUDA
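The class-weight formula above can be computed directly and handed to the loss. A minimal sketch (the per-class counts are illustrative, assuming the 70% train split keeps FF++'s roughly 1:5 real-to-fake ratio):

```python
# w_c = N_train / (2 * N_c): up-weights the minority class so real and
# fake contribute equally to the cross-entropy loss.
def class_weights(n_real, n_fake):
    n_train = n_real + n_fake
    return [n_train / (2 * n_real), n_train / (2 * n_fake)]

w = class_weights(n_real=700, n_fake=3500)  # illustrative counts
# w == [3.0, 0.6]: real samples weighted 5x relative to fake samples

# In PyTorch this would feed the criterion, e.g.:
#   criterion = nn.CrossEntropyLoss(weight=torch.tensor(w))
```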

Project Context

CS 668 Analytics Capstone, Pace University.
