
Title: Explainable Deepfake Detection Using Frame Level CNN Models: A Comparative Study of Augmentation and Cutout Techniques

Description: This master's thesis investigates the impact of preprocessing strategies, specifically data augmentation and targeted cutout techniques, on both the detection performance and the explainability of CNN-based deepfake detection systems. The study employs a frame-level, spatial-domain approach using EfficientNet-B4 as the backbone architecture in a TimeDistributed configuration, adapted from Seferbekov's award-winning DFDC solution and applied to the FaceForensics++ dataset. Nine distinct experimental configurations were systematically evaluated, spanning baseline (no augmentation), augmentation-only, cutout-only (with black, white, and random fill strategies), and combined augmentation-cutout variants. Each configuration was trained three times to ensure statistical reliability, and performance was assessed using multiple complementary metrics: AUC, F1-Score, Brier Score, and LogLoss. The key finding is that combining augmentation with black-filled cutouts (blackfilledwithaug) achieved the best overall performance, reaching an AUC of 0.8971 and an F1-Score of 0.8429, improvements of approximately 3% and 8.3% over the baseline, respectively. Notably, augmentation alone degraded detection performance, while the cutout fill strategy significantly influenced results, highlighting the importance of strategic preprocessing design. Beyond quantitative metrics, model interpretability was analyzed through Grad-CAM visualizations and region-based activation analysis across eight facial regions. The explainability analysis revealed that augmentation substantially improved attention consistency (a 40–60% improvement in standard-deviation ranges) and identified the nose region as a persistent weakness across all model configurations for both false-positive and false-negative classifications. The research contributes to the development of more transparent and legally admissible deepfake detection pipelines by demonstrating that carefully designed preprocessing can simultaneously enhance detection accuracy, probability calibration, and model interpretability, which are critical requirements for forensic and judicial applications where explainable AI is essential.

Keywords: deepfake detection · explainable artificial intelligence (XAI) · Grad-CAM · convolutional neural networks (CNN) · EfficientNet · face forgery detection · data augmentation · cutout regularization · FaceForensics++ · transfer learning · model interpretability · digital forensics · face manipulation detection · media forensics · image classification · deep learning · computer vision · facial region analysis · preprocessing strategies · binary classification · SSIM · frame-level detection · video forensics · legal admissibility · trustworthy AI
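The cutout fill strategies compared in the description (black, white, and random fill) can be illustrated with a minimal sketch like the one below. This is an assumption-laden illustration rather than the thesis code: it assumes uint8 RGB face crops, places the patch at a random location (a targeted variant would place it over a chosen facial region), and the function name apply_cutout and its parameters are hypothetical.

```python
import numpy as np

def apply_cutout(image, size=64, fill="black", rng=None):
    """Mask one square patch of a face crop using the chosen fill strategy.

    fill: 'black' (zeros), 'white' (max intensity) or 'random' (uniform noise).
    Assumes a uint8 RGB crop larger than `size` in both dimensions; the patch
    location is drawn at random here purely for illustration.
    """
    rng = rng or np.random.default_rng()
    out = image.copy()
    h, w, c = out.shape
    # Choose a top-left corner so the patch stays inside the crop.
    y = int(rng.integers(0, max(1, h - size)))
    x = int(rng.integers(0, max(1, w - size)))
    if fill == "black":
        patch = np.zeros((size, size, c), dtype=np.uint8)
    elif fill == "white":
        patch = np.full((size, size, c), 255, dtype=np.uint8)
    elif fill == "random":
        patch = rng.integers(0, 256, size=(size, size, c), dtype=np.uint8)
    else:
        raise ValueError(f"unknown fill strategy: {fill}")
    out[y:y + size, x:x + size] = patch
    return out
```

In the combined augmentation-cutout variants mentioned above (e.g. blackfilledwithaug), such a mask would be applied alongside the data-augmentation pipeline; the specific augmentations used in the thesis are not detailed here.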
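The four evaluation metrics named in the description (AUC, F1-Score, Brier Score, LogLoss) can all be computed with scikit-learn. The snippet below is a toy example on made-up labels and probabilities, with an assumed 0.5 decision threshold for the F1-Score; it is not the thesis's evaluation code.

```python
from sklearn.metrics import roc_auc_score, f1_score, brier_score_loss, log_loss

y_true = [0, 0, 1, 1, 1]             # 1 = fake frame, 0 = real frame (toy labels)
y_prob = [0.2, 0.4, 0.7, 0.9, 0.6]   # predicted probability of "fake"

auc   = roc_auc_score(y_true, y_prob)
f1    = f1_score(y_true, [int(p >= 0.5) for p in y_prob])  # assumed 0.5 threshold
brier = brier_score_loss(y_true, y_prob)  # mean squared error of the probabilities
ll    = log_loss(y_true, y_prob)          # negative log-likelihood / cross-entropy
print(f"AUC={auc:.3f}  F1={f1:.3f}  Brier={brier:.3f}  LogLoss={ll:.3f}")
```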

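Region-based activation analysis, as described in the abstract, scores how much of a Grad-CAM heatmap falls inside each facial region so that regions can be compared across configurations. A minimal sketch follows; the eight region names, the helper region_activation_summary, and the use of per-region mean activation are assumptions for illustration, and the thesis's actual region partition and aggregation may differ.

```python
import numpy as np

# Hypothetical eight-region partition of the face; the thesis's exact regions may differ.
FACIAL_REGIONS = ["forehead", "left_eye", "right_eye", "nose",
                  "left_cheek", "right_cheek", "mouth", "chin"]

def region_activation_summary(heatmap, region_masks):
    """Average a normalised Grad-CAM heatmap inside each facial-region mask.

    heatmap: 2-D array of Grad-CAM activations for one frame.
    region_masks: {region_name: boolean mask with the same shape as heatmap}.
    Returns {region_name: mean activation}, so regions can be ranked by how
    much attention the model paid to them.
    """
    # Normalise to [0, 1] so scores are comparable across frames and models.
    hm = heatmap - heatmap.min()
    hm = hm / (hm.max() + 1e-8)
    return {name: float(hm[mask].mean()) if mask.any() else 0.0
            for name, mask in region_masks.items()}
```

Aggregating such per-region scores over misclassified frames is one way to surface patterns like the nose-region weakness reported for both false-positive and false-negative cases.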