Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ ZENODOarrow_drop_down
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Article . 2025
License: CC BY
Data sources: ZENODO
ZENODO
Article . 2025
License: CC BY
Data sources: Datacite
ZENODO
Article . 2025
License: CC BY
Data sources: Datacite
versions View all 2 versions
addClaim

Benchmarking and Cross-Platform Evaluation of Public Deepfake Detection Models on Viral Real-World Media

Authors: Sarkar, Tuhin;

Benchmarking and Cross-Platform Evaluation of Public Deepfake Detection Models on Viral Real-World Media

Abstract

Abstract: This study evaluates the performance of publicly accessible deepfake detection tools on 20 viral political and celebrity videos. Deepfakes pose serious risks to public trust and information integrity, yet the reliability of off-the-shelf detection tools for identifying real-world deepfakes remains unclear. We hypothesised that publicly available detectors would show inconsistent accuracy and produce both false positives and false negatives when applied to in-the-wild videos. To test this, we evaluated 20 viral clips (10 confirmed deepfakes, 10 authentic controls) using two public detection platforms: Deepware AI Scanner and UB Media Forensics Lab's DeepFake-O-Meter. We recorded ensemble and per-model likelihoods across more than ten detectors. Results revealed substantial cross-platform disagreement and significant inconsistencies, including frequent false positives and false negatives. One platform's ensemble flagged only a minority of confirmed deepfakes while the research platform produced extreme per-model score variance, so that sensitivity depended strongly on how an intermediate "Suspicious" label was treated. Depending on the binary mapping used, measured sensitivity varied widely while specificity remained high for this sample. Our results demonstrate the shortcomings of existing detection technologies and the pressing need for more reliable, transparent, and strong deepfake forensic techniques. We conclude that current public detectors provide useful signals but are not yet reliable as sole arbiters of authenticity for viral content. We recommend publishing full per-video numeric outputs, versioned model identifiers, and pairing automated screening with human expert review.

  • BIP!
    Impact byBIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
Powered by OpenAIRE graph
Found an issue? Give us feedback
selected citations
These citations are derived from selected sources.
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
0
Average
Average
Average
Green