Powered by OpenAIRE graph
ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO
ZENODO
Dataset . 2025
License: CC BY
Data sources: Datacite
versions View all 2 versions

Triangulation and Utilization of multimodal data for fake news detection with social constructs

Authors: Ali, Ishfaq;

Abstract

To overcome the limitations of many existing fake news datasets, which often analyze either news content or social media posts in isolation, we present a triangulated dataset that systematically interlinks four components: original news articles, social media posts, multimedia content, and veracity labels. The original news articles are sourced from NELA-GT and from mainstream media outlets, providing a foundational layer of factual reporting. These articles are paired with their social media derivatives, which include posts from platforms such as Twitter and Reddit along with metadata such as engagement statistics and bot-likelihood scores. Multimedia content, including both images and videos, is incorporated from datasets like FakeNewsNet to support analysis of visual misinformation. Veracity labels are curated from fact-checked claims provided by the TruthSeekers repository, so that each instance is associated with a trusted assessment of truthfulness.

The resulting dataset contains 158,400 aligned instances spanning text, image data, social interaction context, and temporal metadata. Alignment is achieved through a multi-tiered method that includes URL and keyword matching using Levenshtein distance thresholds (0.85) and multimodal validation using CLIP similarity scores (>0.7). Together, these techniques are intended to ensure high-confidence matching across modalities.

Compared to existing datasets such as FakeNewsNet and LIAR, our triangulated dataset offers several advantages. It includes social context features like bot scores and retweet graphs, supports multimodal pairings of text, images, and social media posts, and allows provenance tracking by comparing original and manipulated versions of content.

For instance, it enables detailed tracing of how a legitimate BBC article titled “Climate Accord Signed” may be repurposed into a misleading viral tweet like “Politicians FAKED climate deal!” accompanied by doctored images. This level of integration gives researchers a tool to study the lifecycle and mutation of fake news across platforms and modalities.
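
The alignment tiers described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the dataset's actual pipeline. It assumes that the 0.85 threshold applies to a length-normalized Levenshtein similarity (1 minus edit distance divided by the longer string's length), that CLIP image embeddings have already been computed elsewhere and are compared by cosine similarity, and that a pair must pass the text tier before the image tier is checked. The record fields (`url`, `urls`, `title`, `text`, `image_vec`) and the function names are hypothetical.

```python
from math import sqrt

TEXT_THRESHOLD = 0.85   # from the abstract; assumed to be a normalized similarity
CLIP_THRESHOLD = 0.7    # from the abstract; assumed to be cosine similarity


def levenshtein(a: str, b: str) -> int:
    """Edit distance via row-by-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def text_similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical after lowercasing."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def is_aligned(article: dict, post: dict) -> bool:
    """Hypothetical two-tier check: text or URL match, then image validation."""
    url_match = article["url"] in post["urls"]
    text_match = text_similarity(article["title"], post["text"]) >= TEXT_THRESHOLD
    if not (url_match or text_match):
        return False
    # Assumption: the image tier applies only when both sides have an embedding.
    if article.get("image_vec") and post.get("image_vec"):
        return cosine(article["image_vec"], post["image_vec"]) > CLIP_THRESHOLD
    return True


article = {"url": "https://example.org/a", "title": "Climate Accord Signed",
           "image_vec": [1.0, 0.0]}
post = {"urls": [], "text": "climate accord signed!", "image_vec": [0.9, 0.1]}
print(is_aligned(article, post))
```

In this sketch a heavily reworded post such as the “Politicians FAKED climate deal!” example would fail the title-similarity check and could only be linked through a shared URL, which is one reason a multi-tiered scheme is useful.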

Related Organizations
EOSC Subjects

Twitter Data

  • BIP!
    Impact by BIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average