ZENODO
Dataset . 2021
Data sources: Datacite
Data sources: ZENODO

This Research product is the result of merged Research products in OpenAIRE.

Dataset for: "It is just a flu: Assessing the Effect of Watch History on YouTube's Pseudoscientific Video Recommendations"

Authors: Kostantinos Papadamou; Savvas Zannettou; Jeremy Blackburn; Emiliano De Cristofaro; Gianluca Stringhini; Michael Sirivianos

Abstract

Dataset for the paper: "It is just a flu: Assessing the Effect of Watch History on YouTube’s Pseudoscientific Video Recommendations"

Abstract: The role played by YouTube’s recommendation algorithm in unwittingly promoting misinformation and conspiracy theories is not entirely understood. Yet, this can have dire real-world consequences, especially when pseudoscientific content is promoted to users at critical times, such as the COVID-19 pandemic. In this paper, we set out to characterize and detect pseudoscientific misinformation on YouTube. We collect 6.6K videos related to COVID-19, the Flat Earth theory, as well as the anti-vaccination and anti-mask movements. Using crowdsourcing, we annotate them as pseudoscience, legitimate science, or irrelevant and train a deep learning classifier to detect pseudoscientific videos with an accuracy of 0.79. We quantify user exposure to this content on various parts of the platform and how this exposure changes based on the user’s watch history. We find that YouTube suggests more pseudoscientific content regarding traditional pseudoscientific topics (e.g., flat earth, anti-vaccination) than for emerging ones (like COVID-19). At the same time, these recommendations are more common on the search results page than on a user’s homepage or in the recommendation section when actively watching videos. Finally, we shed light on how a user’s watch history substantially affects the type of recommended videos.

Dataset Files

The dataset consists of three files: the metadata, comments, and captions of the ground-truth dataset videos collected and manually reviewed in this paper.

1. Video Metadata: "groundtruth_videos.json": Contains the metadata of our manually reviewed ground-truth dataset videos. The ground-truth dataset includes 1,197 science, 1,325 pseudoscience, and 3,212 irrelevant videos.
More specifically, it includes the metadata of videos related to the following pseudoscientific topics:

  • COVID-19: 607 science, 368 pseudoscience, and 721 irrelevant videos
  • Anti-vaccination: 363 science, 394 pseudoscience, and 1,060 irrelevant videos
  • Anti-mask: 65 science, 188 pseudoscience, and 724 irrelevant videos
  • Flat Earth: 162 science, 375 pseudoscience, and 707 irrelevant videos

Note that 600 of the videos in this dataset include the "annotation.manual_review_label" attribute, which is the label assigned by the first author of this paper to evaluate the performance of the crowdsourced annotation process.

Video Metadata Description:

  • "search_term": The search term used to query YouTube and retrieve the video during our data collection. It is one of the following: 'covid-19', 'coronavirus', 'anti-vaccination', 'anti-vaxx', 'anti-mask', or 'flat earth'.
  • "annotation.annotations": The list of the three annotations assigned to the video by our crowdsourced annotators.
  • "annotation.label": The annotation label assigned to the video based on the majority agreement of the crowdsourced annotators.
  • "annotation.manual_review_label": The label assigned by the first author of this paper to evaluate the performance of the crowdsourced annotation process.
  • "isSeed": 0 if the video is a seed video of our data collection, 1 if it is a recommended video of a seed video.
  • "relatedVideos": The recommended videos of the given video as returned by the YouTube Data API.

2. Video Comments: "groundtruth_videos_comments_ids.json": Includes the identifiers of the comments of our ground-truth videos.

3. Video Transcripts: "groundtruth_videos_transcripts.json": Includes the captions of our ground-truth videos.

If you use this dataset in any publication, of any form and kind, please cite it using the following entry:
@article{papadamou2020just,
  title={'It is just a flu': Assessing the Effect of Watch History on YouTube's Pseudoscientific Video Recommendations},
  author={Papadamou, Kostantinos and Zannettou, Savvas and Blackburn, Jeremy and De Cristofaro, Emiliano and Stringhini, Gianluca and Sirivianos, Michael},
  journal={arXiv preprint arXiv:2010.11638},
  year={2020}
}
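
The metadata attributes described above could be read along the following lines. This is only a minimal sketch under several assumptions that the description does not confirm: that "groundtruth_videos.json" is either a single JSON array or one JSON object per line, that dotted names such as "annotation.label" are either flat keys or nested objects, and that labels are stored as the strings 'science', 'pseudoscience', and 'irrelevant'. The helper names (`parse_records`, `get_path`, `majority_label`, `tally_labels`) and the sample records are hypothetical and are not part of the dataset.

```python
import json
from collections import Counter, defaultdict


def parse_records(text):
    """Parse a JSON array, a single JSON object, or one JSON object per line.

    The on-disk layout of the dataset files is an assumption here.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to JSON Lines: one object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def load_records(path):
    """Read a metadata file from disk and return a list of video records."""
    with open(path, encoding="utf-8") as handle:
        return parse_records(handle.read())


def get_path(record, dotted_key, default=None):
    """Look up a key such as 'annotation.label', flat or nested."""
    if dotted_key in record:
        return record[dotted_key]
    node = record
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def majority_label(annotations):
    """Return the label given by more than half of the annotators, else None."""
    if not annotations:
        return None
    label, votes = Counter(annotations).most_common(1)[0]
    return label if votes * 2 > len(annotations) else None


def tally_labels(records):
    """Count videos per search term and majority label."""
    counts = defaultdict(Counter)
    for record in records:
        term = get_path(record, "search_term", "unknown")
        label = get_path(record, "annotation.label", "unlabeled")
        counts[term][label] += 1
    return counts


# Made-up records for illustration only; the label strings are assumed.
SAMPLE_RECORDS = [
    {"search_term": "flat earth",
     "annotation": {"annotations": ["pseudoscience", "pseudoscience", "irrelevant"],
                    "label": "pseudoscience"}},
    {"search_term": "covid-19",
     "annotation": {"annotations": ["science", "science", "science"],
                    "label": "science"}},
]

if __name__ == "__main__":
    for term, labels in tally_labels(SAMPLE_RECORDS).items():
        print(term, dict(labels))
```

With the real file, `tally_labels(load_records("groundtruth_videos.json"))` would give per-search-term counts that can be compared against the per-topic figures listed above, keeping in mind that several search terms may map to one topic.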

Acknowledgments: This project has received funding from the European Union's Horizon 2020 Research and Innovation program under the CONCORDIA project (Grant Agreement No. 830927), and from the Innovation and Networks Executive Agency (INEA) under the CYberSafety II project (Grant Agreement No. 1614254). This work reflects only the authors' views; the funding agencies are not responsible for any use that may be made of the information it contains.

Keywords

Pseudoscientific Misinformation, Watch History, YouTube's Recommendation Algorithm, YouTube, Science, Flat Earth, Pseudoscience, COVID-19, Anti-vaccination, YouTube Videos, Anti-mask

  • BIP! impact: citations 0; popularity Average; influence Average; impulse Average
  • OpenAIRE UsageCounts: 70 views; 10 downloads