ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO, Datacite

From Birdwatch to Community Notes, from Twitter to X: four years of community-based content moderation

Authors: Mohammadi, Saeedeh; Chinichian, Narges; Doyal, Hannah; Skutilova, Kristina; Hao, Cui; d'Errico, Michele; Grayson, Siobhan; +1 Authors

Abstract

Dataset and Code Description

This repository contains the data and code used to analyse interactions within the Community Notes platform from January 23, 2021, to January 23, 2025. The files are organised as follows:

🧪 Code Notebooks

  • Create_graphs.ipynb: Constructs full interaction networks and separate sub-networks (helpful, somewhat helpful, unhelpful) from the monthly raw rating files.
  • Url_analysis.ipynb: Detects the language of each note and extracts any URLs or domain names mentioned.
  • BERTopic_English_hard_PCA100_UMAP10_MinCluster500.ipynb: Applies BERTopic to English-language notes to extract latent topics. Dimensionality is reduced using PCA (100 components) and UMAP (10 dimensions). Only clusters with at least 500 notes are retained to ensure robustness.

📄 Data Files

Notes Data

  • notes_with_lang.csv: All Community Notes written between January 23, 2021, and January 23, 2025, with detected language, extracted URLs, and domain names.
  • english_notes_with_nlp.csv: Subset of English notes with BERTopic topics, topic numbers, and keyword representations.

Each note file contains the following variables:

  • noteId: Unique ID of the note.
  • noteAuthorParticipantId: Unique ID of the note's author.
  • tweetId: ID of the tweet the note addresses.
  • date: Date the note was written (YYYY-MM-DD).
  • Timestamp: Time the note was written (HH:MM:SS).
  • language: Detected language of the note.
  • extracted_urls: List of URLs mentioned in the note.
  • news_source: List of extracted domain names.
  • BERTopic_word (only in English notes file): Main topic name.
  • BERTopic_number (only in English notes file): Numeric topic identifier.
  • BERTopic_representation (only in English notes file): List of keywords representing the topic.

Rating Data

Monthly rating files are stored in the rating monthly files/ directory with the naming format ratings_m_yyyy.csv. Each file includes:

  • noteId: ID of the rated note.
  • raterParticipantId: ID of the participant giving the rating.
  • helpfulnessLevel: Rating category (HELPFUL, SOMEWHAT_HELPFUL, NOT_HELPFUL).
  • helpful, notHelpful: Deprecated binary flags (use helpfulnessLevel instead).

🌐 Network Files

Each month's ratings are used to construct interaction graphs with user-to-user edges based on rating behaviours.

  • Whole Networks (whole_network__.graphml): Full user interaction networks, with edges annotated by the number of helpful, unhelpful, and somewhat helpful ratings. Each edge contains:
    source: Rater's participant ID.
    target: Note author's participant ID.
    helpful, unhelpful, somewhathelpful: Count of ratings by type from rater to author.
  • Helpful Networks (network___helpful.graphml): Subnetworks based on helpful ratings only.
  • Somewhat Helpful Networks (network___somewhat.graphml): Subnetworks based on somewhat helpful ratings.
  • Unhelpful Networks (network___unhelpful.graphml): Subnetworks based on unhelpful ratings.
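
The step from rating rows to rater-to-author edges can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code in Create_graphs.ipynb: it assumes that ratings are joined to note authors through noteId, and the function names (build_edges, subnetwork), the sample IDs, and the in-memory CSV text are all hypothetical. The column names and the edge attribute names (helpful, unhelpful, somewhathelpful) are taken from the description above.

```python
import csv
import io
from collections import defaultdict

# Map helpfulnessLevel values to the edge attribute names used in the
# network files. The mapping itself is an assumption for illustration.
LEVEL_TO_ATTR = {
    "HELPFUL": "helpful",
    "SOMEWHAT_HELPFUL": "somewhathelpful",
    "NOT_HELPFUL": "unhelpful",
}


def build_edges(notes_rows, rating_rows):
    """Count ratings per (rater, author) pair, split by rating type."""
    # Look up the author of each note so a rating can be directed at a user.
    author_of = {r["noteId"]: r["noteAuthorParticipantId"] for r in notes_rows}
    edges = defaultdict(lambda: {attr: 0 for attr in LEVEL_TO_ATTR.values()})
    for row in rating_rows:
        author = author_of.get(row["noteId"])
        attr = LEVEL_TO_ATTR.get(row["helpfulnessLevel"])
        if author is None or attr is None:
            continue  # skip ratings of unknown notes or unrecognised levels
        edges[(row["raterParticipantId"], author)][attr] += 1
    return dict(edges)


def subnetwork(edges, attr):
    """Keep only the edges with at least one rating of the given type."""
    return {pair: c[attr] for pair, c in edges.items() if c[attr] > 0}


# Tiny made-up sample standing in for a notes file and one monthly rating file.
NOTES_CSV = """noteId,noteAuthorParticipantId
n1,A
n2,B
"""
RATINGS_CSV = """noteId,raterParticipantId,helpfulnessLevel
n1,B,HELPFUL
n1,C,NOT_HELPFUL
n2,A,HELPFUL
n1,B,HELPFUL
n2,C,SOMEWHAT_HELPFUL
"""

notes = list(csv.DictReader(io.StringIO(NOTES_CSV)))
ratings = list(csv.DictReader(io.StringIO(RATINGS_CSV)))

edges = build_edges(notes, ratings)
helpful_edges = subnetwork(edges, "helpful")
print(helpful_edges)
```

With real data, the two in-memory strings would be replaced by the notes file and one monthly rating file opened from disk. The resulting edge dictionary could then be handed to a graph library for export to GraphML.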

EOSC Subjects

Twitter Data

  • BIP!
    Impact by BIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average