Powered by OpenAIRE graph
ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO
ZENODO
Dataset . 2025
License: CC BY
Data sources: Datacite

Measuring Online Hate on 4chan Using Deep Learning

Authors: Bermudez-Villalva, Adrian; Mehrnezhad, Maryam; Toreini, Ehsan

Abstract

This is the dataset released with the paper "Measuring Online Hate on 4chan Using Deep Learning". It contains 500,000 posts extracted from the /pol/ (Politically Incorrect) board of 4chan using the 4chan API. The dataset is a single CSV file with one column, com, which holds the raw content of each post. Thread and reply structure is not preserved; the file is a flat collection of individual posts. This format is intended to support text analysis, natural language processing, and computational social science research by providing a straightforward dataset of raw post content.

Dataset Format

  • File Format: CSV (Comma-Separated Values)
  • Columns:
      com: The raw content of the post.

Source

The posts were extracted from 4chan's /pol/ board using the official 4chan API. This board is known for hosting discussions on various topics, often with a focus on political content. Due to the nature of the /pol/ board, the content may include offensive language, hate speech, or otherwise sensitive material. Users should exercise caution and consider the ethical implications when analysing this dataset.

Potential Use Cases

  • Text analysis and natural language processing (NLP).
  • Studies on online discourse, extremism, or political polarization.
  • Research on language usage and sentiment in online forums.
  • Development and testing of machine learning models for text classification or moderation.

Example Data

Here is an example of what a few rows of the dataset look like:

    com
    "Why does no one talk about this?"
    "The government is hiding the truth!"
    "We need to take action against this injustice."

If you find our dataset useful, please cite our paper: @article{ }
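
As a minimal sketch of how the single-column CSV layout described above could be read, the following Python snippet iterates over the com column using only the standard library. The helper name iter_posts and the file name posts.csv mentioned in the comment are hypothetical and not part of the dataset record; the sample rows are the three illustrative rows from the description.

```python
import csv
import io
from typing import Iterator, TextIO


def iter_posts(handle: TextIO, column: str = "com") -> Iterator[str]:
    """Yield the raw text of each post from a CSV stream with a `com` column.

    `iter_posts` is a hypothetical helper, not something shipped with the dataset.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"expected a {column!r} column, got {reader.fieldnames}")
    for row in reader:
        text = row[column]
        if text:  # skip rows whose post text is missing or empty
            yield text


# Illustrative rows taken from the dataset description.
sample = io.StringIO(
    "com\n"
    '"Why does no one talk about this?"\n'
    '"The government is hiding the truth!"\n'
    '"We need to take action against this injustice."\n'
)
posts = list(iter_posts(sample))
print(len(posts), "posts loaded")

# For the real file (the name below is an assumption), one might write:
#   with open("posts.csv", newline="", encoding="utf-8") as f:
#       for post in iter_posts(f):
#           ...
```

Streaming rows through a generator rather than loading the whole file at once is a design choice that keeps memory use flat, which may matter for a 500,000-row file.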
