Antisemitism on Twitter: A Dataset for Machine Learning and Text Analytics

# Institute for the Study of Contemporary Antisemitism (ISCA) at Indiana University

Dataset: The ISCA project compiled this dataset using an annotation portal, which was used to label tweets as either antisemitic or non-antisemitic, among other labels. Please note that annotators worked with live data, including images and context such as threads. The original data was sourced from annotationportal.com.

# Content

This dataset contains 6,941 tweets that cover a wide range of topics common in conversations about Jews, Israel, and antisemitism between January 2019 and December 2021. The tweets were drawn from representative samples of messages containing relevant keywords during this period. 1,250 tweets (18%) meet the IHRA (International Holocaust Remembrance Alliance) definition of antisemitic messages.

The distribution of tweets by year is as follows: 1,499 (22%) from 2019, 3,716 (54%) from 2020, and 1,726 (25%) from 2021. Percentages are rounded and therefore sum to 101%.

4,605 tweets (66%) contain the keyword "Jews," 1,524 (22%) include "Israel," 529 (8%) feature the derogatory term "ZioNazi*," and 283 (4%) use the slur "K---s." Some tweets may contain multiple keywords.

483 out of the 4,605 tweets with the keyword "Jews" (11%) and 203 out of the 1,524 tweets with the keyword "Israel" (13%) were classified as antisemitic. 97 out of the 283 tweets using the antisemitic slur "K---s" (34%) were classified as antisemitic; many tweets featuring this slur actually call out its usage. In contrast, the large majority of tweets with the derogatory term "ZioNazi*" are antisemitic: 467 out of 529 (88%) were classified as such.

# File Description

The dataset is provided as a CSV file in which each row represents a single message, including replies, quotes, and retweets. The file contains the following columns:

‘TweetID’: The tweet ID.
‘Username’: The username of the account that published the tweet.
‘Text’: The full text of the tweet (not pre-processed).
‘CreateDate’: The date the tweet was created.
‘Biased’: The label assigned by our annotators, indicating whether the tweet is antisemitic or non-antisemitic.
‘Keyword’: The keyword that was used in the query. The keyword can appear in the text, including mentioned names, or in the username.

# Licences

Data is published under the terms of the "Creative Commons Attribution 4.0 International" licence (https://creativecommons.org/licenses/by/4.0).
R code is published under the terms of the "MIT" licence (https://opensource.org/licenses/MIT).

# Acknowledgements

We are grateful for the support of Indiana University’s Observatory on Social Media (OSoMe) (Davis et al. 2016) and for the contributions and annotations of all team members in our Social Media & Hate Research Lab at Indiana University’s Institute for the Study of Contemporary Antisemitism, especially Grace Bland, Elisha S. Breton, Kathryn Cooper, Robin Forstenhäusler, Sophie von Máriássy, Mabel Poindexter, Jenna Solomon, Clara Schilling, and Victor Tschiskale. This work used Jetstream2 at Indiana University through allocation HUM200003 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.
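
As a rough illustration of how the CSV layout described above might be used, the following Python sketch counts antisemitic tweets per keyword from the ‘Biased’ and ‘Keyword’ columns. It makes several assumptions that this description does not confirm: that ‘Biased’ is stored as "1" for antisemitic and "0" otherwise, that the file has a header row with the column names listed above, and that the file name in the comment is only a placeholder. The function name `antisemitic_share_by_keyword` is hypothetical, and the sample rows are made up for demonstration; they are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def antisemitic_share_by_keyword(csv_file, positive_values=("1",)):
    """Return {keyword: (n_antisemitic, n_total, share)} for an open CSV file.

    Assumption: a row counts as antisemitic when its 'Biased' value is one of
    `positive_values`. Adjust this to the encoding used in the actual file.
    """
    totals = Counter()
    positives = Counter()
    for row in csv.DictReader(csv_file):
        keyword = row["Keyword"]
        totals[keyword] += 1
        if row["Biased"].strip() in positive_values:
            positives[keyword] += 1
    return {k: (positives[k], n, positives[k] / n) for k, n in totals.items()}


# Made-up sample rows in the documented column layout (not real data).
sample = io.StringIO(
    "TweetID,Username,Text,CreateDate,Biased,Keyword\n"
    "1,user_a,example text one,2019-05-01,0,Jews\n"
    "2,user_b,example text two,2020-03-14,1,Jews\n"
    "3,user_c,example text three,2020-07-09,0,Jews\n"
    "4,user_d,example text four,2021-01-20,0,Israel\n"
    "5,user_e,example text five,2021-06-02,1,Israel\n"
)

result = antisemitic_share_by_keyword(sample)
for keyword, (pos, total, share) in sorted(result.items()):
    print(f"{keyword}: {pos}/{total} antisemitic ({share:.0%})")

# With the real file, one would pass an open file handle instead, e.g.:
# with open("dataset.csv", newline="", encoding="utf-8") as f:  # placeholder name
#     result = antisemitic_share_by_keyword(f)
```

Run against the full file, a summary of this kind should be comparable to the per-keyword counts reported in the content description, provided the label encoding assumption holds.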

Reference: https://arxiv.org/abs/2304.14599

Related Organizations

Technical University of Berlin
Germany
Indiana University Bloomington
United States

Keywords

Labeled Dataset, Annotation, Hate Speech, Gold Standard, Antisemitism

EOSC Subjects

Twitter Data

