GPT-3 Curie-generated synthetic datasets based on Founta, Stormfront, HatEval 2019, Davidson, GermEval 2021, and SemEval 2022 Task 4

This dataset combines six synthetic datasets of toxic or hateful text, each based on a dataset published in one of the following papers:

"Large scale crowdsourcing and characterization of twitter abusive behavior"
"Hate Speech Dataset from a White Supremacy Forum"
"Automated hate speech detection and the problem of offensive language"
"SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in twitter"
"Overview of the GermEval 2021 shared task on the identification of toxic, engaging, and fact-claiming comments"
"Don't patronize me! An annotated dataset with patronizing and condescending language towards vulnerable communities"

All data was generated by separate GPT-3 Curie models, each fine-tuned on one label of a source dataset. The data is unfiltered and will likely need processing before it is useful.
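Because the generated data is unfiltered, some basic cleaning is a plausible first processing step. The following is a minimal sketch only, not the authors' pipeline: the function name `clean_generated_texts`, the `min_chars` threshold, and the sample strings are all hypothetical, and the record does not specify the file format or which filtering steps are appropriate.

```python
import re


def clean_generated_texts(texts, min_chars=10):
    """Normalize whitespace, drop very short items, and remove duplicates.

    Hypothetical helper: `min_chars` is an arbitrary illustrative threshold,
    not a value taken from the dataset description.
    """
    seen = set()
    cleaned = []
    for text in texts:
        # Collapse runs of whitespace and trim both ends.
        normalized = re.sub(r"\s+", " ", text).strip()
        # Skip empty or near-empty generations.
        if len(normalized) < min_chars:
            continue
        # Treat texts that differ only in letter case as duplicates.
        key = normalized.casefold()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(normalized)
    return cleaned


# Made-up placeholder strings, not actual dataset content.
samples = [
    "  An example   generated sentence. ",
    "an example generated sentence.",
    "",
    "ok",
    "A second, different generated sentence.",
]
print(clean_generated_texts(samples))
```

Further steps, such as checking that a generated text actually matches the label it was generated for, depend on the downstream task and are not covered by this sketch.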

Related Organizations

University of Regensburg
Germany

Keywords

Synthetic Data, Data Augmentation

EOSC Subjects

Twitter Data

