TRACES Bulgarian Twitter Dataset on Famous Bulgarian Political Cases of Suspected Lies, Annotated with Linguistic Markers of Lies

This dataset was created within Project TRACES (more information: https://traces.gate-ai.eu/). It contains the IDs of 15850 Bulgarian-language tweets, together with annotations. The dataset can be used for general purposes or for building applications that detect lies and disinformation. Note: this dataset is not fact-checked; the social media messages were retrieved via keywords. For fact-checked datasets, see our other datasets.

The tweets (written between 1 Jan 2020 and 7 July 2022) were collected via the Twitter API under academic access in June-July 2022, excluding retweets, with the following keywords:

(ваксиниран депутат) OR (ваксинирани депутати) (язовири премиер) OR (язовири прокуратура) OR (язовири прокуратурата) ((мвр хемус) OR мвр) (прокуратура OR прокуратурата) (шефът тотото) OR (изпълнителният директор Българския спортен тотализатор) (кирил петков двойно гражданство) OR (премиер двойно гражданство) OR (премиер гражданство) ((Пътна OR загубена OR загуби OR изчезнала) карта газпром) (министър плагиат плагиатство) OR (плагиат плагиатство) ((изслушване главния прокурор) OR (иван гешев)) (фалшива диплома) (златни паспорти) (апартаментгейт OR (къща за гости) OR (къщи за гости) (оръжия OR оръжие) (Украйна OR украина) ((цена OR цени) (газ OR ток OR нафта OR бензин)) (мвр OR данс) (фалшиви новини) (данъци OR данъчни OR данък) ((кораб Царевна) OR Царевна) (Северна Македония)

Explanations of which fields can be used as markers of lies (or of intentional disinformation) are provided in our paper: Irina Temnikova, Silvia Gargova, Ruslana Margova, Veneta Kireva, Ivo Dzhumerov, Tsvetelina Stefanova and Hristiana Nikolaeva (2023) New Bulgarian Resources for Detecting Disinformation. 10th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC'23). Poznań, Poland.
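As an illustration of the collection setup described above, the sketch below shows how one keyword group might be wrapped into a request for the Twitter API v2 full-archive search endpoint (the endpoint offered under academic access), with retweets excluded via the `-is:retweet` operator. This is a hypothetical reconstruction, not the authors' collection script: the exact queries sent, the endpoint used, and the precise time boundaries are assumptions, and `build_search_params` is an illustrative helper name. No network request is made here.

```python
from datetime import datetime, timezone

# Twitter API v2 full-archive search endpoint (academic access); assumed, not stated in the source.
FULL_ARCHIVE_SEARCH_URL = "https://api.twitter.com/2/tweets/search/all"

# Collection window from the dataset description; the exact end-of-day boundary is an assumption.
START = datetime(2020, 1, 1, tzinfo=timezone.utc)
END = datetime(2022, 7, 7, 23, 59, 59, tzinfo=timezone.utc)


def build_search_params(keyword_group: str, max_results: int = 500) -> dict:
    """Hypothetical helper: wrap one keyword group in a v2 query that excludes retweets.

    In v2 query syntax, a space means AND, OR must be written explicitly,
    parentheses group terms, and -is:retweet drops retweets.
    """
    query = f"({keyword_group}) -is:retweet"
    return {
        "query": query,
        "start_time": START.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "end_time": END.strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Full-archive search accepts between 10 and 500 results per page.
        "max_results": max_results,
    }


if __name__ == "__main__":
    params = build_search_params("(ваксиниран депутат) OR (ваксинирани депутати)")
    print(FULL_ARCHIVE_SEARCH_URL)
    print(params)
```

Actually sending such a request would additionally require a bearer token in the `Authorization` header and handling of the `next_token` pagination field; both are omitted to keep the sketch self-contained.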

The project TRACES has indirectly received funding from the European Union's Horizon 2020 research and innovation action programme, via the AI4Media Open Call #1 issued and executed under the AI4Media project (Grant Agreement no. 951911). The dataset is shared under the Creative Commons Attribution-NonCommercial 4.0 license (CC BY-NC 4.0). In accordance with European Union laws, user profiling of the authors of these texts is forbidden. The Project Sponsors (European Commission and the AI4Media project), researchers, users or subjects shall not be liable or otherwise responsible for any damages (including pecuniary or moral damages) arising out of or in relation to the uses of this dataset.

When using the dataset, please cite this article: Irina Temnikova, Silvia Gargova, Ruslana Margova, Veneta Kireva, Ivo Dzhumerov, Tsvetelina Stefanova and Hristiana Nikolaeva (2023) New Bulgarian Resources for Detecting Disinformation. 10th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC'23). Poznań, Poland.

Keywords

social media, Twitter, suspected lies, Bulgarian

EOSC Subjects

Twitter Data

Similar research products

TRACES Bulgarian Twitter Dataset on Covid-19 Annotated with Linguistic Markers of Lies (2023)
TRACES Bulgarian Twitter Dataset on Lies and Manipulation Annotated with Linguistic Markers of Lies (2023)
TRACES Bulgarian Telegram Dataset Annotated with Linguistic Markers of Lies (2023)
