Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao ZENODOarrow_drop_down
image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
ZENODO
Dataset . 2025
Data sources: ZENODO
image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
ZENODO
Dataset . 2025
Data sources: ZENODO
ZENODO
Dataset . 2025
Data sources: Datacite
ZENODO
Dataset . 2025
Data sources: Datacite
ZENODO
Dataset . 2025
Data sources: Datacite
versions View all 3 versions
addClaim

MultiCheckWorthy (MultiCW) dataset

Authors: Martin Hyben;

MultiCheckWorthy (MultiCW) dataset

Abstract

If you would like to request access to these files, please fill out the form below. You need to satisfy these conditions in order for this request to be accepted: In order to share the dataset with you, please agree to the following terms: You will use dataset strictly only for research purposes. The request for access to the dataset must be sent from the official and existing e-mail address of the relevant university, faculty or other scientific or research institution (for verification purposes). You will not attempt to identify, deanonymize or contact the authors of the social media posts included in this dataset. You will not re-share the dataset (or any of its parts) with anyone else not included in this request. You will appropriately cite the papers mentioned in the dataset description in any publication, project, tool using this dataset. You understand how the dataset was created and that the manual or automatically predicted annotations may not be 100% correct. You acknowledge that you are fully responsible for the use of the dataset (data) and for any infringement of rights of third parties (in particular copyright) that may arise from its use beyond the intended purposes. Neither the authors nor Kempelen Institute of Intelligent Technologies (KInIT) are responsible for your actions.

The MultiCheckWorthy (MultiCW) dataset is a balanced multilingual benchmarking dataset for a check-worthy claim detection, covering 16 languages, 6 topical domains, and 2 writing styles. The dataset consists of 123,722 samples, evenly distributed between noisy and structured texts, with balanced representation of check-worthy and non-check-worthy classes across all languages. Each claim is accompanied by its English translation, detected topic, writing style, language code, check-worthyness label as well as the list of detected named entities. The dataset was composed of existing datasets and balanced by translating the samples from the existing datasets as well as using the samples collected from Wikipedia. The dataset is partitioned into training, validation, and test set. In addition, we construct a separate out-of-distribution (OOD) set consisting of 4 other languages (it, mk, nl and my), to evaluate model generalization beyond the in-distribution data. Bellow is the number of samples included in each set: Set Samples Train 86,691 Validation 18,491 Test 18,540 Out-of-distribution (OOD) 27,761

Keywords

Large Language Models, check-worthy claims, fine-tuning Transformers

  • BIP!
    Impact byBIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
Powered by OpenAIRE graph
Found an issue? Give us feedback
selected citations
These citations are derived from selected sources.
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
0
Average
Average
Average
Funded by