FactSpan: Multilingual Fact-Checking Dataset

The FactSpan dataset is an extension of the X-Fact dataset, designed to support multilingual fact-checking research. This dataset overcomes limitations in existing datasets by incorporating recent data from the ClaimReview Markup for Data Commons Feed and providing detailed annotations.

Key Features:

- Data Source: Claims are sourced from both the X-Fact dataset (up to 2020) and the Data Commons Feed (post-2020).
- Validity: Claims are filtered to include only those from organizations recognized by the International Fact-Checking Network (IFCN) and Duke Reporters’ Lab, ensuring high reliability.
- Standardized Labels: Verdict labels are standardized into five categories: False, Mostly False, Partly False/Misleading, Mostly True, and True.

Annotations (Annotated Dataset Only): The FactSpan_annotated.csv dataset includes rich annotations generated using GPT-3.5:

- label: The standardized verdict label.
- claim: The fact-checked claim.
- claimDate: The date of the claim.
- claim_year: The year of the claim.
- language: The language of the claim.
- Position Statements: Indicates the presence of position statements.
- Entity/Event Properties: Indicates the presence of entity or event properties.
- Quote: Indicates the presence of quotes.
- Numerical Data: Indicates the presence of numerical data.
- claim type: Categorizes the claim as factual or opinion.
- topics: Categorizes the claim into one of five predefined topics (Health and Pandemics, Politics and Governance, Society and Culture, Economy and Environment, Conflict and Security).
- mapped_label: An additional mapped label, for edge cases or further label mappings.

Unannotated Dataset: The FactSpan.csv dataset includes:

- label: The standardized verdict label.
- claim: The fact-checked claim.
- claimDate: The date of the claim.
- language: The language of the claim.

Purpose: This dataset aims to facilitate research in multilingual fact-checking, providing a comprehensive and up-to-date resource for developing and evaluating fact-checking models.
Repository: The dataset is maintained in the GitHub repository. The repository also contains scripts for expanding and updating the dataset. This work was supported by the German Research Foundation (DFG, project no. 504226141).
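
Example: A minimal sketch of how the unannotated file might be inspected, using only the Python standard library. It relies on the label and language columns listed above. The function names are hypothetical and not part of the repository's scripts, and the exact spelling of the label values inside the CSV is an assumption based on the five categories named in this description.

```python
import csv
from collections import Counter

# The five standardized verdict labels named in the description.
# Assumption: the CSV stores them with exactly this spelling.
STANDARD_LABELS = [
    "False",
    "Mostly False",
    "Partly False/Misleading",
    "Mostly True",
    "True",
]


def load_claims(handle):
    """Read an open CSV file into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def label_counts_by_language(rows):
    """Count verdict labels per language from the 'language' and 'label' columns."""
    counts = {}
    for row in rows:
        counts.setdefault(row["language"], Counter())[row["label"]] += 1
    return counts


def unexpected_labels(rows):
    """Return, sorted, any labels outside the five standardized categories."""
    return sorted({row["label"] for row in rows} - set(STANDARD_LABELS))
```

Assuming FactSpan.csv has been downloaded to the working directory and is UTF-8 encoded, it could be read with open("FactSpan.csv", newline="", encoding="utf-8") and passed to load_claims. Running unexpected_labels on the result is a quick way to check whether the label spelling assumed here matches the file.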
