
This repository contains two datasets developed for research in Brazilian Portuguese fake news detection:

Golden Dataset: The primary dataset, comprising 22,044 unique news articles (11,145 fake, 10,899 true) in Brazilian Portuguese. It was created by merging and deduplicating three established corpora (Fake.Br, FakeTrueBR, and FakeRecogna) to form a larger, more robust, and balanced resource. It includes extensive metadata, such as source, publication date, author, and linguistic features, to support the development of advanced machine learning models.

Gemini Validation Dataset: A synthetic, health-focused dataset of 1,000 news instances (labeled as true or fake) generated using Google's Gemini LLM. It was created specifically for external validation, to test how well trained models generalize to unseen, out-of-distribution topics, simulating a real-world fact-checking scenario.
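The merge-and-deduplicate step described for the Golden Dataset can be illustrated with a minimal sketch. It assumes that duplicates are detected by exact match on normalized article text; the actual criterion used to build the dataset is not stated here and may differ. The field names `text`, `label`, and `source`, and the helper names `normalize` and `merge_and_deduplicate`, are hypothetical and chosen only for illustration.

```python
import hashlib
import unicodedata


def normalize(text):
    """Normalize Unicode, ignore case, and collapse whitespace (assumed criterion)."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return " ".join(text.split())


def merge_and_deduplicate(corpora):
    """Merge several corpora, keeping the first occurrence of each article.

    `corpora` maps a corpus name to a list of records; each record is a dict
    with at least a "text" field (hypothetical schema).
    """
    seen = set()
    merged = []
    for source, records in corpora.items():
        for record in records:
            # Hash the normalized text so the set of seen keys stays small.
            key = hashlib.sha256(normalize(record["text"]).encode("utf-8")).hexdigest()
            if key in seen:
                continue  # same article already taken from an earlier corpus
            seen.add(key)
            merged.append({**record, "source": source})
    return merged


if __name__ == "__main__":
    # Toy records, not taken from the real corpora.
    toy = {
        "corpus_a": [{"text": "Exemplo  de notícia", "label": "fake"}],
        "corpus_b": [
            {"text": "exemplo de notícia", "label": "fake"},
            {"text": "Outra notícia", "label": "true"},
        ],
    }
    print(len(merge_and_deduplicate(toy)))  # prints 2
```

Exact matching on normalized text catches only verbatim repeats; near-duplicates (for example, lightly edited reposts) would require a fuzzy similarity measure, which this sketch does not attempt.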
