Dominique: An AI-Powered Fact-Checking Chatbot for Democratizing Access to Reliable Information

Authors: Tavares, Denis; Barreto, Marina; Silva de Almeida, Breno Livio; Florentino, Bruno; Tetzner, Felipe; Tegoni Goedert, Guilherme; Struchiner, Claudio; +2 Authors

doi: 10.5281/zenodo.17764702, 10.5281/zenodo.17764703

Abstract

This repository contains two datasets developed for research in Brazilian Portuguese fake news detection:

Golden Dataset: The primary dataset, comprising 22,044 unique news articles (11,145 fake, 10,899 true) in Brazilian Portuguese. It was created by merging and deduplicating three established corpora, Fake.Br, FakeTrueBR, and FakeRecogna, to form a larger, more robust, and balanced resource. It includes extensive metadata such as source, publication date, author, and linguistic features to support the development of advanced machine learning models.

Gemini Validation Dataset: A synthetic, health-focused dataset of 1,000 news instances (labeled as true or fake) generated using Google's Gemini LLM. This dataset was specifically created for external validation to test the generalization capability of trained models on unseen, out-of-distribution topics, simulating a real-world fact-checking scenario.
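The merge-and-deduplicate step described for the Golden Dataset can be illustrated with a minimal sketch. The record does not state how duplicates were detected, so the approach here (exact match on lowercased, whitespace-collapsed text) is an assumption, and the field names `text`, `label`, and `source`, as well as the toy records, are hypothetical and not taken from the published files.

```python
import hashlib
import re
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def merge_and_deduplicate(*corpora):
    """Concatenate corpora, keeping the first occurrence of each normalized text."""
    seen = set()
    merged = []
    for corpus in corpora:
        for record in corpus:
            # Assumption: duplicates are articles whose normalized text is identical.
            key = hashlib.sha256(normalize(record["text"]).encode("utf-8")).hexdigest()
            if key in seen:
                continue
            seen.add(key)
            merged.append(record)
    return merged


# Toy stand-ins for the three source corpora (hypothetical records, not real data).
fake_br = [
    {"text": "Vacina causa doença X.", "label": "fake", "source": "Fake.Br"},
    {"text": "Governo anuncia novo programa.", "label": "true", "source": "Fake.Br"},
]
fake_true_br = [
    # Same article as the first Fake.Br record, differing only in case and spacing.
    {"text": "vacina  causa doença x.", "label": "fake", "source": "FakeTrueBR"},
]
fake_recogna = [
    {"text": "Chá cura todas as doenças.", "label": "fake", "source": "FakeRecogna"},
]

golden = merge_and_deduplicate(fake_br, fake_true_br, fake_recogna)

# Class balance after deduplication, analogous to the fake/true counts reported above.
label_counts = Counter(record["label"] for record in golden)
```

Keeping the first occurrence preserves the metadata of whichever corpus is listed first; a real pipeline might instead prefer the record with the most complete metadata, or use near-duplicate detection rather than exact matching.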

Related Organizations

Helmholtz Centre for Environmental Research, Germany
Helmholtz Association of German Research Centres, Germany
Leipzig University, Germany
Universidade de São Paulo, Brazil
Federal University of Technology – Paraná, Brazil
