descriptionPublicationkeyboard_double_arrow_right Article , Preprint , Conference object 01 Jan 2023Embargo end date: 01 Jan 2022Publisher:Association for Computational Linguistics (ACL)Journal:Findings of the Association for Computational Linguistics: ACL 2023Funded by:EC | CRiTERIA

Authors: Fajcik, Martin; Motlicek, Petr; Smrz, Pavel;

doi: 10.18653/v1/2023.findings-acl.647 , 10.48550/arxiv.2207.14116

arXiv: http://arxiv.org/abs/2207.14116

Claim-Dissector: An Interpretable Fact-Checking System with Joint Re-ranking and Veracity Prediction

- Summary
- Subjects
- Related research
  (2)
- Metrics

Abstract

We present Claim-Dissector: a novel latent variable model for fact-checking and analysis, which given a claim and a set of retrieved evidences jointly learns to identify: (i) the relevant evidences to the given claim, (ii) the veracity of the claim. We propose to disentangle the per-evidence relevance probability and its contribution to the final veracity probability in an interpretable way -- the final veracity probability is proportional to a linear ensemble of per-evidence relevance probabilities. In this way, the individual contributions of evidences towards the final predicted probability can be identified. In per-evidence relevance probability, our model can further distinguish whether each relevant evidence is supporting (S) or refuting (R) the claim. This allows to quantify how much the S/R probability contributes to the final verdict or to detect disagreeing evidence. Despite its interpretable nature, our system achieves results competitive with state-of-the-art on the FEVER dataset, as compared to typical two-stage system pipelines, while using significantly fewer parameters. It also sets new state-of-the-art on FAVIQ and RealFC datasets. Furthermore, our analysis shows that our model can learn fine-grained relevance cues while using coarse-grained supervision, and we demonstrate it in 2 ways. (i) We show that our model can achieve competitive sentence recall while using only paragraph-level relevance supervision. (ii) Traversing towards the finest granularity of relevance, we show that our model is capable of identifying relevance at the token level. To do this, we present a new benchmark TLR-FEVER focusing on token-level interpretability -- humans annotate tokens in relevant evidences they considered essential when making their judgment. Then we measure how similar are these annotations to the tokens our model is focusing on.

updated acknowledgement

Related Organizations

Idiap Research Institute
Switzerland
Brno University of Technology
Czech Republic

Keywords

FOS: Computer and information sciences, Computer Science - Computation and Language, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computation and Language (cs.CL)

2 Research products, page 1 of 1

ClaimDissector software on GitHub
IsRelatedTo
Baleen software on GitHub
IsRelatedTo

Impact byBIP!

	citations This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	5
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Top 10%
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Top 10%