DuQM: A Chinese Dataset of Linguistically Perturbed Natural Questions for Evaluating the Robustness of Question Matching Models

Publication: Article, Preprint, Conference object, 01 Jan 2022

Embargo end date: 01 Jan 2021

Publisher: Association for Computational Linguistics (ACL)

Journal: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Authors: Hongyu Zhu 0002; Yan Chen; Jing Yan 0004; Jing Liu 0022; Yu Hong 0001; Ying Chen 0011; Hua Wu 0003; +1 Authors

doi: 10.18653/v1/2022.emnlp-main.531, 10.48550/arxiv.2112.08609

arXiv: 2112.08609

Abstract

In this paper, we study the robustness evaluation of Chinese question matching. Most previous work on analyzing robustness issues focuses on just one or a few types of artificial adversarial examples. Instead, we argue that it is necessary to formulate a comprehensive evaluation of the linguistic capabilities of models on natural texts. For this purpose, we create a Chinese dataset, DuQM, which contains natural questions with linguistic perturbations, to evaluate the robustness of question matching models. DuQM contains 3 categories and 13 subcategories with 32 linguistic perturbations. Extensive experiments demonstrate that DuQM is better able to distinguish between different models. Importantly, the detailed breakdown of evaluation by linguistic phenomenon in DuQM helps us easily diagnose the strengths and weaknesses of different models. Additionally, our experimental results show that the effect of artificial adversarial examples does not carry over to natural texts.

Related Organizations

Soochow University
China (People's Republic of)

Keywords

FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)

Related research products (6)

- pycorrector, software on GitHub (IsRelatedTo)
- lac, software on GitHub (IsRelatedTo)
- DDParser, software on GitHub (IsRelatedTo)
- elasticsearch, software on GitHub (IsRelatedTo)
- DuReader, software on GitHub (IsRelatedTo)
- faiss, software on GitHub (IsRelatedTo)

Impact by BIP!

- Selected citations: 2. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.