A large annotated corpus for learning natural language inference

Publication: Article, Preprint, Conference object

Date: 01 Jan 2015

Embargo end date: 01 Jan 2015

Publisher: Association for Computational Linguistics (ACL)

Journal: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Funded by: NSF | RI: Medium: Bringing Sent...

Authors: Samuel R. Bowman; Gabor Angeli; Christopher Potts; Christopher D. Manning

doi: 10.18653/v1/d15-1075, 10.48550/arxiv.1508.05326

arXiv: 1508.05326

Abstract

Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been dramatically limited by the lack of large-scale resources. To address this, we introduce the Stanford Natural Language Inference corpus, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. At 570K pairs, it is two orders of magnitude larger than all other resources of its type. This increase in scale allows lexicalized classifiers to outperform some sophisticated existing entailment models, and it allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.

To appear at EMNLP 2015. The data will be posted shortly before the conference (the week of 14 Sep) at http://nlp.stanford.edu/projects/snli/

Related Organizations

Stanford University
United States

Keywords

FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)

Impact by BIP!

- Selected citations: 890. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 0.1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 0.1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.