NNSeval: Evaluating Lexical Simplification for Non-Natives

We conducted a user study to learn more about word complexity for non-native speakers. 400 non-native speakers, all university students or staff, participated in the experiment. They were asked to judge whether they could understand the meaning of each content word (nouns, verbs, adjectives and adverbs, as tagged by Freeling (Padró and Stanilovsky 2012)) in a set of sentences, each of which was judged independently. Volunteers were instructed to annotate every word that they could not understand individually, even if they could comprehend the meaning of the sentence as a whole. All sentences were taken from Wikipedia, LSeval and LexMTurk. A total of 35,958 distinct words from 9,200 sentences were annotated (232,481 in total), of which 3,854 distinct words (6,388 in total) were deemed complex by at least one annotator.

Using the data produced in the user study, we first assessed the reliability of the LSeval and LexMTurk datasets for evaluating LS systems for non-native speakers. We found that the proportion of target words deemed complex by at least one annotator was only 30.8% for LexMTurk and 15% for LSeval. As for the candidate substitutions, 21.7% of those in LSeval and 13.4% of those in LexMTurk were deemed complex by at least one annotator. These results show that, although the two datasets cannot be used in their entirety, both contain instances that are suitable for our purposes.

To create our dataset, we first used the Text Adorning module of LEXenstein (Paetzold and Specia 2015; Burns 2013) to inflect all candidate verbs and nouns in both datasets to the same tense as the target word. We then used the Spelling Correction module of LEXenstein to correct any misspelled words among the candidates of both datasets. Next, we removed all candidate substitutes that were deemed complex by at least one annotator in our user study. Finally, we discarded all instances in which the target word was not deemed complex by any of our annotators. The resulting dataset, which we refer to as NNSeval, contains 239 instances.
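
The last two filtering steps, together with the proportion statistic used in the reliability check, can be illustrated with a minimal Python sketch. It is not the authors' code: the `Instance` class, the function names and the representation of the annotations as a set of lowercased words are assumptions made for illustration, and the inflection and spelling-correction steps performed with LEXenstein are left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Instance:
    """Hypothetical container for one lexical simplification instance."""
    sentence: str
    target: str
    candidates: List[str]


def proportion_complex(words: Iterable[str], complex_words: Set[str]) -> float:
    """Share of `words` flagged as complex by at least one annotator."""
    words = list(words)
    if not words:
        return 0.0
    flagged = sum(1 for w in words if w.lower() in complex_words)
    return flagged / len(words)


def filter_for_non_natives(instances: Iterable[Instance],
                           complex_words: Set[str]) -> List[Instance]:
    """Apply the two annotation-based filters described in the text.

    `complex_words` is assumed to hold, in lowercase, every word that at
    least one annotator marked as not understood.
    """
    # Step 1: remove candidate substitutes that any annotator found complex.
    cleaned = [
        Instance(
            inst.sentence,
            inst.target,
            [c for c in inst.candidates if c.lower() not in complex_words],
        )
        for inst in instances
    ]
    # Step 2: discard instances whose target word no annotator found complex.
    return [inst for inst in cleaned if inst.target.lower() in complex_words]
```

Given a list of instances and the set of flagged words, `filter_for_non_natives(instances, complex_words)` returns only the instances whose target was flagged, each with its flagged candidates removed. The sketch keeps an instance even if all of its candidates are removed, since the description does not say how such cases were handled.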

http://ghpaetzold.github.io/data/NNSeval.zip
