
handle: 2318/130554
The development of wide-coverage grammars is at the core of robust NLP systems. This paper addresses the problem of grammar extraction from treebanks, examining broad coverage along three dimensions: the grammar formalism (context-free grammar, dependency grammar, lexicalized tree adjoining grammar), the domain of the annotated corpus (press reports, civil law), and the language of the corpus (English, Korean, Chinese, Italian). We extracted three grammars from an annotated corpus of Italian and compared their coverage of a test set; then, working on two subcorpora from different domains, we compared the cross-domain coverage of the extracted grammars; finally, we compared the grammars for four different languages. The results show relevant differences in coverage among formalisms and domains; the cross-linguistic comparison shows a more limited difference.
