Faster shift-reduce constituent parsing with a non-binary, bottom-up strategy

Publication type: Article, Preprint. Date: 01 Oct 2019. Embargo end date: 01 Jan 2018. Country: Spain. Language: English. Publisher: Elsevier BV. Journal: Artificial Intelligence, volume 275, pages 559-574 (ISSN: 0004-3702)

Authors: Fernández-González, Daniel; Gómez-Rodríguez, Carlos

DOI: 10.1016/j.artint.2019.07.006, 10.48550/arxiv.1804.07961

arXiv: http://arxiv.org/abs/1804.07961

handle: 2183/35077


Abstract

An increasingly wide range of artificial intelligence applications rely on syntactic information to process and extract meaning from natural language text or speech, with constituent trees being one of the most widely used syntactic formalisms. To produce these phrase-structure representations from natural language sentences, shift-reduce constituent parsers have become one of the most efficient approaches. Increasing their accuracy and speed is still one of the main objectives pursued by the research community, so that artificial intelligence applications that use parsing outputs, such as machine translation or voice assistant services, can improve their performance. With this goal in mind, we propose in this article a novel non-binary shift-reduce algorithm for constituent parsing. Our parser follows a classical bottom-up strategy but, unlike others, it straightforwardly creates non-binary branchings with just one Reduce transition, instead of requiring prior binarization or a sequence of binary transitions, allowing its direct application to any language without the need for further resources such as percolation tables. As a result, it uses fewer transitions per sentence than existing transition-based constituent parsers, becoming the fastest such system and, as a consequence, speeding up downstream applications. Using static oracle training and greedy search, this novel approach achieves accuracy on par with state-of-the-art transition-based constituent parsers and outperforms all top-down and bottom-up greedy shift-reduce systems on the Wall Street Journal section of the English Penn Treebank and on the Penn Chinese Treebank. Additionally, we develop a dynamic oracle for training the proposed transition-based algorithm, achieving further improvements on both benchmarks and obtaining the best accuracy to date on the Penn Chinese Treebank among greedy shift-reduce parsers.
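To make the "one Reduce transition per non-binary branching" idea concrete, here is a minimal sketch of a bottom-up, non-binary shift-reduce system together with a static oracle. It is modeled loosely on the description above and is not the authors' implementation: the transition names (SHIFT, UNARY-X, REDUCE-X with an arity k, FINISH), the tuple tree encoding, the post-order oracle and the omission of part-of-speech tags are all illustrative assumptions. The hypothetical helper `binarized_reduce_count` gives a simplified count of the k-1 binary reductions a binarized parser would need for each k-ary node, so the two strategies can be compared.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a word is a str; a constituent is (label, [children]).
Tree = Union[str, Tuple[str, list]]
Action = tuple


def leaves(tree: Tree) -> List[str]:
    """Return the words of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in leaves(child)]


def static_oracle(tree: Tree) -> List[Action]:
    """Post-order traversal: one SHIFT per word, one UNARY or REDUCE per constituent."""
    actions: List[Action] = []

    def visit(node: Tree) -> None:
        if isinstance(node, str):
            actions.append(("SHIFT",))
            return
        label, children = node
        for child in children:
            visit(child)
        if len(children) == 1:
            actions.append(("UNARY", label))
        else:
            # A single non-binary reduce builds the node over all k children at once.
            actions.append(("REDUCE", label, len(children)))

    visit(tree)
    actions.append(("FINISH",))
    return actions


def run(words: List[str], actions: List[Action]) -> Tree:
    """Apply transitions to a stack and a buffer and return the finished tree."""
    stack: List[Tree] = []
    i = 0  # index of the next word in the buffer
    for act in actions:
        kind = act[0]
        if kind == "SHIFT":
            stack.append(words[i])
            i += 1
        elif kind == "UNARY":
            stack.append((act[1], [stack.pop()]))
        elif kind == "REDUCE":
            label, k = act[1], act[2]
            if k < 2 or k > len(stack):
                raise ValueError(f"cannot reduce {k} items")
            children = stack[-k:]  # the top k stack items, in left-to-right order
            del stack[-k:]
            stack.append((label, children))
        elif kind == "FINISH":
            if len(stack) != 1 or i != len(words):
                raise ValueError("FINISH requires one tree and an empty buffer")
            return stack[0]
    raise ValueError("action sequence did not end with FINISH")


def binarized_reduce_count(tree: Tree) -> int:
    """Reduce steps needed if every k-ary node were built with k-1 binary reduces."""
    if isinstance(tree, str):
        return 0
    children = tree[1]
    own = 1 if len(children) == 1 else len(children) - 1
    return own + sum(binarized_reduce_count(c) for c in children)


sentence = ("S", [("NP", ["The", "cat"]),
                  ("VP", ["sat", ("PP", ["on", ("NP", ["the", "mat"])])]),
                  "."])
actions = static_oracle(sentence)
assert run(leaves(sentence), actions) == sentence
n_reduce = sum(1 for a in actions if a[0] in ("REDUCE", "UNARY"))
print(len(actions), n_reduce, binarized_reduce_count(sentence))  # 13 5 6
```

In this toy tree only the ternary S node benefits (one REDUCE instead of two binary steps), so the saving is a single transition; flatter trees with many wide nodes widen the gap. The sketch also sidesteps what a real binarizing parser must decide for intermediate nodes (for example, head direction from percolation tables), which is the extra resource the non-binary strategy avoids.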

Final peer-reviewed manuscript accepted for publication

Country

Spain

Related Organizations

University of Córdoba
Spain
University of A Coruña
Spain

Keywords

FOS: Computer and information sciences, Computer Science - Computation and Language, I.2.7, 68T50, Computation and Language (cs.CL)

Impact by BIP!

- Citations: 16. An alternative to the "Influence" indicator; it also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 10%. Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. Reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


Open access: Green, Bronze
