Back to Patterns: Efficient Japanese Morphological Analysis with Feature-Sequence Trie

Creator: Naoki Yoshinaga
Keywords: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)

arXiv.org e-Print Archive

Preprint . 2023

Data sources: arXiv.org e-Print Archive

https://doi.org/10.18653/v1/20...

Article . 2023 . Peer-reviewed

Data sources: Crossref

https://dx.doi.org/10.48550/ar...

Article . 2023

License: arXiv Non-Exclusive Distribution

Data sources: Datacite

DBLP

Conference object

Data sources: DBLP

DBLP

Article

Data sources: DBLP

Publication type: Article, Preprint, Conference object

Date: 01 Jan 2023

Embargo end date: 01 Jan 2023

Publisher: Association for Computational Linguistics (ACL)

Journal: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Authors: Naoki Yoshinaga

doi: 10.18653/v1/2023.acl-short.2 , 10.48550/arxiv.2305.19045

arXiv: 2305.19045

Abstract

Accurate neural models are much less efficient than non-neural models and are useless for processing billions of social media posts or handling user queries in real time with a limited budget. This study revisits the fastest pattern-based NLP methods to make them as accurate as possible, thus yielding a strikingly simple yet surprisingly accurate morphological analyzer for Japanese. The proposed method induces reliable patterns from a morphological dictionary and annotated data. Experimental results on two standard datasets confirm that the method exhibits comparable accuracy to learning-based baselines, while boasting a remarkable throughput of over 1,000,000 sentences per second on a single modern CPU. The source code is available at https://www.tkl.iis.u-tokyo.ac.jp/~ynaga/jagger/

9 pages, 1 figure, 10 tables, Accepted by ACL 2023 (main conference)
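
The pattern-based analysis summarized in the abstract can be illustrated with a minimal sketch. This is not the author's implementation; it assumes a toy design in which each pattern is a surface string, optionally refined by the previous word's tag, and maps to an action (word length, tag). The names `PatternTrie`, `analyze`, and `build_toy_trie`, the tag set, and the hand-written patterns are all hypothetical; the actual method induces its patterns from a dictionary and annotated data.

```python
from typing import Dict, List, Optional, Tuple

Action = Tuple[int, str]  # (length of the word to emit, tag of that word)


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.action: Optional[Action] = None       # surface-only pattern
        self.by_prev_tag: Dict[str, Action] = {}   # pattern refined by previous tag


class PatternTrie:
    """Toy trie over characters; an assumption made for illustration only."""

    def __init__(self) -> None:
        self.root = Node()

    def add(self, surface: str, action: Action, prev_tag: Optional[str] = None) -> None:
        node = self.root
        for ch in surface:
            node = node.children.setdefault(ch, Node())
        if prev_tag is None:
            node.action = action
        else:
            node.by_prev_tag[prev_tag] = action

    def lookup(self, text: str, start: int, prev_tag: str) -> Optional[Action]:
        """Return the action of the longest (and most specific) matching pattern."""
        node, best = self.root, None
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.action is not None:
                best = node.action
            if prev_tag in node.by_prev_tag:  # tag-refined pattern wins at equal length
                best = node.by_prev_tag[prev_tag]
        return best


def analyze(text: str, trie: PatternTrie) -> List[Tuple[str, str]]:
    """Greedy left-to-right analysis: one trie lookup per emitted word."""
    out: List[Tuple[str, str]] = []
    pos, prev_tag = 0, "BOS"
    while pos < len(text):
        action = trie.lookup(text, pos, prev_tag)
        length, tag = action if action else (1, "UNK")  # fallback: single unknown char
        out.append((text[pos:pos + length], tag))
        pos, prev_tag = pos + length, tag
    return out


def build_toy_trie() -> PatternTrie:
    """Hand-written patterns; real patterns would be induced from data."""
    trie = PatternTrie()
    trie.add("すもも", (3, "noun"))
    trie.add("もも", (2, "noun"))
    trie.add("も", (1, "particle"))
    trie.add("の", (1, "particle"))
    trie.add("うち", (2, "noun"))
    # A pattern may be longer than the word it emits: after a noun,
    # "もも" is read as the particle "も" followed by more text.
    trie.add("もも", (1, "particle"), prev_tag="noun")
    return trie
```

With these toy patterns, `analyze` segments the classic tongue twister as すもも/も/もも/も/もも/の/うち. The tag-refined pattern is what prevents a plain longest match from reading the particle も together with the following character as もも.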

Related research products (8)

- vibrato, software on GitHub (IsRelatedTo)
- vaporetto, software on GitHub (IsRelatedTo)
- jumanpp-jumandic, software on GitHub (IsRelatedTo)
- KyotoCorpus, software on GitHub (IsRelatedTo)
- jumanpp, software on GitHub (IsRelatedTo)
- benchmarks, software on GitHub (IsRelatedTo)
- sentencepiece, software on GitHub (IsRelatedTo)
- KWDLC, software on GitHub (IsRelatedTo)

Impact by BIP!

- selected citations: 1. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Open Access route: Green