publication . Preprint . 2016

Cross-Lingual Morphological Tagging for Low-Resource Languages

Buys, Jan; Botha, Jan A.;
Open Access English
  • Published: 14 Jun 2016
Abstract
Morphologically rich languages often lack the annotated linguistic resources required to develop accurate natural language processing tools. We propose models suitable for training morphological taggers with rich tagsets for low-resource languages without using direct supervision. Our approach extends existing approaches of projecting part-of-speech tags across languages, using bitext to infer constraints on the possible tags for a given word type or token. We propose a tagging model using Wsabie, a discriminative embedding-based model with rank-based learning. In our evaluation on 11 languages, on average this model performs on par with a baseline weakly-superv...
Subjects
free text keywords: Computer Science - Computation and Language
Download from
31 references, page 1 of 3

Malin Ahlberg, Markus Forsberg, and Mans Hulden. 2015. Paradigm classification in supervised learning of morphology. In Proceedings of NAACL, pages 1024-1029. [OpenAIRE]

Taylor Berg-Kirkpatrick, Alexandre Bouchard-Coˆte´, John DeNero, and Dan Klein. 2010. Painless unsupervised learning with features. In Proceedings of NAACL, pages 582-590.

Lauent Besacier, Ettiene Barnard, Alexey Karpov, and Tanja Schultz. 2014. Automatic speech recognition for under-resourced languages: A survey. Speech Communication, 56:85-100. [OpenAIRE]

Mathias Creutz and Krista Lagus. 2007. Unsupervised models for morpheme segmentation and morphology learning. ACM Transactions on Speech and Language Processing, 4(1).

Dipanjan Das and Slav Petrov. 2011. Unsupervised part-of-speech tagging with bilingual graph-based projections. In Proceedings of ACL, pages 600-609.

Marie-Catherine de Marneffe, Timothy Dozat, Natalia Silveira, Katri Haverinen, Filip Ginter, Joakim Nivre, and Christopher D. Manning. 2014. Universal dependencies: A cross-linguistic typology. In Proceedings of LREC.

Long Duong, Trevor Cohn, Karin Verspoor, Steven Bird, and Paul Cook. 2014. What can we get from 1000 tokens? A case study of multilingual pos tagging for resource-poor languages. In Proceedings of EMNLP, pages 886-897.

Greg Durrett and John DeNero. 2013. Supervised learning of complete morphological paradigms. In Proceedings of NAACL, pages 1185-1195.

Chris Dyer, Victor Chahuneau, and Noah A Smith. 2013. A simple, fast, and effective reparameterization of IBM Model 2. In Proceeding of NAACL, pages 682-686.

Manaal Faruqui, Ryan McDonald, and Radu Soricut. 2016. Morpho-syntactic lexicon generation using graph-based semi-supervised learning. Transactions of the Association for Computational Linguistics, 4:1-16.

Victoria Fossum and Steven Abney. 2005. Automatically inducing a part-of-speech tagger by projecting from multiple source languages across aligned corpora. In Proceedings of IJCNLP, pages 862-873. [OpenAIRE]

Kuzman Ganchev and Dipanjan Das. 2013. Crosslingual discriminative learning of sequence models with posterior regularization. In Proceedings of EMNLP, pages 1996-2006.

Jan Hajicˇ, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Anto`nia Mart´ı, Llu´ıs Ma`rquez, Adam Meyers, Joakim Nivre, Sebastian Pado´, Jan Sˇ teˇpa´nek, et al. 2009. The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of CoNLL: Shared Task, pages 1-18.

Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In MT summit, volume 5, pages 79-86.

Shen Li, Joa˜o Grac¸a, and Ben Taskar. 2012. Wiki-ly supervised part-of-speech tagging. In Proceedings of EMNLP-CoNLL, pages 1389-1398, July.

31 references, page 1 of 3
Powered by OpenAIRE Open Research Graph
Any information missing or wrong?Report an Issue