ASR pipeline for low-resourced languages: A case study on Pomak

Publication: Article, Preprint, Conference object

Date: 01 Jan 2023

Publisher: Association for Computational Linguistics (ACL)

Journal: Proceedings of the Second Workshop on NLP Applications to Field Linguistics

Authors: Tsoukala, Chara; Kritsis, Kosmas; Douros, Ioannis; Katsamanis, Athanasios; Kokkas, Nikolaos; Arampatzakis, Vasileios; Sevetlidis, Vasileios; +2 Authors

doi: 10.18653/v1/2023.fieldmatters-1.5, 10.5281/zenodo.7759709, 10.5281/zenodo.8413278, 10.5281/zenodo.7759708

Abstract

Automatic Speech Recognition (ASR) models can aid field linguists by facilitating the creation of text corpora from oral material. Training ASR systems for low-resourced languages can be challenging, not only because of the lack of resources but also because of the work required to prepare a training dataset. We present a pipeline for data processing and ASR model training for low-resourced languages, based on the language family. As a case study, we collected recordings of Pomak, an endangered South East Slavic language variety spoken in Greece. Using the proposed pipeline, we trained the first Pomak ASR model.

Keywords

low-resourced ASR, ASR, speech-text alignments, forced-alignments

Impact by BIP!

	selected citations (citations derived from selected sources; an alternative to the "influence" indicator, which also reflects the overall impact of an article in the research community at large, based on the underlying citation network, diachronically): 0
	popularity (the "current" impact or attention of an article in the research community at large, based on the underlying citation network): Average
	influence (the overall impact of an article in the research community at large, based on the underlying citation network, diachronically): Average
	impulse (the initial momentum of an article directly after its publication, based on the underlying citation network): Average