
handle: 10230/54109, 1854/LU-8756877
Sign Languages (SLs) are the primary means of communication for at least half a million people in Europe alone. However, the development of SL recognition and translation tools is slowed down by a series of obstacles concerning resource scarcity and standardization issues in the available data. The former challenge relates to the volume of data available for machine learning as well as the time required to collect and process new data. The latter obstacle is linked to the variety of the data, i.e., annotation formats are not unified and vary amongst different resources. The available data formats are often not suitable for machine learning, obstructing the provision of automatic tools based on neural models. In the present paper, we give an overview of these challenges by comparing various SL corpora and SL machine learning datasets. Furthermore, we propose a framework to address the lack of standardization at format level, unify the available resources and facilitate SL research for different languages. Our framework takes ELAN files as inputs and returns textual and visual data ready to train SL recognition and translation models. We present a proof of concept, training neural translation models on the data produced by the proposed framework.
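As a rough illustration of the first step such a pipeline needs, the sketch below reads time-aligned annotations from one tier of an ELAN (.eaf) file, which is an XML document. It is not the authors' framework: the tier name `gloss`, the helper `read_tier`, and the tiny sample document are assumptions made up for this example, and only the standard `TIME_ORDER`, `TIER`, and `ALIGNABLE_ANNOTATION` elements of the EAF format are relied on.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal EAF content invented for this example.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="850"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="1900"/>
  </TIME_ORDER>
  <TIER TIER_ID="gloss">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>HELLO</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
          TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>WORLD</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_tier(eaf_xml, tier_id):
    """Return (start_ms, end_ms, text) tuples for one tier of an EAF document."""
    root = ET.fromstring(eaf_xml)

    # Map each time slot ID to its millisecond value; unaligned slots
    # carry no TIME_VALUE attribute and are stored as None.
    slots = {}
    for slot in root.iter("TIME_SLOT"):
        value = slot.get("TIME_VALUE")
        slots[slot.get("TIME_SLOT_ID")] = int(value) if value is not None else None

    segments = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            segments.append((start, end, text))
    return segments


segments = read_tier(SAMPLE_EAF, "gloss")
```

The resulting time spans could then be used to cut the matching video frames, so that each text segment is paired with its visual counterpart for model training.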
Work in this paper is part of the SignON project. This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No. 101017255. Mathieu De Coster’s research is funded by the Research Foundation Flanders (FWO Vlaanderen): file number 77410.
Paper presented at: LREC 2022, 13th International Conference on Language Resources and Evaluation, held 20 to 25 June 2022 in Marseille, France.
Technology, Technology and Engineering, Science & Technology, sign language recognition, Social Sciences, Linguistics, unified data format, Languages and Literatures, Machine Learning, machine learning, Neural Translation Models, Computer Science, Sign Languages, sign language corpora, Computer Science, Interdisciplinary Applications, sign language translation
