
handle: 10045/1504
This paper describes a morphosyntactic analyzer in Galician which, apart from its obvious linguistic interest, can be easily applied to speech recognition and speech synthesis systems. While rule-driven models produce the better performance, stochastic models have shown a comparable accuracy when properly designed. Moreover, rule-driven models are based on a complex set of linguistic rules, quite difficult to maintain and not directly extensible to other languages. On the contrary, stochastic models allow a quick design, if a training corpus is available, and are extremely flexible as they can be adapted to other languages with minor changes in their source code. In order to train the statistic models we began to collect a Galician corpus which, at this time, consists of about 400,000 words with morphosyntactic annotations.
En este artículo describimos la construcción de un analizador morfosintáctico en gallego que, además de su evidente interés lingüístico, sea fácilmente aplicable a sistemas de reconocimiento y síntesis de voz. Los modelos estadísticos han demostrado que son capaces de ofrecer unas prestaciones similares a sistemas que emplean innumerables reglas intrincadas que, por otro lado, son muy difíciles de depurar y mantener. Por el contrario los modelos estocásticos permiten un diseño rápido, si se dispone de un corpus de entrenamiento, y son extremadamente flexibles, ya que pueden ser adaptados a otro idioma sin modificaciones excesivas del código. Para entrenar los modelos estadísticos se ha comenzado la recogida de un corpus en gallego que, por el momento, consta de unas 400.000 palabras etiquetadas morfosintácticamente.
Este trabajo ha sido parcialmente financiado por el Ministerio de Ciencia y Tecnología, fondos Feder y la Xunta de Galicia, en los proyectos TIC2002-02208, PGIDT01PXI32205PN y PGIDT02PXI32201PR.
POS tagging, Galician corpus, Corspus gallego, Análisis morfosintáctico, Morphosyntactic analysis, Análisis estadístico
POS tagging, Galician corpus, Corspus gallego, Análisis morfosintáctico, Morphosyntactic analysis, Análisis estadístico
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
