Powered by OpenAIRE graph

An Aligned Kazakh-Russian Parallel Corpus Focused on Crime-Related Topics

Abstract

The development of high-quality parallel text corpora is currently one of the most relevant and advanced areas of modern linguistics. Particular attention is paid to building parallel multilingual corpora for low-resource languages such as Kazakh. In this study, we examined texts from four Kazakh bilingual news websites and built a Kazakh-Russian parallel corpus of texts on crime-related topics. To align the corpus, we used a set of lexical correspondences together with the part-of-speech (POS) tags of both languages. 60% of the corpus sentences were aligned correctly by the automatic procedure. Finally, we analyzed the factors that influence the error rate.
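The abstract names two alignment signals, lexical correspondences and POS tags, but does not say how they are combined. The following is a minimal sketch of one way to do it, not the authors' method. It assumes that both sides are already lemmatized and tagged with a shared tagset (for example, Universal Dependencies UPOS), that a Kazakh-to-Russian lemma lexicon is available, and that sentence order is largely preserved. The weight `w_lex`, the `skip_penalty`, the monotone dynamic-programming aligner, and the toy lexicon and sentences are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Set, Tuple

# A token is a (lemma, POS tag) pair. Input is ASSUMED to be lemmatized and
# tagged beforehand with a tagset shared by both languages (e.g. UPOS).
Token = Tuple[str, str]
Sentence = List[Token]
Lexicon = Dict[str, Set[str]]  # hypothetical: Kazakh lemma -> possible Russian lemmas


def lexical_score(kk: Sentence, ru: Sentence, lexicon: Lexicon) -> float:
    """Share of Kazakh lemmas with at least one dictionary translation in the Russian sentence."""
    if not kk:
        return 0.0
    ru_lemmas = {lemma for lemma, _ in ru}
    hits = sum(1 for lemma, _ in kk if lexicon.get(lemma, set()) & ru_lemmas)
    return hits / len(kk)


def pos_score(kk: Sentence, ru: Sentence) -> float:
    """Cosine similarity of the two sentences' POS-tag frequency vectors."""
    a = Counter(tag for _, tag in kk)
    b = Counter(tag for _, tag in ru)
    dot = sum(a[tag] * b[tag] for tag in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pair_score(kk: Sentence, ru: Sentence, lexicon: Lexicon, w_lex: float = 0.7) -> float:
    """Hypothetical weighted mix of the lexical and POS signals (the weight is arbitrary)."""
    return w_lex * lexical_score(kk, ru, lexicon) + (1 - w_lex) * pos_score(kk, ru)


def align(kk_sents: List[Sentence], ru_sents: List[Sentence], lexicon: Lexicon,
          skip_penalty: float = 0.2) -> List[Tuple[int, int]]:
    """Monotone 1-1 sentence alignment by dynamic programming; unmatched sentences are skipped."""
    n, m = len(kk_sents), len(ru_sents)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []  # (score, move); "match" is listed first so it wins ties
            if i > 0 and j > 0:
                s = pair_score(kk_sents[i - 1], ru_sents[j - 1], lexicon)
                options.append((best[i - 1][j - 1] + s, "match"))
            if i > 0:
                options.append((best[i - 1][j] - skip_penalty, "skip_kk"))
            if j > 0:
                options.append((best[i][j - 1] - skip_penalty, "skip_ru"))
            best[i][j], back[i][j] = max(options, key=lambda opt: opt[0])
    # Walk back from the bottom-right cell to recover the aligned index pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "skip_kk":
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


# Toy, hand-made data for illustration only (not taken from the corpus).
LEXICON: Lexicon = {"полиция": {"полиция"}, "ұры": {"вор"}, "ұста": {"задержать"},
                    "сот": {"суд"}, "үкім": {"приговор"}, "шығар": {"вынести"}}
KK = [[("полиция", "NOUN"), ("ұры", "NOUN"), ("ұста", "VERB")],
      [("сот", "NOUN"), ("үкім", "NOUN"), ("шығар", "VERB")]]
RU = [[("полиция", "NOUN"), ("задержать", "VERB"), ("вор", "NOUN")],
      [("погода", "NOUN"), ("хороший", "ADJ")],  # no Kazakh counterpart
      [("суд", "NOUN"), ("вынести", "VERB"), ("приговор", "NOUN")]]

if __name__ == "__main__":
    print(align(KK, RU, LEXICON))  # [(0, 0), (1, 2)]
```

On the toy data, the Russian sentence that has no Kazakh counterpart is skipped, and the remaining sentences pair up in order. The sketch chooses a monotone aligner because news translations usually keep sentence order. A practical aligner would probably also allow one-to-two merges. The sketch does not reproduce the 60% accuracy reported above.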

Keywords

Kazakh-Russian parallel corpus, POS tagging, lexical correspondences, crime-related topics, computational linguistics, syntactic analysis

  • BIP! impact indicators
    selected citations (derived from selected sources): 0
    popularity (current attention, based on the citation network): Average
    influence (overall impact over time, based on the citation network): Average
    impulse (initial momentum directly after publication): Average