Publication . Article . 2020

Keeleandmete õigusliku režiimi mõju nende abil loodud keelemudelitele

Aleksei Kelli; Kadri Vider; Arvi Tavast; Krister Lindén; Ramūnas Birštonas; Penny Labropoulou; Age Värv; +3 Authors
Open Access
Estonian
Published: 01 Apr 2020
Country: Finland
Abstract

"Influence of legal regime of language data on language models"

This article aims to explain the extent to which the legal regime applicable to language data affects the development and use of language models. The authors follow a process chart that runs from raw data to finished products containing language technology (e.g. a refrigerator with a speech interface). The raw data used in language technology often include copyrighted works, objects of related rights (performances, sound recordings) and personal data (a person’s voice, other information about the person), which are stored in non-annotated and annotated databases. The legal issues of language data have already been studied. However, the legal aspects of language models have not been thoroughly explored.

The authors are of the opinion that, as a rule, the legal status of language models is not affected by the legal regime of the raw language data used, since copyrighted works usually do not remain in the model. However, the use of a person’s voice in a language model can create legal problems. The authors analyze possible solutions to overcome these problems. The article also examines the text and data mining regulation introduced by the new copyright directive and its application in the context of developing language models.

Subjects by Vocabulary

ACM Computing Classification System: ComputingMilieux_LEGALASPECTSOFCOMPUTING

Microsoft Academic Graph classification: Language model, Context (language use), Copyright Directive, Linguistics, Affect (linguistics), Legal status, Language technology, Computer science, Raw data, On Language

Library of Congress Subject Headings: lcsh:Philology. Linguistics; lcsh:P1-1091; lcsh:Finnic. Baltic-Finnic; lcsh:PH91-98.5

Subjects

6121 Languages, Linguistics and Language, Education, Language and Linguistics, copyright, personal data, language model, language technology, text and data mining
