105 Research products, page 1 of 11

  • 2013-2022
  • Article
  • English
  • Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave

  • Open Access English
    Authors: 
    Jaka Čibej; Iza Škrjanec;
    Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
    Country: Slovenia

    Seminarja o metodah korpusnega in eksperimentalnega jezikoslovja v Beogradu in Zagrebu (Seminars on the methods of corpus and experimental linguistics in Belgrade and Zagreb)

  • Open Access English
    Authors: 
    Polona Gantar;
    Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)

    The Proposal for a Dictionary of Contemporary Slovene, published in May 2013, has stirred many debates in both academic circles and the media. The question central to all the debates was whether a new dictionary of Slovene should follow the tradition established by the Dictionary of Literary Slovene (published in the 1970s), which was based on the structuralist theories of the Prague school, or move away from this tradition. All this led to differing views on what a dictionary tradition is and on the role of new lexicographic methods. By analyzing the concepts of the Dictionary of Literary Slovene and the Dictionary of New Slovene Lexis (published in 2012), as well as reviewing scientific articles dealing with the concept of a new dictionary of Slovene, this paper attempts to establish which elements of lexicographic theory can be viewed as traditional and which represent innovation in Slovene lexicography. At the same time, the concept for a new dictionary is considered from three perspectives: the user, the medium, and the use of language technologies, which would facilitate language description and meet the needs of the language community. As the author argues, a new dictionary of Slovene will do well to carefully consider the status of literary language in contemporary Slovene, be corpus-driven and user-oriented (rather than academic), incorporate various lexicographic findings, e.g. use different approaches to defining (depending on their efficiency for different word classes or categories of words), be born digital, i.e. devised with the online medium in mind, offer updates on a regular basis, and utilize various language technologies, such as automatic example extraction, to facilitate dictionary compilation. Only thus will the new dictionary become a state-of-the-art lexicographic product and a worthy successor to the Dictionary of Standard Slovene.

  • Open Access English
    Authors: 
    Žiga Golob; Boštjan Vesnicer; Jerneja Žganec Gros; Mario Žganec; Simon Dobrišek;
    Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)

    Computer models based on finite-state transducers are well suited for compact representations of the pronunciation lexicons used in both speech synthesis and speech recognition. In this paper, we present the finite-state super transducer, a new type of finite-state transducer that allows a pronunciation lexicon to be represented with fewer states and transitions than a conventional minimized and determinized finite-state transducer. A finite-state super transducer is a deterministic transducer that can, in addition to the words contained in the pronunciation lexicon, also accept some out-of-dictionary words. The resulting allophone transcriptions for these words can be erroneous, but we demonstrate that such errors are comparable to the performance of state-of-the-art methods for grapheme-to-phoneme conversion. The procedure for building finite-state super transducers and a validation of their performance are demonstrated on the SI-PRON pronunciation lexicon. In addition, we analyze several properties of finite-state transducers with respect to the minimum size obtained by determinization and minimization. We show that for highly inflected languages this minimum size begins to decrease once the number of words in the represented pronunciation dictionary reaches a certain threshold.
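
The abstract above treats the pronunciation lexicon itself as a deterministic transducer from spellings to phone sequences. As a hedged illustration of that general idea only (not of the authors' finite-state super transducer), the Python sketch below stores a tiny, invented lexicon in a character trie and either returns the stored transcription or rejects out-of-lexicon words, which is exactly the limitation the super transducer is designed to avoid.

```python
# Minimal trie-backed pronunciation lexicon acting as a deterministic
# transducer: spelling in, phone string out. It is NOT the finite-state
# super transducer from the paper (it simply rejects unknown words instead
# of generalising to them), and the two entries below are invented examples.

class PronunciationLexicon:
    def __init__(self):
        self.trie = {}

    def add(self, word, phones):
        node = self.trie
        for ch in word:
            node = node.setdefault(ch, {})
        node["#"] = phones  # "#" marks an accepting state that carries the output

    def transcribe(self, word):
        node = self.trie
        for ch in word:
            if ch not in node:
                return None  # out-of-lexicon: a super transducer would still emit phones
            node = node[ch]
        return node.get("#")


lexicon = PronunciationLexicon()
lexicon.add("miza", "m i: z a")      # hypothetical entry
lexicon.add("mesto", "m e: s t o")   # hypothetical entry
print(lexicon.transcribe("miza"))    # -> "m i: z a"
print(lexicon.transcribe("mizar"))   # -> None (not in the lexicon)
```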

  • Open Access English
    Authors: 
    Darinka Verdonik; Simona Majhenič; Špela Antloga; Sandi Majninger; Marko Ferme; Kaja Dobrovoljc; Simona Pulko; Mira Krajnc Ivič; Natalija Ulčnik;
    Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)

    The paper describes three types of challenges detected in teaching Slovene as a mother tongue in schools. First, a number of orthographic and grammatical mistakes can be found in pupils’ writing (see Kosem et al., 2012; Križaj and Bešter Turk, 2018; Gomboc, 2019). Second, phraseological literacy is low and pupils often have problems understanding phrasemes (Voršič, 2018). Third, there are challenges of communicative competence, concerning the production and interpretation of different written, spoken and multimedia genres, since only adequate genre literacy enables efficient use of different genres (Nidorfer Šiškovič, 2013). To address these challenges, we have developed a comprehensive e-learning environment for improving the writing and communication skills of Slovene pupils, “Slovenščina na dlani”. The environment is divided into four general topics: orthography, grammar, phrasemes and texts. Each topic covers a number of subtopics, and for each subtopic a number of exercises is available, along with explanations. We have used up-to-date language technologies and programming solutions to automate the e-environment. The user’s knowledge is automatically evaluated and, based on this, he or she is automatically guided through the environment in a way that improves his or her writing and communication skills. The e-environment also has a special user interface for teachers, which makes it easy to assign tasks and to track the performance of each pupil individually or of a group of pupils as a whole. Gamification and professional graphic design round out the user experience. “Slovenščina na dlani” will be freely available at https://slo-na-dlani.si from September 2021 onwards.
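
As a rough sketch of the kind of adaptive guidance described above (knowledge is evaluated automatically and the pupil is steered towards weaker areas), the snippet below tracks per-topic accuracy and serves the next exercise from the weakest topic. The four topics match the areas named in the abstract, but the scoring and selection rule are illustrative assumptions, not the actual logic of the Slovenščina na dlani environment.

```python
# Illustrative adaptive-guidance sketch: record per-topic accuracy and pick
# the next exercise from the weakest topic. The selection rule is an
# assumption for illustration, not the actual "Slovenscina na dlani" logic.
import random
from collections import defaultdict

class AdaptiveTutor:
    def __init__(self, exercises):
        self.exercises = exercises           # topic -> list of exercise ids
        self.attempts = defaultdict(int)     # topic -> number of answers given
        self.correct = defaultdict(int)      # topic -> number of correct answers

    def record(self, topic, was_correct):
        self.attempts[topic] += 1
        if was_correct:
            self.correct[topic] += 1

    def accuracy(self, topic):
        a = self.attempts[topic]
        return self.correct[topic] / a if a else 0.0

    def next_exercise(self):
        weakest = min(self.exercises, key=self.accuracy)  # lowest accuracy so far
        return weakest, random.choice(self.exercises[weakest])

tutor = AdaptiveTutor({"orthography": ["o1", "o2"], "grammar": ["g1", "g2"],
                       "phrasemes": ["p1"], "texts": ["t1"]})
tutor.record("orthography", True)
tutor.record("phrasemes", True)
tutor.record("texts", True)
tutor.record("grammar", False)
print(tutor.next_exercise())   # -> ('grammar', ...) since its accuracy is lowest
```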

  • Open Access English
    Authors: 
    Toma Tasovac; Ana Salgado; Rute Costa;
    Publisher: Zenodo

    The modelling and encoding of polylexical units, i.e. recurrent sequences of lexemes that are perceived as independent lexical units, is a topic that has not been covered adequately and in sufficient depth by the Guidelines of the Text Encoding Initiative (TEI), a de facto standard for the digital representation of textual resources in the scholarly research community. In this paper, we use the Dictionary of the Portuguese Academy of Sciences as a case study for presenting our ongoing work on encoding polylexical units using TEI Lex-0, an initiative aimed at simplifying and streamlining the encoding of lexical data with TEI in order to improve interoperability. We introduce the notion of macro- and microstructural relevance to differentiate between polylexicals that serve as headwords for their own independent dictionary entries and those that appear inside entries for different headwords. We develop the notion of lexicographic transparency to distinguish between units that are not accompanied by an explicit definition and those that are: the former are encoded as <form>-like constructs, whereas the latter become <entry>-like constructs, which can have further constraints imposed on them (sense numbers, domain labels, grammatical labels, etc.). We codify the use of attributes on <gram> to encode different kinds of labels for polylexicals (implicit, explicit and normalised), concluding that the interoperability of lexical resources would be significantly improved if dictionary encoders had access to an expressive but relatively simple typology of polylexical units.
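
To make the <entry>-like case concrete, here is a rough sketch of what an encoding of a lexicographically transparent polylexical unit might look like, built with Python's standard xml.etree.ElementTree. The element nesting, attribute values and the Portuguese example are illustrative assumptions and do not reproduce the exact TEI Lex-0 scheme codified by the authors.

```python
# Rough sketch of an <entry>-like encoding for a polylexical unit that has its
# own definition. Element and attribute choices are assumptions for
# illustration, not the authors' exact TEI Lex-0 encoding.
import xml.etree.ElementTree as ET

entry = ET.Element("entry")
form = ET.SubElement(entry, "form", {"type": "lemma"})
ET.SubElement(form, "orth").text = "dar de mão beijada"            # hypothetical headword
gram_grp = ET.SubElement(entry, "gramGrp")
ET.SubElement(gram_grp, "gram", {"type": "pos"}).text = "verb phrase"
sense = ET.SubElement(entry, "sense", {"n": "1"})
ET.SubElement(sense, "def").text = (
    "to hand something over without asking for anything in return"  # hypothetical definition
)

print(ET.tostring(entry, encoding="unicode"))
```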

  • Open Access English
    Authors: 
    Darja Fišer;
    Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
    Country: Slovenia

    From 25 to 27 November 2015, the scientific conference Slovenščina na spletu in v novih medijih (Slovene on the Web and in New Media) took place in the GIAM ZRC SAZU hall in Ljubljana. The conference was co-organised, within the framework of the basic research project JANES, funded between 2014 and 2017 by the Slovenian Research Agency, by the Faculty of Arts of the University of Ljubljana, the Slovenian Language Technologies Society, the Slovenian research infrastructure for language resources and technologies CLARIN.SI, and the regional language data initiative RelDI. The first day of the conference was devoted to a full-day seminar on statistics for linguists, led by Assist. Prof. Dr. Maja Miličević of the University of Belgrade. The 25 participants were introduced to the basics of quantitative methods in corpus linguistics, descriptive and inferential statistics, as well as to ways of visualising language data and the R software package. The seminar materials are available on the conference website.

  • Publication . Article . 2013
    Open Access English
    Authors: 
    Iztok Kosem; Polona Gantar; Simon Krek;
    Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)

    A new approach to lexicographic work, in which the lexicographer is seen more as a validator of choices made by the computer, was recently envisaged by Rundell and Kilgarriff (2011). In this paper, we describe an experiment using such an approach during the creation of the Slovene Lexical Database (Gantar and Krek, 2011). The corpus data, i.e. grammatical relations, collocations, examples and grammatical labels, were automatically extracted from the 1.18-billion-word Gigafida corpus of Slovene. The evaluation of the extracted data consisted of comparing the time spent writing a manual entry with that spent on a (semi-)automatic entry, and of identifying potential improvements in the extraction algorithm and in the presentation of the data. An important finding was that the automatic approach was far more effective than the manual approach, without any significant loss of information. Based on our experience, we propose a slightly revised version of the approach envisaged by Rundell and Kilgarriff, in which the validation of data is left to lower-level linguists or crowd-sourcing, whereas high-level tasks such as meaning description remain the domain of lexicographers. Such an approach does reduce the scope of the lexicographer’s work; however, it also makes it possible to bring the content to users more quickly.
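
The extraction step described above (grammatical relations, collocations and examples pulled automatically from Gigafida) can be illustrated with a much-simplified collocation extractor: count co-occurrences within a small window and rank candidate pairs with the logDice association score. This is a generic stand-in under stated assumptions, not the authors' actual pipeline, and the toy corpus is invented.

```python
# Simplified collocation extraction: window-based co-occurrence counts ranked
# by logDice. A stand-in for the kind of automatic extraction the abstract
# describes, not the authors' actual extraction algorithm.
import math
from collections import Counter

def collocations(sentences, window=2, min_freq=1):
    word_freq = Counter()
    pair_freq = Counter()
    for tokens in sentences:
        word_freq.update(tokens)
        for i, w in enumerate(tokens):
            for c in tokens[i + 1:i + 1 + window]:
                pair_freq[(w, c)] += 1

    scored = []
    for (w, c), f_xy in pair_freq.items():
        if f_xy < min_freq:
            continue
        # logDice = 14 + log2(2 * f(x,y) / (f(x) + f(y)))
        score = 14 + math.log2(2 * f_xy / (word_freq[w] + word_freq[c]))
        scored.append(((w, c), round(score, 2)))
    return sorted(scored, key=lambda item: item[1], reverse=True)

toy_corpus = [
    ["strong", "coffee", "tastes", "bitter"],
    ["she", "drinks", "strong", "coffee"],
    ["strong", "coffee", "in", "the", "morning"],
]
print(collocations(toy_corpus)[:3])   # ("strong", "coffee") ranks highest
```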

  • Open Access English
    Authors: 
    Oddrun Grønvik; Sturla Berg-Olsen; Marit Hovdenak; Knut E. Karlsen;
    Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)

    Monolingual lexicography for Norwegian started some decades after political independence from Denmark in 1814. Since 1885 two written standards have been recognized, one based on Danish as spoken in Norway (today Bokmål), and one based on the Norwegian vernacular (Nynorsk). Both are fully described in major scholarly dictionaries, now completed and freely available on the web. Both receive some public funding, with a view to further development. Because of frequent orthographic revisions, at first aimed at bringing the written standards closer to each other, spellers dominated the market through most of the 20th century. Today linguistic stability is aimed for, incorporating only such changes in the written standards as are supported by general usage. The first general monolingual defining dictionaries Bokmålsordboka and Nynorskordboka, covering the central vocabulary of each written standard, were first published as parallel volumes in 1986, and are now undergoing revision at the University of Bergen in cooperation with the Language Council of Norway. These dictionaries are now stored in databases, are available on the web and as a free smartphone app. Public funding of monolingual mother tongue lexicography is seen as an investment in essential linguistic infrastructure, as is bilingual lexicography between the Nordic languages and Norwegian, while other bilingual lexicography is dealt with by private publishers.

  • Open Access English
    Authors: 
    Darja Fišer;
    Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
    Country: Slovenia

    With the spread of the World Wide Web, user-generated online content has flourished. Owing to the specific social and technical circumstances in which such communication takes place, it is characterised by the use of colloquial and foreign-language expressions, non-standard orthography and syntax, specific abbreviations, and a rapid influx of new vocabulary. This kind of communication is therefore an extremely interesting new object of study for corpus and computational linguists, as well as for digital humanities and the social sciences in general. Since this field has been very lively in Slovenia in recent years and has already borne fruit in terms of resources and tools as well as research results, we decided to devote to it this thematic issue of the journal Slovenščina 2.0, which is now before you.

  • Open Access English
    Authors: 
    Lars Trap-Jensen; Henrik Lorentzen; Nicolai Hartvig Sørensen;
    Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)

    The paper focuses on the relationship between user query logs of an online dictionary and corpus word frequency. The study was prompted by questions that arise in everyday dictionary work and can be summed up as: how do we keep a corpus-based dictionary up to date? Should the next word added to the dictionary be the one that follows the last included word on the corpus frequency list? Or should it be the word that users most often look up in the dictionary without success? To arrive at suitable criteria, the authors analysed the query logs of a Danish dictionary for the period 2009-2012 and compared the list of the most frequently searched words with their frequency in the corpus. By examining users’ search behaviour, the authors sought answers to the following questions: Are there words in the dictionary that users never look up? If so, can any meaningful patterns be observed on the basis of their corpus frequency: do they belong to the same part of speech, are they very frequent or very rare, do they fall within a particular frequency band? The paper concludes that corpus frequency is a good criterion for the 20,000 most frequent headwords, whereas for less frequent words additional methods are needed, among them an analysis of user queries, while the lexicographers’ judgement remains of utmost importance.
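
The comparison at the heart of this study (dictionary headword list vs. user search logs vs. corpus frequency) can be sketched as a simple set-and-counter exercise. The data below is invented and the procedure is a simplified assumption, not the authors' actual analysis of the Danish dictionary logs.

```python
# Simplified illustration of crossing a dictionary's headword list and its
# users' search log with a corpus frequency list, to find (a) headwords nobody
# looks up and (b) frequently searched words that are still missing.
# All data below is invented for illustration.
from collections import Counter

headwords = {"hus", "kat", "hygge", "ormegård"}
corpus_freq = Counter({"hus": 120_000, "kat": 40_000, "hygge": 15_000,
                       "ormegård": 12, "selfie": 9_000})
search_log = ["hygge", "hygge", "selfie", "selfie", "selfie", "kat"]

lookups = Counter(search_log)
never_looked_up = sorted(w for w in headwords if lookups[w] == 0)
missing_but_wanted = sorted((w for w in lookups if w not in headwords),
                            key=lambda w: corpus_freq[w], reverse=True)

print("never looked up:", never_looked_up)        # -> ['hus', 'ormegård']
print("candidates to add:", missing_but_wanted)   # -> ['selfie']
```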
