111 Research products, page 1 of 12

  • Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave

  • Open Access English
    Authors: 
    Jaka Čibej; Iza Škrjanec;
    Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
    Country: Slovenia

    Seminars on methods of corpus and experimental linguistics in Belgrade and Zagreb

  • Open Access English
    Authors: 
    Žiga Golob; Boštjan Vesnicer; Jerneja Žganec Gros; Mario Žganec; Simon Dobrišek;
    Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)

    Computer models based on finite-state transducers are well suited for compact representations of pronunciation lexicons that are used in both speech synthesis and speech recognition. In this paper, we present a finite-state super transducer, a new type of finite-state transducer that allows a pronunciation lexicon to be represented with fewer states and transitions than a conventional minimized and determinized finite-state transducer. A finite-state super transducer is a deterministic transducer that can, in addition to the words contained in the pronunciation lexicon, also accept some out-of-dictionary words. The resulting allophone transcription for these words can be erroneous, but we demonstrate that such errors are comparable to the performance of state-of-the-art methods for grapheme-to-phoneme conversion. The procedure for building finite-state super transducers and a validation of their performance are demonstrated on the SI-PRON pronunciation lexicon. In addition, we analyze several properties of finite-state transducers with respect to their minimum size obtained by determinization and minimization. We show that for highly inflected languages their minimum size begins to decrease when the number of words in the represented pronunciation dictionary reaches a certain threshold.

  • Open Access English
    Authors: 
    Polona Gantar;
    Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)

    The Proposal for a Dictionary of Contemporary Slovene, published in May 2013, has stirred many debates both in academic circles and in the media. The topic central to all the debates was whether a new dictionary of Slovene should follow the tradition established by the Dictionary of Literary Slovene (published in the 1970s), which was based on the structuralist theories of the Prague school, or move away from this tradition. All this led to differing views on what a dictionary tradition is, and on the role of new lexicographic methods. By analyzing the concepts of the Dictionary of Literary Slovene and the Dictionary of New Slovene Lexis (published in 2012), and by reviewing scientific articles that discuss a concept for a new dictionary of Slovene, this paper attempts to establish which elements of lexicographic theory can be viewed as traditional and which represent innovation in Slovene lexicography. At the same time, a concept for a new dictionary is considered from three perspectives: the user, the medium, and the use of language technologies, which would facilitate language description and meet the needs of the language community. As the author argues, a new dictionary of Slovene will do well to carefully consider the status of literary language in contemporary Slovene; be corpus-driven and user-oriented (rather than academic); incorporate various lexicographic findings, e.g. use different approaches to defining (depending on their efficiency for different word classes or categories of words); be digital-born, i.e. devised with an online medium in mind; offer updates on a regular basis; and utilize various language technologies, such as automatic example extraction, in its design to facilitate dictionary compilation. Only thus will the new dictionary become a state-of-the-art lexicographic product and a worthy successor to the Dictionary of Standard Slovene.

  • Open Access English
    Authors: 
    Jerica Snoj;
    Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)

    The paper discusses part-of-speech categorization from the standpoint of giving part-of-speech labels in a dictionary reference work for Slovene, at a general level that holds regardless of any particular dictionary concept. It first presents the distinctive nature of part-of-speech categorization in grammatical theory, followed by an illustration of the role that part-of-speech assignment of lexical units plays within the semantic description in a dictionary. Using examples of the predicative, the particle, and de-participial derivatives, the paper shows how part-of-speech categorization has developed in Slovene dictionaries to date, and on this basis gives guidelines for the part-of-speech labelling of lexical units in future Slovene dictionary reference works.

  • Publication . Article . 2013
    Open Access English
    Authors: 
    Iztok Kosem; Polona Gantar; Simon Krek;
    Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)

    A new approach to lexicographic work, in which the lexicographer is seen more as a validator of the choices made by the computer, was recently envisaged by Rundell and Kilgarriff (2011). In this paper, we describe an experiment using such an approach during the creation of the Slovene Lexical Database (Gantar, Krek, 2011). The corpus data, i.e. grammatical relations, collocations, examples, and grammatical labels, were automatically extracted from the 1.18-billion-word Gigafida corpus of Slovene. The evaluation of the extracted data consisted of comparing the time spent writing a manual entry with the time spent on a (semi-)automatic entry, and identifying potential improvements in the extraction algorithm and in the presentation of data. An important finding was that the automatic approach was far more effective than the manual approach, without any significant loss of information. Based on our experience, we would propose a slightly revised version of the approach envisaged by Rundell and Kilgarriff, in which the validation of data is left to lower-level linguists or crowd-sourcing, whereas high-level tasks such as meaning description remain the domain of lexicographers. Such an approach does reduce the scope of the lexicographer’s work; however, it also makes it possible to bring the content to users more quickly.

  • Open Access English
    Authors: 
    Darja Fišer; Tomaž Erjavec; Ajda Pretnar;
    Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)

  • Open Access English
    Authors: 
    Darja Fišer;
    Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
    Country: Slovenia

    With the spread of the World Wide Web, user-generated web content has flourished. Owing to the specific social and technical circumstances in which such communication takes place, it is characterized by the use of colloquial and foreign-language expressions, non-standard orthography and syntax, specific abbreviations, and a rapid influx of new vocabulary. Such communication is therefore an extremely interesting new object of research for corpus and computational linguists, as well as for digital humanities and the social sciences in general. Because this field has also been very lively in Slovenia in the last few years and has already borne fruit in terms of resources and tools as well as research results, we decided to devote to it a thematic issue of the journal Slovenščina 2.0, which is now before you.

  • Publication . Article . 2013
    Open Access English
    Authors: 
    Polona Gantar; Nataša Logar Breginc;
    Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)

    Digitized language resources, natural language processing, corpus analyses of grammatical and other linguistic phenomena, text mining, taggers, extractors, lexicographic tools, speech synthesis, machine translation, avatar conversational partners, smart homes ... The common point: language.

  • Publication . Article . 2019
    Open Access English
    Authors: 
    Vojko Gorjanc; Špela Arhar Holdt;
    Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)

    A special thematic section of this year's first issue contains eight short scientific papers that give an overview of the current state of lexicography in Denmark, Sweden, Norway, Croatia, Greece, the Basque Country, Estonia, and Brazil. The papers are the result of scientific cooperation in the European network ENeL – European Network of e-Lexicography [ISCH COST Action IS1305].

  • Open Access English
    Authors: 
    Damjan Popič;
    Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)