105 Research products, page 1 of 11
- Publication . Article . 2021 . Open Access . English
Authors: Maja Bitenc; Marko Stabej; Nataša Gliha Komac; Matejka Grgič; Monika Kalin Golob; Karmen Kenda-Jež; Albina Nećak Lük; Sonja Novak Lukanovič; Krištof Savski
Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
Record of a consultation on current sociolinguistic challenges and priority research topics, organized by Assist. Prof. Dr. Maja Bitenc and Full Prof. Dr. Marko Stabej of the Department of Slovene Studies and held on Monday, 27 September 2021, at the Faculty of Arts of the University of Ljubljana, with a live stream via Zoom. In the first part, the invited experts presented their views on the opening questions; the second part was a discussion among all participants. The speakers edited the transcript of the recording at their own discretion, in principle with as few interventions as possible; from the discussion, the more substantive contributions were adapted for reading and published.
- Publication . Article . 2021 . Open Access . English
Authors: Mojca Stritar Kučuk
Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
Regularly enrolled international students at the University of Ljubljana who learn Slovene in their first year of study within the Leto plus module attend a dedicated workshop in the second semester, where they become more closely acquainted with online language resources and technologies for Slovene. The paper describes how this workshop was delivered in the 2019/20 academic year, when, owing to the coronavirus pandemic, it was held remotely in the form of interactive videos with tasks for checking comprehension of the material. The second part of the paper focuses on students' opinions of such language resources. Using an online survey, I analysed the attitudes and experiences of students from two cohorts: the 2018/19 cohort was introduced to the online tools in the classroom, the 2019/20 cohort remotely. Judging by the survey results, the younger cohort uses online language resources more often. Students in both groups most frequently use Google Translate, followed by Sloleks, the Besana inflector, Fran and Pons. As reasons for using these resources, they mainly cite speed and ease of use, and familiarity with a particular resource.
- Publication . Article . 2021 . Open Access . English
Authors: Magdalena Gapsa
Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
A report on two important lexicographic conferences: the seventh biennial conference Electronic lexicography in the 21st century (eLex for short), held from 5 to 7 July 2021, and the nineteenth biennial conference of the European Association for Lexicography (EURALEX), held from 7 to 9 September 2021.
- Publication . Article . 2021 . Open Access . English
Authors: Darinka Verdonik; Simona Majhenič; Špela Antloga; Sandi Majninger; Marko Ferme; Kaja Dobrovoljc; Simona Pulko; Mira Krajnc Ivič; Natalija Ulčnik
Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
The paper describes three types of challenges identified in teaching Slovene as a mother tongue in schools. First, pupils' writing contains numerous orthographic and grammatical errors (see Kosem et al., 2012; Križaj and Bester Turk, 2018; Gomboc, 2019). Second, pupils show low phraseological literacy and often have trouble understanding phrasemes (Voršič, 2018). Third, there are challenges of communicative competence, i.e. the production and interpretation of various written, spoken and multimedia genres, since only adequate genre literacy enables the efficient use of different genres (Nidorfer Šiškovič, 2013). To address these challenges, we have developed a comprehensive e-learning environment for improving the writing and communication skills of Slovene pupils, “Slovenščina na dlani”. The environment is divided into four general topics: orthography, grammar, phrasemes and texts. Each topic covers a number of subtopics, and each subtopic offers a number of exercises along with explanations. We used the most up-to-date language technologies and programming solutions to automate the e-environment. The user's knowledge is evaluated automatically, and based on this evaluation the user is guided through the environment so as to improve their writing and communication skills. The e-environment also has a dedicated interface for teachers that makes it easy to assign tasks and to track the performance of each pupil individually or of a group of pupils as a whole. Gamification and professional graphic design round out the user experience. “Slovenščina na dlani” will be freely available at https://slo-na-dlani.si from September 2021 on.
- Publication . Article . 2021 . Open Access . English
Authors: Lucija Gril; Mirjam Sepesy Maučec; Gregor Donaj; Andrej Žgank
Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
In the field of speech and language technologies, automatic speech recognition is one of the key building blocks. In this paper we present the development of an automatic speech recogniser for Slovene in the domain of daily news broadcasts. The system architecture is based on deep neural networks. Taking the available speech resources into account, we carried out modelling with different activation functions. During development, we also examined the impact of lossy speech codecs on recognition results. To train the recogniser, we used two databases, UMB BNSI Broadcast News and IETK-TV, with a total of 66 hours of speech recordings. In parallel with the deep neural networks, we enlarged the recognition vocabulary to 250,000 words, which reduced the out-of-vocabulary rate to 1.33%. On the test set, we achieved a best word error rate (WER) of 15.17%. During evaluation, we also carried out a more detailed analysis of recognition errors based on lemmas and F-classes, which to some extent shows how demanding Slovene is for such application scenarios.
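The abstract reports a word error rate (WER) and an out-of-vocabulary (OOV) rate but does not say which scoring tool was used. A minimal sketch, assuming the standard definitions: WER as the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words, and a token-based OOV rate over the test transcripts. The function names and the toy Slovene sentences are illustrative assumptions, not taken from the paper or from the UMB BNSI / IETK-TV databases.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed as a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def oov_rate(tokens, vocabulary) -> float:
    """Share of running tokens not covered by the recognition vocabulary (token-based, assumed)."""
    tokens = list(tokens)
    if not tokens:
        raise ValueError("need at least one token")
    missing = sum(1 for t in tokens if t not in vocabulary)
    return missing / len(tokens)
```

For example, `word_error_rate("danes je lep dan", "danes je bil lep dan")` returns 0.25: one inserted word against four reference words. Under this definition, a WER of 15.17% corresponds to roughly 15 word errors per 100 reference words.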
- Publication . Article . Preprint . 2021 . Open Access . English
Authors: Lucia Vlášková; Hana Strachoňová
Publisher: Unpublished
As a growing field of study within sign language linguistics, sign language lexicography faces many challenges that have already been resolved for audio-oral language material. In this paper, we present some of these challenges and the methods developed to help navigate the complex field of lexical classification. The methods and strategies described are implemented in the first online dictionary of Czech Sign Language (ČZJ), part of the Dictio platform developed at Masaryk University in Brno. We cover lemmatisation and how to decide what constitutes a lexeme in a sign language. We introduce four types of expressions that qualify for a dictionary entry: a simple lexeme, a compound, a derivative and a set phrase. We address the place of classifier constructions and size and shape specifiers in a dictionary, given their peculiar semantic status. We maintain the standard classification of classifiers (whole entity and holding classifiers) and of size and shape specifiers (SASSes; static and tracing specifiers). We provide arguments for separating the category of specifiers from that of classifiers. We discuss the proper treatment of mouthings and mouth gestures with respect to citation forms, derivation and translation. We show why it is difficult in a sign language to distinguish synonyms from variants and how our proposed phonological criteria can help. We explain how to construct a semantic definition in a sign language and how to handle multiple meanings of a single form. We offer simple guidelines for forming proper usage examples in a sign language. Finally, we briefly comment on the process of translation between sign and spoken languages. We conclude the paper with a summary of the roles that Dictio plays in the ČZJ signing community.
- Publication . Article . 2021 . Open Access . English
Authors: Katja Meden; Ana Cvek
Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
The humanities and social sciences are often excluded from international citation indexes such as Scopus and Web of Science (WOS). This exclusion is commonly attributed to the format of published works: the most common type of publication in the humanities is the monograph (though scientific journals are on the rise), and monographs are not typically included in WOS and Scopus. Although Scopus is far more inclusive of such publication types and fields than WOS, a gap remains. In response, the Institute of Contemporary History developed its own citation index, the Historiography Citation Index (HCI), which was initially meant only to track the research output of the institute but has since been expanded to cover the output of the whole field of Slovene historiography. Over the years, HCI has undergone several upgrades and data harmonization efforts. Even with these upgrades, several shortcomings of the system remained apparent, so a further upgrade was considered; after an extensive analysis, we identified the most problematic aspects of the index and began work on it. The upgrade was carried out in two parts. In the first, we set out to improve the administrative system: we implemented Elasticsearch to improve the search engine and filtering, and improved the data input masks to increase the precision and accuracy of the data entered into the index. As part of the administrative system upgrade, we also modeled a MODS application profile to increase the interoperability of our data, enabling it to be exchanged between different information systems without losing the data or its context. In the second part, we upgraded the user interface of the citation index to make it more user-friendly.
To make the data display more coherent, we implemented a table-like layout for the search results, with filters in each column. To increase the visibility of the index's most important metric, the number of citations a work has received, we added a column dedicated to that information. The index aims to give researchers access to information on numbers of citations, cited works, etc. It is also recognised by the Slovenian Research Agency (ARRS) as a valid source of citations and could be used as evidence of researchers' achievements and scientific excellence, though it is still not recognised as equivalent to the SICRIS information system. With the upgrade, we increased the efficiency and usability of the citation index and made the system more intuitive for its indexers and users.
- Publication . Article . 2021 . Open Access . English
Authors: Matej Ulčar; Anka Supej; Marko Robnik-Šikonja; Senja Pollak
Publisher: Zenodo
Project: EC | EMBEDDIA (825153)
In recent years, the use of deep neural networks and dense vector embeddings for text representation has led to excellent results in natural language understanding. It has also been shown that word embeddings often capture gender, racial and other types of bias. The article evaluates Slovene and Croatian word embeddings for gender bias using word analogy calculations. We compiled a list of masculine and feminine occupation nouns in Slovene and evaluated the gender bias of fastText, word2vec and ELMo embeddings with different configurations and different approaches to analogy calculation. The lowest occupational gender bias was observed with the fastText embeddings. We likewise compared different fastText embeddings on Croatian occupational analogies.
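To make "word analogy calculations" concrete, here is a minimal sketch of one common approach, 3CosAdd: answer a : b :: c : ? with the word whose vector is closest by cosine similarity to c - a + b, excluding the three query words. The paper compares several analogy approaches on real fastText, word2vec and ELMo embeddings; the three-dimensional toy vectors, the word pairs and the accuracy measure below are hypothetical, chosen only to show how a feminine vector skewed towards the masculine side makes an occupational analogy fail.

```python
import math

# Hypothetical 3-d toy vectors: [masculine(+)/feminine(-) axis, teaching, medicine].
# "zdravnica" is deliberately skewed towards the masculine side to mimic bias.
TOY = {
    "moški": [1.0, 0.0, 0.0],
    "ženska": [-1.0, 0.0, 0.0],
    "učitelj": [1.0, 1.0, 0.0],
    "učiteljica": [-1.0, 1.0, 0.0],
    "zdravnik": [1.0, 0.0, 1.0],
    "zdravnica": [0.6, 0.0, 1.0],
}
# (masculine occupation, expected feminine counterpart)
PAIRS = [("učitelj", "učiteljica"), ("zdravnik", "zdravnica")]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def analogy(emb, a, b, c):
    """3CosAdd: solve a : b :: c : ? as argmax over x of cos(x, c - a + b),
    excluding the three query words from the candidates."""
    target = [vc - va + vb for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


def occupational_accuracy(emb, pairs, male="moški", female="ženska"):
    """Share of masculine occupations whose analogy returns the expected feminine form.
    In this toy setting, a lower value suggests stronger occupational gender bias."""
    hits = sum(analogy(emb, male, female, m) == f for m, f in pairs)
    return hits / len(pairs)
```

With these toy vectors, `analogy(TOY, "moški", "ženska", "učitelj")` returns `"učiteljica"`, but the analogy for `"zdravnik"` also returns `"učiteljica"` because `"zdravnica"` sits too close to the masculine direction, so `occupational_accuracy(TOY, PAIRS)` is 0.5.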
- Publication . Article . 2021 . Open Access . English
Authors: Toma Tasovac; Ana Salgado; Rute Costa
Publisher: Zenodo
The modelling and encoding of polylexical units, i.e. recurrent sequences of lexemes that are perceived as independent lexical units, is a topic that has not been covered adequately or in sufficient depth by the Guidelines of the Text Encoding Initiative (TEI), a de facto standard for the digital representation of textual resources in the scholarly research community. In this paper, we use the Dictionary of the Portuguese Academy of Sciences as a case study to present our ongoing work on encoding polylexical units using TEI Lex-0, an initiative aimed at simplifying and streamlining the encoding of lexical data with TEI in order to improve interoperability. We introduce the notion of macro- and microstructural relevance to differentiate between polylexicals that serve as headwords of their own independent dictionary entries and those that appear inside entries for other headwords. We develop the notion of lexicographic transparency to distinguish between units that are not accompanied by an explicit definition and those that are: the former are encoded as <form>-like constructs, whereas the latter become <entry>-like constructs, on which further constraints can be imposed (sense numbers, domain labels, grammatical labels, etc.). We codify the use of attributes on <gram> to encode different kinds of labels for polylexicals (implicit, explicit and normalised), concluding that the interoperability of lexical resources would be significantly improved if dictionary encoders had access to an expressive but relatively simple typology of polylexical units.
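The form-versus-entry rule described in this abstract can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustration under stated assumptions, not TEI Lex-0 itself: it omits the TEI namespace and any TEI Lex-0-specific attributes, the helper `encode_polylexical` is hypothetical, and "de repente" / "subitamente" are illustrative Portuguese strings, not entries quoted from the dictionary. Only the element names (`form`, `orth`, `entry`, `sense`, `def`) are standard TEI.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def encode_polylexical(unit: str, definition: Optional[str] = None) -> ET.Element:
    """Hypothetical helper: a unit without an explicit definition stays a
    <form>-like construct; a unit with one is wrapped as an <entry>-like construct."""
    form = ET.Element("form")
    ET.SubElement(form, "orth").text = unit
    if definition is None:
        # No explicit definition: encode only the written form.
        return form
    # Explicit definition: promote to an entry, which could later carry
    # sense numbers, domain labels, grammatical labels, etc.
    entry = ET.Element("entry")
    entry.append(form)
    sense = ET.SubElement(entry, "sense")
    ET.SubElement(sense, "def").text = definition
    return entry
```

Serialising with `ET.tostring(encode_polylexical("de repente"), encoding="unicode")` yields `<form><orth>de repente</orth></form>`, while passing a definition yields an `<entry>` that wraps the same `<form>` plus a `<sense>`. Keeping the `<form>` identical in both cases means a unit can be promoted to an entry without re-encoding its written form.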
- Publication . Article . 2020 . Open Access . English
Authors: Ina Ferbežar; Igor Cetina; Alojz Ihan; Marko Stabej; Lana Zdravković; Tina Zupančič
Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
The 54th ALTE (Association of Language Testers in Europe) meeting and public consultation took place in Ljubljana from 6 to 8 November 2019. The meeting, titled Monolingual testing in multilingual reality: Language ideologies and their impact on language testing, was organized by the University of Ljubljana, Faculty of Arts, and the Center for Slovene as a Second and Foreign Language at the Department of Slovene Studies. As part of the event, a round table titled (Close) encounters of language policy makers was held on 8 November 2019. We publish here the transcript of the discussion among the participants.
Average popularityAverage popularity In bottom 99%Average influencePopularity: Citation-based measure reflecting the current impact.Average influence In bottom 99%Influence: Citation-based measure reflecting the total impact.add Add to ORCIDPlease grant OpenAIRE to access and update your ORCID works.This Research product is the result of merged Research products in OpenAIRE.
You have already added works in your ORCID record related to the merged Research product.
105 Research products, page 1 of 11
Loading
- Publication . Article . 2021Open Access EnglishAuthors:Maja Bitenc; Marko Stabej; Nataša Gliha Komac; Matejka Grgič; Monika Kalin Golob; Karmen Kenda-Jež; Albina Nećak Lük; Sonja Novak Lukanovič; Krištof Savski;Maja Bitenc; Marko Stabej; Nataša Gliha Komac; Matejka Grgič; Monika Kalin Golob; Karmen Kenda-Jež; Albina Nećak Lük; Sonja Novak Lukanovič; Krištof Savski;Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
Zapis posveta o aktualnih sociolingvističnih izzivih in prednostnih raziskovalnih tematikah, ki sta ga organizirala doc. dr. Maja Bitenc in red. prof. dr. Marko Stabej z Oddelka za slovenistiko in je potekal v ponedeljek, 27. 9. 2021, na Filozofski fakulteti Univerze v Ljubljani in s prenosom preko Zooma. V prvem delu so vabljene strokovnjakinje in strokovnjaki predstavili svoje poglede ob izhodiščnih vprašanjih, v drugem je sledila razprava vseh sodelujočih. Zapis posnetka so govornice in govorniki uredili po lastni presoji, načeloma s čim manj intervencijami, iz razprave pa so za branje prilagojene in objavljene vsebinsko tehtnejše replike.
Average popularityAverage popularity In bottom 99%Average influencePopularity: Citation-based measure reflecting the current impact.Average influence In bottom 99%Influence: Citation-based measure reflecting the total impact.add Add to ORCIDPlease grant OpenAIRE to access and update your ORCID works.This Research product is the result of merged Research products in OpenAIRE.
You have already added works in your ORCID record related to the merged Research product. - Publication . Article . 2021Open Access EnglishAuthors:Mojca Stritar Kučuk;Mojca Stritar Kučuk;Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
Redno vpisani tuji študenti Univerze v Ljubljani, ki se v prvem letu študija v okviru modula Leto plus učijo slovensko, se v drugem semestru na posebni delavnici podrobneje spoznajo s spletnimi jezikovnimi viri in tehnologijami za slovenščino. V prispevku je opisana izvedba te delavnice v študijskem letu 2019/20, ko je zaradi pandemije koronavirusa potekala na daljavo, v obliki interaktivnih videoposnetkov z nalogami za preverjanje razumevanja snovi. Drugi del prispevka se osredotoča na mnenje študentov o tovrstnih jezikovnih virih. S spletno anketo sem analizirala stališča in izkušnje študentov dveh generacij: študenti generacije 2018/19 so spletna orodja spoznavali v razredu, študenti generacije 2019/20 pa na daljavo. Sodeč po rezultatih ankete, mlajša generacija študentov jezikovne vire na spletu uporablja pogosteje. Študenti obeh skupin najpogosteje uporabljajo Googlov Prevajalnik, ki mu sledijo Sloleks, pregibnik Besana, Fran in Pons. Kot argumente za uporabo teh virov izpostavljajo predvsem hitrost oz. enostavnost uporabe in navajenost na določen vir.
Average popularityAverage popularity In bottom 99%Average influencePopularity: Citation-based measure reflecting the current impact.Average influence In bottom 99%Influence: Citation-based measure reflecting the total impact.add Add to ORCIDPlease grant OpenAIRE to access and update your ORCID works.This Research product is the result of merged Research products in OpenAIRE.
You have already added works in your ORCID record related to the merged Research product. - Publication . Article . 2021Open Access EnglishAuthors:Magdalena Gapsa;Magdalena Gapsa;Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
Poročilo o dveh pomembnih leksikografskih konferencah, in sicer o sedmi bienalni konferenci združenja Electronic lexicography in the 21st century (na kratko: eLex), ki je potekala med 5. in 7. julijem 2021, ter devetnajsti bienalni konferenci Evropskega leksikografskega združenja (European Association for Lexicography, EURALEX), ki je potekala med 7. in 9. septembrom 2021.
Average popularityAverage popularity In bottom 99%Average influencePopularity: Citation-based measure reflecting the current impact.Average influence In bottom 99%Influence: Citation-based measure reflecting the total impact.add Add to ORCIDPlease grant OpenAIRE to access and update your ORCID works.This Research product is the result of merged Research products in OpenAIRE.
You have already added works in your ORCID record related to the merged Research product. - Publication . Article . 2021Open Access EnglishAuthors:Darinka Verdonik; Simona Majhenič; Špela Antloga; Sandi Majninger; Marko Ferme; Kaja Dobrovoljc; Simona Pulko; Mira Krajnc Ivič; Natalija Ulčnik;Darinka Verdonik; Simona Majhenič; Špela Antloga; Sandi Majninger; Marko Ferme; Kaja Dobrovoljc; Simona Pulko; Mira Krajnc Ivič; Natalija Ulčnik;Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
The paper describes three types of challenges that were detected in teaching Slovene as a mother tongue at schools. First, a number of orthographic and grammatic mistakes can be detected in pupils’ writings (see Kosem et al., 2012; Križaj in Bester Turk, 2018; Gomboc, 2019). Second, low phraseological literacy was noticed and the pupils often have problems understanding phrasemes (Vorsic, 2018). Third, the challenges of communicative competence were addressed, referring to production and interpretation of different written, spoken as well as multimedia genres, as only appropriate genre literacy enables efficient use of different genres (Nidorfer Siskovic, 2013). To address these challenges, we have developed a complex e-learning environment for improving writing and communication skills of Slovene pupils – “Slovenscina na dlani”. The developed environment is divided into four general topics – orthography, grammar, phrasemes and texts. Each topic covers a number of subtopics, and for each sub-topic a number of exercises is available, along with explanations. We have used the most up-to-date language technologies and programming solutions in order to automatise the e-environment. The user’s knowledge is automatically evaluated, and based on this s/he is automatically guided through the environment in a way to improve her/his writing and communication skills. The e-environment has also a special user interface for teachers which enables easy way to assign tasks as well as to track the performance of each pupil individually or a group of pupils as a whole. The gamification and professional graphic design fulfil the user experience. The “Slovenscina na dlani” will be freely available at https://slo-na-dlani.si from September 2021 on.
Average popularityAverage popularity In bottom 99%Average influencePopularity: Citation-based measure reflecting the current impact.Average influence In bottom 99%Influence: Citation-based measure reflecting the total impact.add Add to ORCIDPlease grant OpenAIRE to access and update your ORCID works.This Research product is the result of merged Research products in OpenAIRE.
You have already added works in your ORCID record related to the merged Research product. - Publication . Article . 2021Open Access EnglishAuthors:Lucija Gril; Mirjam Sepesy Maučec; Gregor Donaj; Andrej Žgank;Lucija Gril; Mirjam Sepesy Maučec; Gregor Donaj; Andrej Žgank;Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
Na področju govornih in jezikovnih tehnologij predstavlja avtomatsko razpoznavanje govora enega izmed ključnih gradnikov. V prispevku bomo predstavili razvoj avtomatskega razpoznavalnika slovenskega govora za domeno dnevnoinformativnih oddaj. Arhitektura sistema je zasnovana na globokih nevronskih mrežah. Pri tem smo ob upoštevanju razpoložljivih govornih virov izvedli modeliranje z različnimi aktivacijskimi funkcijami. V postopku razvoja razpoznavalnika govora smo preverili tudi, kakšen je vpliv izgubnih govornih kodekov na rezultate razpoznavanja govora. Za učenje razpoznavalnika govora smo uporabili bazi UMB BNSI Broadcast News in IETK-TV. Skupni obseg govornih posnetkov je znašal 66 ur. Vzporedno z globokimi nevronskimi mrežami smo povečali slovar razpoznavanja govora, ki je tako znašal 250.000 besed. Na ta način smo znižali delež besed izven slovarja na 1,33 %. Z razpoznavanjem govora na testni množici smo dosegli najboljšo stopnjo napačno razpoznanih besed (WER) 15,17 %. Med procesom vrednotenja rezultatov smo izvedli tudi podrobnejšo analizo napak razpoznavanja govora na osnovi lem in F-razredov, ki v določeni meri pokažejo na zahtevnost slovenskega jezika za takšne scenarije uporabe tehnologije.
Average popularityAverage popularity In bottom 99%Average influencePopularity: Citation-based measure reflecting the current impact.Average influence In bottom 99%Influence: Citation-based measure reflecting the total impact.add Add to ORCIDPlease grant OpenAIRE to access and update your ORCID works.This Research product is the result of merged Research products in OpenAIRE.
You have already added works in your ORCID record related to the merged Research product. - Publication . Article . Preprint . 2021Open Access EnglishAuthors:Lucia Vlášková; Hana Strachoňová;Lucia Vlášková; Hana Strachoňová;Publisher: Unpublished
As a growing field of study within sign language linguistics, sign language lexicography faces many challenges that have already been answered for audio-oral language material. In this paper, we present some of these challenges and methods developed to help navigate the complex lexical classification field. The described methods and strategies are implemented in the first Czech sign language (ČZJ) online dictionary, a part of the platform Dictio, developed at Masaryk University in Brno. We cover the topic of lemmatisation and how to decide what constitutes a lexeme in sign language. We introduce four types of expressions that qualify for a dictionary entry: a simple lexeme, a compound, a derivative, and a set phrase. We address the question of the place of classifier constructions and shape and size specifiers in a dictionary, given their peculiar semantic status. We maintain the standard classification of classifiers (whole entity and holding classifiers) and size and shape specifiers (SASSes; static and tracing specifiers). We provide arguments for separating the category of specifiers from the category of classifiers. We discuss the proper treatment of mouthings and mouth gestures concerning citation forms, derivation and translation. We show why it is difficult in sign language to distinguish synonyms from variants and how our proposed phonological criteria can help. We explain how to construct a semantic definition in a sign language and what is the solution for multiple meanings of one form. We offer simple guidelines for forming proper examples of use in a sign language. And finally, we briefly comment on the process of the translation between sign and spoken languages. We conclude the paper with a summary of roles that Dictio plays in the ČZJ-signing community.
Average popularityAverage popularity In bottom 99%Average influencePopularity: Citation-based measure reflecting the current impact.Average influence In bottom 99%Influence: Citation-based measure reflecting the total impact.add Add to ORCIDPlease grant OpenAIRE to access and update your ORCID works.This Research product is the result of merged Research products in OpenAIRE.
You have already added works in your ORCID record related to the merged Research product. - Publication . Article . 2021Open Access EnglishAuthors:Katja Meden; Ana Cvek;Katja Meden; Ana Cvek;Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
The humanities and social sciences are often underrepresented in international citation indexes such as Scopus and Web of Science (WoS). This underrepresentation is commonly attributed to the format of published works: the most common type of publication in the humanities is the monograph (although scientific journals are gaining ground), and monographs are not typically included in WoS and Scopus. Although Scopus is far more inclusive of such publication types and fields than WoS, a gap remains. In response, the Institute of Contemporary History developed its own citation index, the Historiography Citation Index (HCI), which was initially meant only to track research output within the institution but has since been expanded to cover the output of the whole field of Slovene historiography. Over the years, the HCI has undergone several upgrades and data harmonization attempts. Even so, several shortcomings of the system remained apparent, so a further upgrade was considered; after an extensive analysis, we identified the most problematic aspects of the index and began work on it. The upgrade was carried out in two parts. In the first, we improved the administrative system: we implemented Elasticsearch to improve the search engine and filtering system, and we improved the data-entry masks to increase the precision and accuracy of the data entered into the index. As part of the administrative system upgrade, we also modelled a MODS application profile to increase the interoperability of our data, enabling its exchange between different information systems without loss of data or context. In the second part, we upgraded the user interface of the citation index to make it more user-friendly.
To make the data display more coherent, we implemented a table-like layout for the search results, with filters in each column. To make the most important information in a citation index, the number of citations a work has received, more visible, we added a dedicated column for it. The index aims to give researchers access to information on citation counts, cited works, etc. It is also recognised by the Slovenian Research Agency (ARRS) as a valid source of citations and can be used as proof of researchers' achievements and scientific excellence, although it is still not recognised as equivalent to the SICRIS information system. With the upgrade, we increased the efficiency and usability of the citation index and made it more intuitive for its indexers and users.
- Publication . Article . 2021 . Open Access . English
Authors: Matej Ulčar; Anka Supej; Marko Robnik-Šikonja; Senja Pollak
Publisher: Zenodo
Project: EC | EMBEDDIA (825153)
In recent years, the use of deep neural networks and dense vector embeddings for text representation has led to excellent results in natural language understanding. It has also been shown that word embeddings often capture gender, racial and other types of bias. The article evaluates Slovene and Croatian word embeddings for gender bias using word analogy calculations. We compiled a list of masculine and feminine occupation nouns in Slovene and evaluated the gender bias of fastText, word2vec and ELMo embeddings with different configurations and different approaches to computing analogies. The lowest occupational gender bias was observed with the fastText embeddings. Similarly, we compared different fastText embeddings on Croatian occupational analogies.
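The word-analogy test described above can be illustrated with a minimal sketch. It assumes the common 3CosAdd formulation (target = b − a + c, answer = nearest remaining word by cosine similarity) and a tiny hand-made three-dimensional embedding; the vectors, the word list and the simple accuracy measure are illustrative assumptions, not the authors' data, model configurations or exact evaluation procedure.

```python
import math

# Toy embeddings (hypothetical, hand-made): dimensions are [gender, teaching, medicine].
EMB = {
    "moški": (1.0, 0.0, 0.0),        # man
    "ženska": (-1.0, 0.0, 0.0),      # woman
    "učitelj": (1.0, 1.0, 0.0),      # teacher (masc.)
    "učiteljica": (-1.0, 1.0, 0.0),  # teacher (fem.)
    "zdravnik": (1.0, 0.0, 1.0),     # doctor (masc.)
    "zdravnica": (-1.0, 0.0, 1.0),   # doctor (fem.)
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def solve_analogy(emb, a, b, c):
    """Complete a : b :: c : ? with 3CosAdd, excluding the three query words."""
    target = tuple(vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c]))
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

def analogy_accuracy(emb, pairs, male="moški", female="ženska"):
    """Share of (masculine, feminine) occupation pairs for which
    male : female :: masculine : ? returns the expected feminine noun."""
    hits = sum(solve_analogy(emb, male, female, m) == f for m, f in pairs)
    return hits / len(pairs)

PAIRS = [("učitelj", "učiteljica"), ("zdravnik", "zdravnica")]
print(solve_analogy(EMB, "moški", "ženska", "zdravnik"))  # zdravnica
print(analogy_accuracy(EMB, PAIRS))  # 1.0
```

In a real evaluation the vectors would come from trained fastText, word2vec or ELMo models. An embedding that returned the masculine form or a stereotyped occupation for feminine queries would score lower on such a test, which is one way occupational gender bias can surface in analogy calculations.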
- Publication . Article . 2021 . Open Access . English
Authors: Toma Tasovac; Ana Salgado; Rute Costa
Publisher: Zenodo
The modelling and encoding of polylexical units, i.e. recurrent sequences of lexemes that are perceived as independent lexical units, is a topic that has not been covered adequately or in sufficient depth by the Guidelines of the Text Encoding Initiative (TEI), a de facto standard for the digital representation of textual resources in the scholarly research community. In this paper, we use the Dictionary of the Portuguese Academy of Sciences as a case study to present our ongoing work on encoding polylexical units using TEI Lex-0, an initiative aimed at simplifying and streamlining the encoding of lexical data with TEI in order to improve interoperability. We introduce the notion of macro- and microstructural relevance to differentiate between polylexicals that serve as headwords of their own independent dictionary entries and those that appear inside entries for other headwords. We develop the notion of lexicographic transparency to distinguish between units that are not accompanied by an explicit definition and those that are: the former are encoded as <form>-like constructs, whereas the latter become <entry>-like constructs, which can have further constraints imposed on them (sense numbers, domain labels, grammatical labels, etc.). We codify the use of attributes on <gram> to encode different kinds of labels for polylexicals (implicit, explicit and normalised), concluding that the interoperability of lexical resources would be significantly improved if dictionary encoders had access to an expressive but relatively simple typology of polylexical units.
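The distinction drawn above between polylexical units with and without an explicit definition can be sketched as a simple encoding rule. This is a minimal illustration using Python's standard xml.etree.ElementTree, not the TEI Lex-0 specification or the authors' actual encoding: the helper name encode_polylexical, the example phrases and the exact element layout (beyond the <form>/<entry> contrast stated in the abstract) are assumptions, and <gram> labels and sense numbers are omitted.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def encode_polylexical(text: str, definition: Optional[str] = None) -> ET.Element:
    """Hypothetical helper: encode a polylexical unit occurring inside a host entry."""
    if definition is None:
        # No explicit definition: a <form>-like construct carrying only the orthography.
        form = ET.Element("form")
        ET.SubElement(form, "orth").text = text
        return form
    # Explicit definition: an <entry>-like construct, which could later take
    # further constraints (sense numbers, domain or grammatical labels).
    entry = ET.Element("entry")
    lemma = ET.SubElement(entry, "form", type="lemma")
    ET.SubElement(lemma, "orth").text = text
    sense = ET.SubElement(entry, "sense")
    ET.SubElement(sense, "def").text = definition
    return entry

# Illustrative Portuguese examples (not taken from the dictionary itself).
print(ET.tostring(encode_polylexical("dar um passeio"), encoding="unicode"))
# <form><orth>dar um passeio</orth></form>
print(ET.tostring(encode_polylexical("ao pé da letra", "literalmente"), encoding="unicode"))
# <entry><form type="lemma"><orth>ao pé da letra</orth></form><sense><def>literalmente</def></sense></entry>
```

Keeping the decision in one function makes the rule explicit: only units that carry a definition get the heavier <entry>-like structure that can hold senses and labels.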
- Publication . Article . 2020 . Open Access . English
Authors: Ina Ferbežar; Igor Cetina; Alojz Ihan; Marko Stabej; Lana Zdravković; Tina Zupančič
Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
The 54th ALTE (Association of Language Testers in Europe) meeting and public consultation took place in Ljubljana from 6 to 8 November 2019. The meeting, titled Monolingual testing in multilingual reality: Language ideologies and their impact on language testing, was organized by the University of Ljubljana, the Faculty of Arts, and the Center for Slovene as a Second and Foreign Language at the Department of Slovene Studies. As part of the meeting, a round table titled (Close) encounters of language policy makers was held on 8 November 2019. We publish here the transcript of the discussion among the round-table participants.