Slovenščina 2.0: Emp... (Open Access)

Corpora and concordancers on the nl.ijs.si server

Authors: Tomaž Erjavec;

Abstract

The paper presents the monolingual and parallel corpora that can be accessed through two concordancers on the server nl.ijs.si. Twelve monolingual corpora contain Slovene texts, one contains Japanese texts, and one English texts. They comprise reference corpora, such as Gigafida for contemporary written Slovene, IMP for historical Slovene, and GOS for spoken Slovene, as well as specialised corpora, such as the corpus of texts from the informatics domain and the corpus of Slovene tweets. The five parallel corpora contain Slovene texts sentence-aligned with, variously, English, Japanese, French, German, and Italian, from domains such as EU law, literature, and journalism. Although most of the corpora were produced earlier, they have now been newly annotated, some have been extended with additional texts, and a few are completely new. The texts in the corpora are supplied with metadata, and their word tokens are annotated, either manually or automatically, with at least lemmas and morphosyntactic descriptions. Most of the corpora are freely available through two web concordancers, noSketch Engine and CUWI. These two corpus analysis tools support searching large annotated corpora, displaying search results in various ways, filtering searches by metadata, and saving search results locally. In addition to the corpora and concordancers, the paper discusses some issues pertaining to such a corpus-linguistic infrastructure and concludes with directions for further work.
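
To make the kind of search described above concrete, the following is a minimal sketch of a keyword-in-context (KWIC) lookup over lemma-annotated tokens with an optional metadata filter. It is not the implementation or API of noSketch Engine or CUWI. The names `Token`, `Text`, `kwic`, and `SAMPLE`, the toy English sentences, and the one-letter tags standing in for morphosyntactic descriptions are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Token:
    word: str   # surface form as it appears in the text
    lemma: str  # base form used for searching
    msd: str    # morphosyntactic description (placeholder tags, not a real tagset)


@dataclass
class Text:
    meta: Dict[str, str]                      # text-level metadata, e.g. genre
    tokens: List[Token] = field(default_factory=list)


def kwic(
    texts: List[Text],
    lemma: str,
    window: int = 3,
    meta_filter: Optional[Dict[str, str]] = None,
) -> Iterator[Tuple[str, str, str]]:
    """Yield (left context, keyword, right context) for every token with the given lemma."""
    for text in texts:
        # Skip texts whose metadata does not match every requested key/value pair.
        if meta_filter and any(text.meta.get(k) != v for k, v in meta_filter.items()):
            continue
        for i, tok in enumerate(text.tokens):
            if tok.lemma == lemma:
                left = " ".join(t.word for t in text.tokens[max(0, i - window):i])
                right = " ".join(t.word for t in text.tokens[i + 1:i + 1 + window])
                yield left, tok.word, right


# Hypothetical toy corpus: two short texts with different metadata.
SAMPLE = [
    Text({"genre": "news"}, [
        Token("The", "the", "D"), Token("corpora", "corpus", "N"),
        Token("were", "be", "V"), Token("annotated", "annotate", "V"),
    ]),
    Text({"genre": "tweet"}, [
        Token("This", "this", "D"), Token("corpus", "corpus", "N"),
        Token("is", "be", "V"), Token("small", "small", "A"),
    ]),
]

if __name__ == "__main__":
    # Searching by lemma finds both "corpora" and "corpus".
    for left, key, right in kwic(SAMPLE, "corpus"):
        print(f"{left:>10} [{key}] {right}")
```

Searching by lemma rather than by surface form is what lets a single query match all inflected forms of a word, and passing `meta_filter={"genre": "tweet"}` restricts the hits to texts with that metadata value.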

Subjects by Vocabulary

Library of Congress Subject Headings: Philology. Linguistics (P1-1091)

Keywords

CWB, concordancers, language corpora, noSketchEngine, CUWI
