410 Research products, page 1 of 41

  • CLARIN

  • Publication . Conference object . 2017
    Open Access
    Authors: 
    Liliana Melgar; Marijn Koolen; Hugo C. Huurdeman; Jaap Blom;
    Publisher: ACM
    Country: Netherlands

    Annotation has been identified as one of the "scholarly primitives", and plays a pivotal role in facilitating access to audio-visual (AV) media in a scholarly context. However, there is a lack of understanding of scholars' annotation needs and behavior. This paper is pa...

  • Publication . Conference object . 2015
    English
    Authors: 
    Chanier, Thierry; Poudat, Céline; Wigham, Ciara;
    Publisher: HAL CCSD
    Country: France

    International audience; CoMeRe (acronym which in French stands for network mediated communication) is a national project involving researchers from 8 different research units to develop a repository of CMC all modeled within the same extension of the TEI (Chanier et al...

  • English
    Authors: 
    Beißwenger, Michael; Chanier, Thierry; Ehrhardt, Eric; Herold, Axel; Lüngen, Harald; Poudat, Céline; Storrer, Angelika;
    Publisher: HAL CCSD
    Country: France

    International audience; The panel presents results and ongoing work from corpus projects in which TEI-P5 has been adopted for the representation and linguistic annotation of genres of social media and computer-mediated communication (CMC). It relates to the work of the TE...

  • Open Access English
    Authors: 
    Krzysztof Wołk; Krzysztof Marasek; Agnieszka Wołk;
    Publisher: Polish Information Processing Society

    In the contemporary world, translation has become a critical need. Parallel dictionaries are now among the sources most accessible to humans, but they have limits: they do not offer good-quality translation because of neologisms and words that are out ...

  • Open Access
    Authors: 
    Kocmi, Tom; Bojar, Ondřej;
    Country: Czech Republic
    Project: EC | QT21 (645452)

    In language identification, a common first step in natural language processing, we want to automatically determine the language of some input text. Monolingual language identification assumes that the given document is written in one language. In multilingual language i...

  • Open Access
    Authors: 
    Reynaert, M.; Gompel, M. van; Sloot, K. van der; Bosch, A.P.J. van den;
    Country: Netherlands

    CLARIN activities in the Netherlands in 2015 are in transition between the first national project CLARIN-NL and its successor CLARIAH. In this paper we give an overview of important infrastructure developments which have taken place throughout the first and which are ta...

  • Publication . Article . Conference object . Preprint . 2018 . Embargo End Date: 01 Jan 2018
    Open Access
    Authors: 
    Ondřej Cífka; Ondřej Bojar;
    Publisher: arXiv

    One of the possible ways of obtaining continuous-space sentence representations is by training neural machine translation (NMT) systems. The recent attention mechanism, however, removes the single point in the neural network from which the source sentence representation can b...

  • English
    Authors: 
    Tahko, Tuuli; Zehavi, Ora; Lhotak, Martin; Romanova, Natasha; Clivaz, Claire; Ros, Salvador; Raciti, Marco;
    Publisher: HAL CCSD
    Country: France
    Project: EC | Locus Ludi (741520), EC | DESIR (731081)

    The DESIR project sets out to strengthen the sustainability of DARIAH and firmly establish it as a long-term leader and partner within arts and humanities communities. The project was designed to address six core infrastructural sustainability dimensions and one of thes...

  • Research data . 2011 . Embargo End Date: 23 Nov 2011
    Open Access
    Authors: 
    Bojar, Ondřej; Straňák, Pavel; Zeman, Daniel;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | EUROMATRIXPLUS (231720)

    A Hindi corpus of texts downloaded mostly from news sites. Contains both the original raw texts and an extensively cleaned-up and tokenized version suitable for language modeling. 18M sentences, 308M tokens

  • Research data . 2018 . Embargo End Date: 19 Feb 2018
    Restricted
    Authors: 
    Specia, Lucia; Logacheva, Varvara; Blain, Frederic; Fernandez, Ramon; Martins, André;
    Publisher: University of Sheffield
    Project: EC | QT21 (645452)

    Training and development data for the WMT18 QE task. Test data will be published as a separate item. This shared task will build on its previous six editions to further examine automatic methods for estimating the quality of machine translation output at run-time, witho...