54 research outcomes, page 5 of 6
  • research data . 2015 . Embargo End Date: 15 May 2015
    Open Access
    Authors:
    Agirre, Eneko; Branco, António; Popel, Martin; Simov, Kiril;
    Publisher: University of the Basque Country, UPV/EHU
    Project: EC | QTLEAP (610516)

    This corpus is part of Deliverable 5.5 of the European Commission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu). The texts are Q&A interactions from the real-user scenario (batches 1 and 2). The interactions in this corpus are available in Basque, Bulgarian...

  • research data . 2014 . Embargo End Date: 28 Apr 2014
    Open Access
    Authors:
    Dušek, Ondřej; Hajič, Jan; Hlaváčová, Jaroslava; Pecina, Pavel; Tamchyna, Aleš; Urešová, Zdeňka;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | KHRESMOI (257528)

    This package contains data sets for development and testing of machine translation of sentences from summaries of medical articles between Czech, English, French, and German.

  • research data . 2014 . Embargo End Date: 27 Mar 2014
    Open Access
    Authors:
    Jawaid, Bushra; Kamran, Amir; Bojar, Ondřej;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | MOSESCORE (288487)

    We release a sizeable monolingual Urdu corpus automatically tagged with part-of-speech tags. We extend the work of Jawaid and Bojar (2012) who use three different taggers and then apply a voting scheme to disambiguate among the different choices suggested by each tagger...

  • research data . 2013 . Embargo End Date: 02 Apr 2014
    Open Access
    Authors:
    Pecina, Pavel; Dušek, Ondřej; Hajič, Jan; Urešová, Zdeňka;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | KHRESMOI (257528)

    This package contains data sets for development and testing of machine translation of medical search short queries between Czech, English, French, and German. The queries come from general public and medical experts.

  • research data . 2013 . Embargo End Date: 10 Dec 2013
    Open Access
    Authors:
    Bojar, Ondřej; Macháček, Matouš; Tamchyna, Aleš; Zeman, Daniel;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | MOSESCORE (288487)

    This dataset contains the whole set of very many Czech translations for 50 English source sentences coming from WMT11 test set (http://www.statmt.org/wmt11). In total, there are 15431447 Czech sentences, i.e. 300k reference translations per source English sentence on av...

  • research data . 2012 . Embargo End Date: 13 Nov 2012
    Open Access
    Authors:
    Bojar, Ondřej; Zeman, Daniel; Dušek, Ondřej; Břečková, Jana; Farkačová, Hana; Grošpic, Pavel; Kačenová, Kristýna; Knechtová, Eva; Koubová, Anna; Lukavská, Jana; ...
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | EUROMATRIXPLUS (231720)

    Additional three Czech reference translations of the whole WMT 2011 data set (http://www.statmt.org/wmt11/test.tgz), translated from the German originals. Original segmentation of the WMT 2011 data is preserved.

  • research data . 2012
    Open Access
    Authors:
    Hajič, Jan; Hajičová, Eva; Panevová, Jarmila; Sgall, Petr; Cinková, Silvie; Fučíková, Eva; Mikulová, Marie; Pajas, Petr; Popelka, Jan; Semecký, Jiří; ...
    Publisher: Linguistic Data Consortium
    Project: EC | EUROMATRIXPLUS (231720)

    The Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0) is a major update of the Prague Czech-English Dependency Treebank 1.0 (LDC2004T25). It is a manually parsed Czech-English parallel corpus sized over 1.2 million running words in almost 50,000 sentences f...

  • research data . 2012 . Embargo End Date: 15 May 2012
    Open Access
    Authors:
    Galuščáková, Petra; Garabík, Radovan; Bojar, Ondřej;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | EUROMATRIXPLUS (231720)

    English-Slovak parallel corpus consisting of several freely available corpora (Acquis [1], Europarl [2], Official Journal of the European Union [3] and part of OPUS corpus [4] – EMEA, EUConst, KDE4 and PHP) and downloaded website of European Commission [5]. Corpus is pu...

  • research data . 2012 . Embargo End Date: 15 May 2012
    Open Access
    Authors:
    Galuščáková, Petra; Bojar, Ondřej;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | EUROMATRIXPLUS (231720)

    Manual classification of errors of English-Slovak translation according to the classification introduced by Vilar et al. [1]. 50 sentences randomly selected from WMT 2011 test set [2] were translated by 3 MT systems described in [3] and MT errors were manually marked an...

  • research data . 2012 . Embargo End Date: 15 May 2012
    Open Access
    Authors:
    Galuščáková, Petra; Bojar, Ondřej;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | EUROMATRIXPLUS (231720)

    Manual classification of errors of Czech-Slovak translation according to the classification introduced by Vilar et al. [1]. First 50 sentences from WMT 2010 test set were translated by 5 MT systems (Česílko, Česílko2, Google Translate and two Moses setups) and MT errors...
