Advanced search in Research outcomes
- research data . 2021 . Embargo End Date: 11 Mar 2021 . Open Access
  Authors: Nedoluzhko, Anna; Novák, Michal; Popel, Martin; Žabokrtský, Zdeněk; Zeman, Daniel
  Persistent Identifiers: handle: 11234/1-3510
  Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
  Project: EC | Bergamot (825303)
  CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version 0.1 consists of 17 datasets for 11 languages. The datasets are enriched with automatic morpho...
- research data . 2020 . Embargo End Date: 02 Jul 2020 . Open Access
  Authors: Çano, Erion
  Persistent Identifiers: handle: 11234/1-3257
  Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
  Project: EC | ELITR (825460)
  OAGL is a paper metadata dataset consisting of 17528680 records which comprise various scientific publication attributes like abstracts, titles, keywords, publication years, venues, etc. The last field of each record is the page length of the corresponding publication. ...
- research data . 2020 . Embargo End Date: 19 Jun 2020 . Open Access
  Authors: Barančíková, Petra; Bojar, Ondřej
  Persistent Identifiers: handle: 11234/1-3248
  Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
  Project: EC | Bergamot (825303)
  Costra 1.1 is a new dataset for testing geometric properties of sentence embedding spaces. In particular, it concentrates on examining how well sentence embeddings capture complex phenomena such as paraphrases, tense or generalization. The dataset is a direct expansion of...
- research data . 2020 . Embargo End Date: 16 Jul 2020 . Open Access
  Authors: Parida, Shantipriya; Bojar, Ondřej
  Persistent Identifiers: handle: 11234/1-3211
  Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
  Project: EC | ROXANNE (833635)
  Data: We have collected English-Odia parallel data for the purposes of NLP research of the Odia language. The data for the parallel corpus was extracted from existing parallel corpora such as OdiEnCorp 1.0 and PMIndia, and books which contain both English and Odia ...
- research data . 2020 . Embargo End Date: 14 Aug 2020 . Open Access
  Authors: Parida, Shantipriya; Bojar, Ondřej
  Persistent Identifiers: handle: 11234/1-3267
  Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
  Project: EC | ROXANNE (833635)
  Data: Hindi Visual Genome 1.1 is an updated version of Hindi Visual Genome 1.0. The update concerns primarily the text part of Hindi Visual Genome, fixing translation issues reported during the WAT 2019 multimodal task. In the image part, only one segment and thus one i...
- research data . 2019 . Embargo End Date: 05 Dec 2019 . Open Access
  Authors: Barančíková, Petra; Bojar, Ondřej
  Persistent Identifiers: handle: 11234/1-3123
  Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
  Project: EC | Bergamot (825303)
  COSTRA 1.0 is a dataset of Czech complex sentence transformations. The dataset is intended for the study of sentence-level embeddings beyond simple word alternations or standard paraphrasing. The dataset consists of 4,262 unique sentences with an average length of 10 words,...
- research data . 2019 . Embargo End Date: 31 Oct 2019 . Open Access
  Authors: Çano, Erion
  Persistent Identifiers: handle: 11234/1-3079
  Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
  Project: EC | ELITR (825460)
  OAGSX is a title generation dataset consisting of 34408509 abstracts and titles from scientific articles. The texts were lowercased and tokenized with the Stanford CoreNLP tokenizer. No other preprocessing steps were applied in this release version. Dataset records (samples...
- research data . 2019 . Embargo End Date: 21 Oct 2019 . Open Access
  Authors: Çano, Erion
  Persistent Identifiers: handle: 11234/1-3062
  Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
  Project: EC | ELITR (825460)
  OAGKX is a keyword extraction/generation dataset consisting of 22674436 abstracts, titles and keyword strings from scientific articles. The texts were lowercased and tokenized with the Stanford CoreNLP tokenizer. No other preprocessing steps were applied in this release ver...
- research data . 2019 . Embargo End Date: 12 Sep 2019 . Open Access
  Authors: Çano, Erion
  Persistent Identifiers: handle: 11234/1-3043
  Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
  Project: EC | ELITR (825460)
  OAGS is a title generation dataset consisting of 34993700 abstracts and titles from scientific articles. The texts were lowercased and tokenized with the Stanford CoreNLP tokenizer. No other preprocessing steps were applied in this release version. Dataset records (samples) are...
- research data . 2019 . Embargo End Date: 15 Jul 2019 . Open Access
  Authors: Macháček, Dominik; Kratochvíl, Jonáš; Vojtěchová, Tereza; Bojar, Ondřej
  Persistent Identifiers: handle: 11234/1-3023
  Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
  Project: EC | ELITR (825460)
  We present a test corpus of audio recordings and transcriptions of presentations of students' enterprises together with their slides and web-pages. The corpus is intended for evaluation of automatic speech recognition (ASR) systems, especially in conditions where the pr...