54 Research products, page 1 of 6

  • 2017-2021
  • Part of book or chapter of book
  • Archive ouverte UNIGE
  • Digital Humanities and Cultural Heritage

  • Publication . Part of book or chapter of book . 2021
    Open Access English
    Authors: 
    Arena, Francesca;
    Publisher: Le Mans Université
    Country: Switzerland

    Almost entirely overlooked throughout the 20th century, neglected by contemporary medical manuals, the clitoris has gradually returned centre stage thanks to Western feminism.

  • Publication . Part of book or chapter of book . 2017
    Open Access
    Authors: 
    Eric Haeberli;
    Publisher: De Gruyter
    Country: Switzerland
    Project: SNSF | Revisiting the Loss of Ve... (124619), SNSF | The History of English Au... (143302)

    This paper offers an overview of the history of medial NP-adjuncts from Old English to Present-Day English. In Present-Day English, adverbs are perfectly grammatical in a position between the subject and the main verb ('He recently left for London') whereas NP-adjuncts are at best stylistically marked in this position ('(*)He tomorrow leaves for London'). The paper shows that while medial placement of NP-adjuncts has been considerably less frequent as compared to adverbs ever since around 1500, the contrast was initially much stronger in clauses with finite main verbs than in clauses with finite auxiliaries. It is only in the 19th century that medial placement becomes equally marked in both contexts. These developments are accounted for in terms of processing constraints disfavouring the use of medial NP-adjuncts and a structural reanalysis of NP-medial adjuncts in Late Modern English.

  • Publication . Part of book or chapter of book . 2018
    Open Access
    Authors: 
    Manny Rayner; Johanna Gerlach; Pierrette Bouillon; Nikos Tsourakis; Hervé Spechbach;
    Publisher: Springer International Publishing
    Country: Switzerland

    We consider methods for handling incomplete (elliptical) utterances in spoken phraselators, and describe how they have been implemented inside BabelDr, a substantial spoken medical phraselator. The challenge is to extend the phrase matching process so that it is sensitive to preceding dialogue context. We contrast two methods, one using limited-vocabulary strict grammar-based speech and language processing and one using large-vocabulary speech recognition with fuzzy grammar-based processing, and present an initial evaluation on a spoken corpus of 821 context-sentence/elliptical-phrase pairs. The large-vocabulary/fuzzy method strongly outperforms the limited-vocabulary/strict method over the whole corpus, though it is slightly inferior for the subset that is within grammar coverage. We investigate possibilities for combining the two processing paths, using several machine learning frameworks, and demonstrate that hybrid methods strongly outperform the large-vocabulary/fuzzy method.

  • Open Access French
    Authors: 
    Mayor, Anne; Douze, Katja; Lorenzo Martinez, Maria; Truffa Giachet, Miriam; Aymeric Nsangou, Jacques De Limbepe; Bocoum, Hamady; Champion, Louis; Cervera, Céline; Davidoux, Sarah; Garnier, Aline; +13 more
    Publisher: Fondation Suisse-Liechtenstein pour les recherches archéologiques à l'étranger (Zürich)
    Country: Switzerland

    This article presents the results of the field campaign conducted in eastern Senegal in 2017 as part of the international programme « Peuplement humain et paléoenvironnement en Afrique ». It incorporates the results of two complementary projects: the ANR-FNS CheRCHA project and the FNS Falémé project. The former aims to reconstruct the chronostratigraphic framework and cultural developments during the Pleistocene and the Early and Middle Holocene in the Falémé valley, while the latter focuses on the technical dynamics of the last two millennia in eastern Senegal.

  • Publication . Article . Other literature type . Part of book or chapter of book . Conference object . Preprint . 2018
    Open Access English
    Authors: 
    Kristina Gulordava; Piotr Bojanowski; Edouard Grave; Tal Linzen; Marco Baroni;
    Country: Switzerland

    Recurrent neural networks (RNNs) have achieved impressive results in a variety of linguistic processing tasks, suggesting that they can induce non-trivial properties of language. We investigate here to what extent RNNs learn to track abstract hierarchical syntactic structure. We test whether RNNs trained with a generic language modeling objective in four languages (Italian, English, Hebrew, Russian) can predict long-distance number agreement in various constructions. We include in our evaluation nonsensical sentences where RNNs cannot rely on semantic or lexical cues ("The colorless green ideas I ate with the chair sleep furiously"), and, for Italian, we compare model performance to human intuitions. Our language-model-trained RNNs make reliable predictions about long-distance agreement, and do not lag much behind human performance. We thus bring support to the hypothesis that RNNs are not just shallow-pattern extractors, but they also acquire deeper grammatical competence. Accepted to NAACL 2018

  • Publication . Other literature type . Conference object . Part of book or chapter of book . 2017
    Open Access English
    Authors: 
    Marcos Zampieri; Shervin Malmasi; Nikola Ljubešić; Preslav Nakov; Ahmed Ali; Jörg Tiedemann; Yves Scherrer; Noëmi Aepli;
    Country: Switzerland

    We present the results of the VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part of the fourth edition of the VarDial workshop at EACL’2017. This year, we included four shared tasks: Discriminating between Similar Languages (DSL), Arabic Dialect Identification (ADI), German Dialect Identification (GDI), and Cross-lingual Dependency Parsing (CLP). A total of 19 teams submitted runs across the four tasks, and 15 of them wrote system description papers.

  • Publication . Conference object . Part of book or chapter of book . 2020
    Open Access English
    Authors: 
    Elisa Terumi Rubel Schneider; João Vitor Andrioli de Souza; Julien Knafou; Lucas Emanuel Silva e Oliveira; Jenny Copara; Yohan Bonescki Gumiel; Lucas Ferro Antunes de Oliveira; Emerson Cabrera Paraiso; Douglas Teodoro; Claudia Maria Cabral Moro Barra;
    Publisher: Association for Computational Linguistics
    Country: Switzerland

    With the growing number of electronic health record data, clinical NLP tasks have become increasingly relevant to unlock valuable information from unstructured clinical text. Although the performance of downstream NLP tasks, such as named-entity recognition (NER), in English corpus has recently improved by contextualised language models, less research is available for clinical texts in low resource languages. Our goal is to assess a deep contextual embedding model for Portuguese, so called BioBERTpt, to support clinical and biomedical NER. We transfer learned information encoded in a multilingual-BERT model to a corpora of clinical narratives and biomedical-scientific papers in Brazilian Portuguese. To evaluate the performance of BioBERTpt, we ran NER experiments on two annotated corpora containing clinical narratives and compared the results with existing BERT models. Our in-domain model outperformed the baseline model in F1-score by 2.72%, achieving higher performance in 11 out of 13 assessed entities. We demonstrate that enriching contextual embedding models with domain literature can play an important role in improving performance for specific NLP tasks. The transfer learning process enhanced the Portuguese biomedical NER model by reducing the necessity of labeled data and the demand for retraining a whole new model.

  • Open Access English
    Authors: 
    Manny Rayner; Nikos Tsourakis; Johanna Gerlach;
    Publisher: Springer
    Country: Switzerland

    We describe a simple spoken utterance classification method suitable for data-sparse domains which can be approximately described by CFG grammars. The central idea is to perform robust matching of CFG rules against output from a large-vocabulary recogniser, using a dynamic programming method which optimises the tf-idf score of the matched grammar string. We present results of experiments carried out on a substantial CFG-based medical speech translator and the publicly available Spoken CALL Shared Task. Robust utterance classification using the tf-idf method strongly outperforms plain CFG-based recognition for both domains. When comparing with Naive Bayes classifiers trained on data sampled from the CFG grammars, the tf-idf/dynamic programming method is much better on the complex speech translation domain, but worse on the simple Spoken CALL Shared Task domain.

  • Publication . Conference object . Part of book or chapter of book . 2017
    Open Access English
    Authors: 
    Claudia Baur; Cathy Chua; Johanna Gerlach; Manny Rayner; Martin J. Russell; Helmer Strik; Xizi Wei;
    Country: Switzerland

    We present an overview of the shared task for spoken CALL. Groups competed on a prompt-response task using English-language data collected, through an online CALL game, from Swiss German teens in their second and third years of learning English. Each item consists of a written German prompt and an audio file containing a spoken response. The task is to accept linguistically correct responses and reject linguistically incorrect ones, with “linguistically correct” being defined by a gold standard derived from human annotations; scoring was performed using a metric defined as the ratio of the relative rejection rates on incorrect and correct responses. The task received twenty entries from nine different groups. We present the task itself, the results, a tentative analysis of what makes items challenging, a comparison between different metrics, and suggestions for a continuation.

  • Publication . Part of book or chapter of book . Conference object . 2017
    Open Access
    Authors: 
    Achim Rabus; Yves Scherrer;
    Countries: Switzerland, Finland

    This paper reports on challenges and results in developing NLP resources for spoken Rusyn. Being a Slavic minority language, Rusyn does not have any resources to make use of. We propose to build a morphosyntactic dictionary for Rusyn, combining existing resources from the etymologically close Slavic languages Russian, Ukrainian, Slovak, and Polish. We adapt these resources to Rusyn by using vowel-sensitive Levenshtein distance, hand-written language-specific transformation rules, and combinations of the two. Compared to an exact match baseline, we increase the coverage of the resulting morphological dictionary by up to 77.4% relative (42.9% absolute), which results in a tagging recall increased by 11.6% relative (9.1% absolute). Our research confirms and expands the results of previous studies showing the efficiency of using NLP resources from neighboring languages for low-resourced languages. Peer reviewed
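The last abstract above (Rabus and Scherrer) reports each improvement as both a relative and an absolute gain. As a rough illustration of how the two figures relate, the sketch below recovers the implied baseline and improved values. It assumes that both figures refer to the same baseline and that "absolute" means percentage points; the function name `baseline_and_new` is hypothetical, and the recovered values are only approximate because the published figures are rounded.

```python
def baseline_and_new(absolute_gain_pts, relative_gain_pct):
    """Recover the implied baseline and improved value, in percentage points.

    Assumes relative gain = absolute gain / baseline,
    so baseline = absolute gain / relative gain.
    """
    baseline = absolute_gain_pts / (relative_gain_pct / 100.0)
    return baseline, baseline + absolute_gain_pts


# Figures quoted in the abstract: (absolute gain, relative gain).
coverage_base, coverage_new = baseline_and_new(42.9, 77.4)
recall_base, recall_new = baseline_and_new(9.1, 11.6)

print(f"dictionary coverage: {coverage_base:.1f}% -> {coverage_new:.1f}%")
print(f"tagging recall:      {recall_base:.1f}% -> {recall_new:.1f}%")
```

Under these assumptions, the quoted figures would correspond to a dictionary coverage baseline of roughly 55% and a tagging recall baseline of roughly 78%; the abstract itself does not state these baselines.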
