14 Research products, page 1 of 2

  • Publications
  • 2018-2022
  • FR
  • BE
  • IE
  • CLARIN
  • Digital Humanities and Cultural Heritage

  • Publication . Preprint . Article . 2020 . Embargo End Date: 01 Jan 2020
    Open Access
    Authors: 
    Kocmi, Tom; Limisiewicz, Tomasz; Stanovsky, Gabriel;
    Publisher: arXiv
    Project: EC | Bergamot (825303)

    Gender bias in machine translation can manifest when choosing gender inflections based on spurious gender correlations, for example, always translating doctors as men and nurses as women. This can be particularly harmful as models become more popular and are deployed within commercial systems. Our work presents the largest body of evidence for the phenomenon, covering more than 19 systems submitted to WMT over four diverse target languages: Czech, German, Polish, and Russian. To achieve this, we use WinoMT, a recent automatic test suite that examines gender coreference and bias when translating from English into languages with grammatical gender. We extend WinoMT to handle two new languages tested in WMT: Polish and Czech. We find that all systems consistently use spurious correlations in the data rather than meaningful contextual information. Comment: Accepted at WMT20

  • Publication . Preprint . Conference object . Contribution for newspaper or weekly magazine . Article . 2020
    Open Access English
    Authors: 
    Rehm, Georg; Marheinecke, Katrin; Hegele, Stefanie; Piperidis, Stelios; Bontcheva, Kalina; Hajic, Jan; Choukri, Khalid; Vasiljevs, Andrejs; Backfried, Gerhard; Prinz, Christoph; +37 more
    Countries: France, Denmark
    Project: SFI | ADAPT: Centre for Digital... (13/RC/2106), EC | BDVe (732630), EC | ELG (825627), EC | AI4EU (825619), FCT | PINFRA/22117/2016 (PINFRA/22117/2016), EC | X5gon (761758),...

    Multilingualism is a cultural cornerstone of Europe and is firmly anchored in the European treaties, including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe's specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI, with its many opportunities and synergies but also misconceptions, has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities at the EU level in the last ten years and develop strategic guidance with regard to four key dimensions. To appear in the Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020).

  • Publication . Conference object . Preprint . Article . 2020
    Open Access English
    Authors: 
    Khojasteh, H. A.; Ansari, E.; Bohlouli, Mahdi;
    Publisher: HAL CCSD
    Country: France

    Language recognition has been significantly advanced in recent years by means of modern machine learning methods such as deep learning and benchmarks with rich annotations. However, research is still limited for low-resource formal languages. This leaves a significant gap in describing colloquial language, especially for low-resource languages such as Persian. To address this gap for low-resource languages, we propose a "Large Scale Colloquial Persian Dataset" (LSCP). LSCP is hierarchically organized in a semantic taxonomy that treats multi-task informal Persian language understanding as a comprehensive problem. This encompasses the recognition of multiple semantic aspects in human-level sentences, captured naturally from real-world sentences. We believe that further investigation and processing, as well as the application of novel algorithms and methods, can enrich the computerized understanding and processing of low-resource languages. The proposed corpus consists of 120M sentences drawn from 27M tweets, annotated with parse trees, part-of-speech tags, sentiment polarity and translations into five different languages. Comment: 6 pages, 2 figures, 3 tables, Accepted at the 12th International Conference on Language Resources and Evaluation (LREC 2020)

  • English
    Authors: 
    Wissik, Tanja; Edmond, Jennifer; Fischer, Frank; de Jong, Franciska; Scagliola, Stefania; Scharnhorst, Andrea; Schmeer, Hendrik; Scholger, Walter; Wessels, Leon;
    Publisher: HAL CCSD
    Country: France
    Project: EC | PARTHENOS (654119), EC | CLARIN-PLUS (676529)

    The digital humanities (DH) enrich the traditional fields of the humanities with new practices, approaches and methods. Since the turn of the millennium, the skills needed to realise these new possibilities have been taught in summer schools, workshops and other alternative formats. In the meantime, a growing number of Bachelor's and Master's programmes in digital humanities have been launched worldwide. The DH Course Registry, which is the focus of this article, was created to provide an overview of the growing range of courses on offer worldwide. Its mission is to gather the rich offerings of different courses and to provide an up-to-date picture of the teaching and training opportunities in the field of DH. The article provides a general introduction to this emerging area of research and introduces the two European infrastructures, CLARIN and DARIAH, which jointly operate the DH Course Registry. A short history of the Registry is accompanied by a description of the data model and the data curation workflow. Current data, available through the Registry's API, are evaluated to quantitatively map the international landscape of DH teaching. Preprint of a publication for LibraryTribune (China) (accepted)

  • English
    Authors: 
    Tahko, Tuuli; Zehavi, Ora; Lhotak, Martin; Romanova, Natasha; Clivaz, Claire; Ros, Salvador; Raciti, Marco;
    Publisher: HAL CCSD
    Country: France
    Project: EC | Locus Ludi (741520), EC | DESIR (731081)

    The DESIR project sets out to strengthen the sustainability of DARIAH and firmly establish it as a long-term leader and partner within arts and humanities communities. The project was designed to address six core infrastructural sustainability dimensions, one of which was dedicated to training and education, itself one of the four pillars identified in the DARIAH Strategic Plan 2019-2026. In the framework of Work Package 7: Teaching, DESIR organised dedicated workshops in the six DARIAH accession countries (Czech Republic, Finland, Israel, Spain, Switzerland and the United Kingdom) to introduce them to the DARIAH infrastructure and related services and to develop methodological research skills. The topic of each workshop was decided by representatives of the accession country according to the training needs of its national community of researchers in the (Digital) Humanities. Training topics varied greatly: some workshops aimed to introduce participants to specific methodological research skills, while others focused on the infrastructural role of training and education. The workshops organised in the context of Work Package 7: Teaching are listed below:
      • CZECH REPUBLIC: “A series of fall tutorials 2019 organized by LINDAT/CLARIAH-CZ, tutorial #3 on TEI Training”, 28 November 2019, Prague;
      • FINLAND: “Reuse & sustainability: Open Science and social sciences and humanities research infrastructures”, 23 October 2019, Helsinki;
      • ISRAEL: “Introduction to Text Encoding and Digital Editions”, 24 October 2019, Haifa;
      • SPAIN: “DESIR Workshop: Digital Tools, Shared Data, and Research Dissemination”, 3 July 2019, Madrid;
      • SWITZERLAND: “Sharing the Experience: Workflows for the Digital Humanities”, 5-6 December 2019, Neuchâtel;
      • UNITED KINGDOM: “Research Software Engineering for Digital Humanities: Role of Training in Sustaining Expertise”, 9 December, London.

  • English
    Authors: 
    Beretta, Francesco; Alamercery, Vincent; Derks, Sebastiaan; Petram, Lodewijk; Schneider, Jonas;
    Publisher: HAL CCSD
    Country: France

    International audience. The poster focuses on the basic workflow of model alignment and data integration that can be built using the ontology management environment OntoME in connection with the virtual research environment Geovistory (http://geovistory.com/). The integration of data from different sources is based first on an explicit documentation of each original model, then on the alignment of these models with classes and properties designed within the conceptual framework of CIDOC CRM (ISO 21127:2014). For this purpose, new classes and properties close to the research domain can be added in OntoME within namespaces dedicated to the different projects. These are then aligned with the more abstract classes of CIDOC CRM in order to achieve data interoperability. The poster presents the essential steps and first results of the process of integrating data from different sources carried out in the Huygens ING/CLARIAH Geovistory pilot project (http://forum.dataforhistory.org/node/150). The aim of this project is to import datasets into the Geovistory online application, currently developed by Kleio Lab GmbH, so that historians can reuse, analyze and visualize the data. At the end of the process, the wrangled data can be made available to other researchers and the public on a human-readable webpage or a SPARQL endpoint.

  • Open Access English
    Authors: 
    van Bavel, B.J.P.; Curtis, Daniel; Hannaford, Matthew; Moatsos, M.; Roosen, Joris; Soens, Tim; LS Transities v. economie en samenleving; OGKG - Sociaal-economische geschiedenis; LS Economische Geschiedenis;
    Countries: Netherlands, Belgium
    Project: EC | COORDINATINGFORLIFE (339647), NWO | CLARIAH Common Lab Resear... (2300184354)

    Recent advances in paleoclimatology and the growing digital availability of large historical datasets on human activity have created new opportunities to investigate long‐term interactions between climate and society. However, noncritical use of historical datasets can create pitfalls, resulting in misleading findings that may become entrenched as accepted knowledge. We demonstrate pitfalls in the content, use and interpretation of historical datasets in research into climate and society interaction through a systematic review of recent studies on the link between climate and (a) conflict incidence, (b) plague outbreaks and (c) agricultural productivity changes. We propose three sets of interventions to overcome these pitfalls, which involve a more critical and multidisciplinary collection and construction of historical datasets, increased specificity and transparency about uncertainty or biases, and replacing inductive with deductive approaches to causality. This will improve the validity and robustness of interpretations of the long‐term relationship between climate and society. This article is categorized under: Climate, History, Society, Culture > Disciplinary Perspectives. Recent literature investigating long‐term interactions between climate and society increasingly utilizes historical big data. Too often this is done without applying historical criticism, which may lead to misguided narratives. We propose a set of interventions to avoid this and optimize the use of historical datasets.

  • English
    Authors: 
    Darhri, Anas Alaoui M.; Baillet, Vincent; Bourineau, Bastien; Calantropio, Alessio; Carpentiero, Gabriella; Chayani, Medhi; de Luca, Livio; Dudek, Iwona; Dutailly, Bruno; Gautier, Hélène; +22 more
    Publisher: HAL CCSD
    Country: France
    Project: EC | PARTHENOS (654119)

    International audience. Through this White Paper, which gathers contributions from experts in 3D data as well as professionals concerned with the interoperability and sustainability of 3D research data, the PARTHENOS project aims to highlight some of the current issues they face, including points specific to individual disciplines, and potential practices and methodologies to deal with these issues. During the workshop, several tools for dealing with these issues were introduced and tested against the participants' experiences; this White Paper now intends to go further by also integrating the participants' feedback and suggestions for potential improvements. Therefore, even if the focus is on specific tools, the main goal is to contribute to the development of standardized good practices related to the sharing, publication, storage and long-term preservation of 3D data.

  • English
    Authors: 
    Raciti, Marco; Moranville, Yoann; Thiel, Carsten;
    Publisher: HAL CCSD
    Country: France
    Project: EC | DESIR (731081)

  • Open Access English
    Authors: 
    Van Der Eycken, Johan; Styven, Dorien; Gheldof, Tom; Depoortere, Rolande;
    Publisher: HAL CCSD
    Countries: France, Belgium

    This article shows that metadata plays a central role in our society and concludes that, through collaborative work, it is possible to pool solutions and to establish relationships of cooperation, both at the level of practical tool development and with regard to sharing and creating knowledge and know-how. Published in: ABB: Archives et Bibliothèques de Belgique - Archief- en Bibliotheekwezen in België, vol. 106, pp. 135-144.
