
ERIC

Entrepôts, Représentation et Ingénierie des Connaissances
4 Projects
  • Funder: French National Research Agency (ANR) Project Code: ANR-19-DATA-0001
    Funder Contribution: 98,928 EUR

    Sharing and reuse of archaeological or historical data: an RDF-based description according to semantic web repositories and standards

    The HisArc-RDF project brings together a multidisciplinary consortium: archaeology, history, geography, terminology, bibliography and informatics. The pooling of experience, based on the sharing and articulation of the methods, software and semantic tools developed in each discipline, will make it possible to prototype (through implementation and iterative testing) a "FAIR" operating chain on structurally and semantically heterogeneous archaeological and historical datasets:
    - to write a data management plan (DMP) for each dataset, based on the recommendations of the European Union and the French National Open Science Plan;
    - to develop two software tools: the first operating a web service between OntoME (an ontology-matching tool designed by a community of historians) and Opentheso (a thesaurus-alignment tool designed with a community of archaeologists); the second providing a generic supervised automatic alignment interface between Opentheso and any semantic web repository;
    - to document each test set through a fine-grained processing chain, based first on the use of microthesauri and descriptor concepts aligned with semantic web repositories, then on matching the ontology expressed by the thesauri against the reference standards and ontologies of the documentary and scientific communities; thanks to the software developed, this phase will lead to an RDF-structured description of the test datasets, allowing, after online publication, the reporting and direct reuse ("calculability") of the data;
    - to lead a wide network of historical and archaeological stakeholders (repository maintainers, multidisciplinary research groups, programmed and preventive archaeology, European and non-European sites, academic and private stakeholders) through a training programme and experimental workshops, in order to disseminate the good practices supported and expressed by the operating chain and the tools developed during the project.
    The foundation of the HisArc-RDF project is threefold: a convergence of views born from the confrontation of multidisciplinary practices and experiences around the life cycle of data, from acquisition to publication, sharing and mediation; an acculturation of archaeological and historical communities to the practical and scientific challenge of aligning their vocabularies with core semantic web repositories; and finally the need for a processing chain that these communities can appropriate, i.e. one as close as possible to working practices in the field and in laboratories. The outcome of the project will be the realization and open publication of a methodology and associated tools to establish in our disciplines an ecosystem of "FAIR" data production, publication and sharing. It will rest on a proof of concept: the targeted user experience is the sharing and effective reuse of data extracted from recording systems (raw data), regardless of the structure specific to a particular database; it is the responsibility of each operating interface/visualization to pick the data up and configure them for reuse. The rapid implementation of these linked open data will serve the widest possible academic audience: students, museums and research teams.
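To make the alignment and RDF publication step more concrete, here is a minimal sketch of how a thesaurus concept could be expressed with SKOS and aligned to an external semantic web repository using the rdflib Python library. It is illustrative only: the concept, labels and URIs are hypothetical placeholders, and this is not the HisArc-RDF tooling itself.

```python
# Illustrative sketch: expressing a thesaurus concept in RDF/SKOS and aligning it
# to an external semantic web repository. All URIs and labels are placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, SKOS

LOCAL = Namespace("https://example.org/microthesaurus/")        # hypothetical local microthesaurus
EXTERNAL = URIRef("https://example.org/reference-vocab/12345")  # hypothetical reference concept

g = Graph()
g.bind("skos", SKOS)
g.bind("local", LOCAL)

concept = LOCAL["amphora"]
g.add((concept, RDF.type, SKOS.Concept))
g.add((concept, SKOS.prefLabel, Literal("amphora", lang="en")))
g.add((concept, SKOS.prefLabel, Literal("amphore", lang="fr")))

# A supervised alignment step would assert a mapping such as:
g.add((concept, SKOS.exactMatch, EXTERNAL))

# Publishing the graph as Turtle yields an RDF-structured, reusable description.
print(g.serialize(format="turtle"))
```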

  • Funder: French National Research Agency (ANR) Project Code: ANR-21-CE23-0026
    Funder Contribution: 566,365 EUR

    Natural Language Processing took a step forward with the advent of the Transformer architecture in 2017, which allows parallel training on GPUs. This has led to very large language models, BERT being the most popular. However, these models are memory-hungry. Neural weight compression techniques have been proposed: weight quantization, weight pruning and knowledge distillation. These methods all come very close to maintaining the accuracy of the original model while delivering massive memory savings. Beyond accuracy, the Diké project will study how existing compression techniques affect model bias, fairness and ethical behaviour (there is no free lunch). We will also propose new compression algorithms that prevent, by design, bias, fairness or ethical issues in representations or predictions.
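To illustrate the kind of compression the abstract refers to, the sketch below applies post-training dynamic quantization to a BERT-style model and compares the serialized model sizes. It assumes PyTorch and Hugging Face Transformers are installed and uses a generic public checkpoint; it is not a method developed by the Diké project.

```python
# Illustrative sketch: post-training dynamic quantization of a BERT-style model.
import os
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

# Dynamic quantization: Linear-layer weights are stored as 8-bit integers and
# dequantized on the fly during inference.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

def size_mb(m: torch.nn.Module) -> float:
    """Serialize the state dict to disk and report its size in MB."""
    torch.save(m.state_dict(), "temp_weights.pt")
    size = os.path.getsize("temp_weights.pt") / 1e6
    os.remove("temp_weights.pt")
    return size

print(f"full-precision model: {size_mb(model):.1f} MB")
print(f"quantized model:      {size_mb(quantized):.1f} MB")
# A study such as Diké would then compare not only accuracy but also bias and
# fairness metrics between the original and compressed models.
```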

  • Funder: French National Research Agency (ANR) Project Code: ANR-19-CE38-0007
    Funder Contribution: 380,052 EUR

    The LIFRANUM project aims at identifying and structuring the corpus of digital literatures (sites, blogs, social networks) in the French-speaking world. This patrimonial dimension is coupled with an epistemological inquiry into the literarity of the identified contents and the dynamics of new sociabilities. After tracking the URLs concerned, we will launch crawls in order to recover large sets of data. These will be stored in a data lake whose consistency relies on a simple indexing system derived from a taxonomy built, in particular, from user experience. For content indexing we will use data lakes, which store documents in their original format while allowing efficient querying through a metadata management system. The data lake will then feed data mining tools, which will make it possible to highlight original topics and to calculate similarities between entities (e.g., documents, websites, authors). In this project we aim to make those techniques interpretable so that end users can understand the suggested structures. The results of these analyses will help specify and enrich the taxonomy.
    The description of these original literary entities requires the development of a tool based on both digital structures and the perception of users. Far from being irreconcilable, we postulate that these two approaches are complementary, and even that the robustness of an analytical tool rests on this double scientific anchoring. This taxonomy will help elaborate a simple ontology (on the model of bibliographic ontologies) from which we can derive a set of metadata usable to characterize web entities. Beyond the constitution of the corpus of a new and fundamental dimension of contemporary literature, our project, by exploiting data lakes and data mining, revolutionizes the methods and means of documentary description. The coherence of the project rests, among other things, on the articulation between different sciences and methods for the construction and use of an object (in this case a corpus): analysis of the practices, uses and reception of these objects, together with data analysis and text mining, all used to structure a language (taxonomy, ontology and metadata set) that ensures access to literary creations by diverse users while ensuring a rigorous characterization of the objects and their structuring.
    The project relies on two laboratories (literature and information-communication; computer science) and on the BnF; it is supported by the International Institute of Francophonie. The collaboration between these partners, already under way, has allowed us to initiate research and empirically test the risks and solutions related to this project. The objective of the project therefore concerns the literary community but, beyond that, aims to make available to all disciplinary fields a large-scale corpus as well as an innovative methodology. The objective is, by depositing the corpus in a storage space of the HUMA-NUM infrastructure, in agreement with the MSH, to produce a tool available for scientific approaches and broad uses: linguistics, statistics, computer science, information retrieval and natural language processing, among others. We are putting in place, with the appropriate partners, pedagogical uses for audiences of researchers, teachers and documentalists as well as high school and university students.
    The challenge is indeed daunting: it involves supporting editorial practice (through collaborative writing and writing workshops) as well as contributing to the analysis of new forms of literacy. We foresee deliverables varied in both form and medium, intended to provide methodological support for the use of this corpus.
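As an illustration of the "calculate similarities between entities" step, the following sketch computes pairwise document similarity from TF-IDF vectors with scikit-learn. The documents are toy placeholders, and the code stands in for, rather than reproduces, the project's actual data lake and mining pipeline.

```python
# Illustrative sketch: pairwise document similarity via TF-IDF + cosine similarity.
# Toy documents; the real corpus would come from the crawled data lake.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "A blog post about experimental digital poetry.",
    "An author's website presenting collaborative writing workshops.",
    "A literary blog discussing poetry and new media.",
]

# Build TF-IDF vectors, then compare every document with every other one.
vectors = TfidfVectorizer().fit_transform(documents)
similarity = cosine_similarity(vectors)

for i in range(len(documents)):
    for j in range(i + 1, len(documents)):
        print(f"doc {i} ~ doc {j}: {similarity[i, j]:.2f}")
```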

  • Funder: French National Research Agency (ANR) Project Code: ANR-24-CE54-1690
    Funder Contribution: 604,268 EUR

    The CARTAS project aims to undertake a thorough mapping of Pablo Picasso's network. Its primary objective is to analyze in depth the structure and evolution of this network across time and space, from the artist's early movements between Barcelona and Paris to his death in Mougins, France. CARTAS endeavors to enrich our comprehension of how the members of Picasso's network have shaped the international cultural landscape over the past century. Given the current absence of a comprehensive mapping or an open database documenting this network, the project's anticipated outcomes will address this significant gap. CARTAS is built upon two foundational assumptions: firstly, that this cultural network, potentially one of the most expansive of the last century, provides fertile ground for uncovering previously overlooked international intellectual dynamics; secondly, that certain cultural interactions have remained largely unexplored by experts, leaving the roles of certain network members as an open avenue for scientific exploration. To substantiate these assertions, CARTAS will carry out an in-depth analysis of Pablo Picasso's cultural network, scrutinizing the profiles, connections, and collaborations of its members.
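As a generic illustration of the kind of network analysis such a mapping enables, the sketch below builds a tiny toy graph with networkx and computes centrality measures. The nodes and relations are placeholders, not CARTAS data.

```python
# Illustrative sketch: centrality analysis on a toy correspondence network.
# Node names and edges are placeholders only.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("Picasso", "Dealer A"),
    ("Picasso", "Poet B"),
    ("Dealer A", "Collector C"),
    ("Poet B", "Publisher D"),
    ("Collector C", "Publisher D"),
])

# Degree and betweenness centrality highlight who connects the network.
degree = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)

for node in G.nodes:
    print(f"{node}: degree={degree[node]:.2f}, betweenness={betweenness[node]:.2f}")
```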

