
The LIFRANUM project aims to identify and structure the corpus of digital literature (sites, blogs, social networks) in the French-speaking world (Francophonie). This heritage dimension is coupled with an epistemological inquiry into the literariness of the identified content and the dynamics of new forms of sociability. After identifying the relevant URLs, we will launch web crawls to collect large data sets. These will be stored in a data lake whose consistency will rely on a simple indexing system derived from a taxonomy built, in particular, from user experience. Data lakes store documents in their original format while allowing them to be queried efficiently through a metadata management system. The data lake will then serve as the source for data mining tools, which will make it possible to highlight original topics and to compute similarities between entities (e.g., documents, websites, authors). We aim to make these techniques interpretable so that end users can understand the suggested structures. The results of these analyses will help specify and enrich the taxonomy. Describing these original literary entities requires a tool based on both digital structures and user perception. We postulate that these two approaches, far from being irreconcilable, are complementary, and even that the robustness of an analytical tool rests on this dual scientific anchoring. The taxonomy will help elaborate a simple ontology (modeled on bibliographic ontologies) from which we can derive a set of metadata for characterizing web entities. Beyond constituting the corpus of a new and fundamental dimension of contemporary literature, our project, by exploiting data lakes and data mining, revolutionizes the methods and means of documentary description.
The coherence of the project rests, among other things, on the articulation between different sciences and methods for constructing and using an object (in this case a corpus): analysis of the practices, uses, and reception of these objects, together with data analysis and text mining. Together, these are used to structure a language (taxonomy, ontology, and metadata set) that gives diverse users access to literary creations while ensuring a rigorous characterization of the objects and their structure. The project relies on two laboratories (literature, information-communication; computer science) and on the BnF; it is supported by the International Institute of Francophonie. The collaboration between these partners, already under way, has allowed us to begin the research and to test empirically the risks and solutions related to this project. The project therefore concerns the literary community first, but beyond that it aims to make a large-scale corpus and an innovative methodology available to all disciplines. By depositing the corpus in a storage space of the HUMA-NUM infrastructure, in agreement with the MSH, we aim to produce a tool available for scientific approaches and broad uses: linguistics, statistics, computer science, information retrieval, and natural language processing, among others. With the appropriate partners, we are setting up pedagogical uses for researchers, teachers, and documentalists as well as high school and university students. The challenge is considerable: it involves supporting editorial practice (through collaborative writing and writing workshops) as well as contributing to the analysis of new literacy. We foresee deliverables varied in both form and medium, intended to provide methodological support for the use of this corpus.
