Publication: Article, Data Paper (2021, France, English, HAL CCSD)
Authors: Lassner, David; Coburger, Julius; Neudecker, Clemens; Baillot, Anne
doi: 10.17175/sb005_006
We present an OCR ground truth data set for historical prints and show that training on this data improves recognition results over baselines. We reflect on the reusability of the ground truth data set based on two experiments that look into the legal basis for reuse of digitized document images in the case of 19th-century English and German books. We propose a framework for publishing ground truth data even when digitized document images cannot be easily redistributed.
For further information contact us at helpdesk@openaire.eu
Citations: 0. Popularity: Average. Influence: Average. Impulse: Average. Powered by BIP!
Publication: Article (2010, France, English, HAL CCSD)
Authors: Benoit Habert; Claude Huc
doi: 10.1177/0539018410371570
To help readers understand the possible interactions between transmission and digitization, the two coordinators of a pilot project in long-term digital archiving present that project. The article outlines the current context in which past and present research in the humanities and social sciences (SHS) is transmitted in digital form. It highlights the gap between the growing role of digital data and their fragility. It presents the standard abstract model of long-term digital archiving and the way it was instantiated in the pilot project. It closes with a reflexive look at the factors that will shape the future of similar projects: organizational choices and behaviours, the respective roles of data and knowledge, the formation and behaviour of user communities, and the status of collective memory in SHS.
Sources: Social Science Infor...; Hyper Article en Ligne; Hyper Article en Ligne - Sciences de l'Homme et de la Société (Other literature type, Article, 2010)
Citations: 7. Popularity: Average. Influence: Average. Impulse: Average.
Publication: Conference object (2015, France, English, HAL CCSD)
Authors: Longhi, Julien; Wigham, Ciara R.
The CoMeRe project (CoMeRe, 2014) aims to build a kernel corpus of computer-mediated communication (CMC) genres with interactions in the French language. Three key words characterize the project: variety, standards and openness. The project gathered mono- and multimodal, synchronous and asynchronous communication data from both Internet and telecommunication networks (text chat, tweets, SMS, forums, blogs). A variety of interactions was sought: public or private interactions as well as interactions from informal, learning and professional situations. Whereas some CMC data types were collected within the CoMeRe project, others had previously been collected and structured within different project partners’ local research teams. This meant that the project had to overcome disparities in corpus compilation choices. For this reason, the CoMeRe project structured the corpora in a uniform way using the Text Encoding Initiative format (TEI, Burnard & Bauman, 2013) and decided to describe each corpus using Dublin Core and OLAC standards for metadata (DCMI, 2014; OLAC, 2008). The TEI model was extended in order to encompass the Interaction Space (IS) of CMC multimodal discourse (Chanier et al., 2014). The term ‘openness’ also characterizes the project: the corpora have been released as open data on the French national platform of linguistic resources (ORTOLANG, 2013) in order to pave the way for scientific examination by partners not involved in the project as well as replicative and cumulative research. This poster presentation aims to give an overview of the corpus building process using, as a case study, a corpus of political tweets, cmr-polititweets (Longhi et al., 2014).
The corpus stemmed from a local research project on lexicon (Digital Humanities and datajournalism, supported by the Fondation of Cergy-Pontoise University). It was built starting from seven French politicians from six different political parties. In order to obtain political tweets, a set of lists citing these politicians was generated (7087 lists), and lists that had tweeted at least six times and whose description contained the word ‘politics’ were selected (120 lists in total). Finally, 2934 tweets were recovered. In order to be sure that we selected politicians’ tweets (and not, for example, those of journalists), only the accounts cited in more than 12 lists were considered; 205 politicians were tweeting. We took the last 200 tweets of each of the 205 accounts on 27 March 2014 (34,273 tweets). This allowed us to recover data that focused on the period between the two rounds of the 2014 municipal elections in France. The poster will focus, firstly, on how features specific to Twitter were included and structured in the interaction space TEI model. We will exemplify how three such features were integrated into the model: hashtags, which label tweets so that other users can see tweets on the same topic; at signs, which allow a user to mention or reply to other users; and retweets, which allow a user to repost a message from another Twitter user and share it with their own followers. Secondly, the poster will evoke some of the ethical and rights issues that had to be considered before publishing a corpus of tweets. Finally, the workflow and multi-stage quality control process adopted during the building of the corpus will be illustrated. This was an essential aspect considering that the corpus underwent format conversions: the local research team had initially structured the corpus in XML whilst the CoMeRe project applied the IS TEI model to the corpus. The political tweets corpus is now structured and available online.
Analyses are under way: some initial ideas were put forward in Djemili et al. (2014), but further analyses must adhere rigorously to methodologies stemming from the natural language processing (NLP) field.
Sources: HAL-ENS-LYON; HAL CY Cergy Paris Université (Other literature type, Conference object, 2015)
Publication: Part of book or chapter of book (2016, France, English, HAL CCSD)
Authors: Thierry Chanier; Ciara R. Wigham
doi: 10.1075/lsse.2.10cha
This chapter gives an overview of one possible staged methodology for structuring LCI data by presenting a new scientific object, LEarning and TEaching Corpora (LETEC). Firstly, the chapter clarifies the notion of corpora, used in so many different ways in language studies, and underlines how corpora differ from raw language data. Secondly, using examples taken from actual online learning situations, the chapter illustrates the methodology used to collect, transform and organize data from online learning situations in order to make them sharable through open-access repositories. The ethics and rights involved in releasing a corpus as OpenData are discussed. Thirdly, the authors suggest how the transcription of interactions may become more systematic and what benefits may be expected from analysis tools. The chapter then opens the CALL research perspective applied to LCI towards its applications to teacher training in Computer-Mediated Communication (CMC), and towards the common interests the CALL field shares with researchers in Corpus Linguistics working on CMC.
Sources: https://edutice.archives-ouver... (Part of book or chapter of book; License: cc-by; Data sources: UnpayWall); Hyper Article en Ligne (Other literature type, Part of book or chapter of book, 2016)
Citations: 1. Popularity: Average. Influence: Average. Impulse: Average.
Publication: Other literature type, Article (2014, France, English, HAL CCSD)
Authors: Thierry Chanier; Celine Poudat; Benoit Sagot; Georges Antoniadis; Ciara Wigham; Linda Hriba; Julien Longhi; Djame Seddah
doi: 10.21248/jlcl.29.2014.187
Note: Final version for the Special Issue of the Journal of Language Technology and Computational Linguistics (JLCL, http://jlcl.org/): BUILDING AND ANNOTATING CORPORA OF COMPUTER-MEDIATED DISCOURSE: Issues and Challenges at the Interface of Corpus and Computational Linguistics (ed. by Michael Beißwenger, Nelleke Oostdijk, Angelika Storrer & Henk van den Heuvel).
The CoMeRe project aims to build a kernel corpus of different Computer-Mediated Communication (CMC) genres with interactions in French as the main language, by assembling interactions stemming from networks such as the Internet or telecommunication, as well as mono and multimodal, synchronous and asynchronous communications. Corpora are assembled using a standard, the TEI (Text Encoding Initiative) format. This implies extending, through a European endeavor, the TEI model of text, in order to encompass the richest and most complex CMC genres. This paper presents the Interaction Space model. We explain how this model has been encoded within the TEI corpus header and body. The model is then instantiated through the first four corpora we have processed: three corpora where interactions occurred in single-modality environments (text chat or SMS systems) and a fourth corpus where text chat, email and forum modalities were used simultaneously. The CoMeRe project has two main research perspectives: Discourse Analysis, only alluded to in this paper, and the linguistic study of idiolects occurring in different CMC genres. As NLP algorithms are an indispensable prerequisite for such research, we present our motivations for applying an automatic annotation process to the CoMeRe corpora.
Our wish to guarantee generic annotations meant we did not consider any processing beyond morphosyntactic labelling, but prioritized the automatic annotation of any freely variant elements within the corpora. We then turn to decisions made concerning which annotations to make for which units and describe the processing pipeline for adding these. All CoMeRe corpora are verified through a staged quality control process, designed to allow corpora to move from one project phase to the next. Public release of the CoMeRe corpora is a short-term goal: corpora will be integrated into the forthcoming French National Reference Corpus, and disseminated through the national linguistic infrastructure ORTOLANG. We therefore highlight issues and decisions made concerning the OpenData perspective.
Sources: HAL CY Cergy Paris Université (Other literature type, 2014; Data sources: HAL CY Cergy Paris Université)
Citations: 0. Popularity: Average. Influence: Average. Impulse: Average.
Views: 1. Downloads: 0.
Publication: Other literature type, Conference object (2018, France, English, HAL CCSD)
Authors: Nathalie Fargier
doi: 10.4000/proceedings.elpub.2018.12
A wide range of initiatives for developing research and data infrastructures have been funded in recent years. There is growing concern within the academic community about maintaining the resources invested beyond the period of the original research funding. While technical progress has been made in preserving the data themselves, little thinking and few operational solutions exist for the institutions that create, disseminate, curate and preserve the data. How can their existence be ensured over the medium or long term? This paper is a case study: it addresses the sustainability issues faced by Persée, a French platform dedicated to digitized documentary heritage that was launched in 2003. Through this example, the aim is to present, in practical terms, how an organization has to adapt and change to sustain itself over time. Persée tested and combined various mechanisms (technical actions, users’ involvement, organizational evolution, marketing, funding models), each influencing the others, to achieve sustainability. Rather than a steady state, ensuring the long-term existence of a data infrastructure is an ongoing and resource-intensive process.
Sources: https://hal.archives...; Episciences (Other literature type, Conference object, 2018); Hyper Article en Ligne - Sciences de l'Homme et de la Société (Conference object, 2018)
Citations: 0. Popularity: Average. Influence: Average. Impulse: Average.