Research data: Dataset (2023)
Language: Occitan (post 1500); Provençal
Publisher: Zenodo
Project: AKA | CorCoDial - Corpus-based ...
Authors: Miletic, Aleksandra
DOI: 10.5281/zenodo.7777339

OcWikiAnnot is a corpus of Wikipedia content in Occitan that is tokenized, PoS-tagged and lemmatized. The corpus contains 100 000 sentences for a total of 2 037 723 tokens. It is based on the Wikipedia corpus in Occitan that is part of the Leipzig Corpora Collection.

Reference: Aleksandra Miletić and Janine Siewert. 2023. Lemmatization Experiments on Two Low-Resourced Languages: Occitan and Low Saxon. In Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (to appear). Association for Computational Linguistics.

Research data: Dataset (2022)
Language: Occitan (post 1500); Provençal
Publisher: Zenodo
Project: AKA | CorCoDial - Corpus-based ...
Authors: Miletić, Aleksandra; Scherrer, Yves
DOI: 10.5281/zenodo.7079579

OcWikiDisc is a freely available corpus in Occitan, extracted from the talk pages associated with the Occitan Wikipedia. The corpus contains messages posted by users in direct user-to-user interactions as part of the discussions about the content and the editing policies on Wikipedia. The messages are associated with metadata, such as the username, the date and time of the posting, the discussion title, etc. The corpus has also been annotated with tools for automatic language identification, which makes it possible to filter out content in languages other than Occitan. Four versions of the corpus are published, each produced with a different filtering strategy (see documentation for more details). The version with the most restrictive filtering contains 8,000 messages for a total of 618,000 tokens, produced by 520 different users.

Reference: Aleksandra Miletić and Yves Scherrer. 2022. OcWikiDisc: a Corpus of Wikipedia Talk Pages in Occitan. In Proceedings of VarDial - Ninth Workshop on NLP for Similar Languages, Varieties and Dialects. (forthcoming)
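
The idea of publishing several corpus versions under different filtering strategies can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Message` record, its `lang_labels` field (one label per language-identification tool), the vote thresholds, and the placeholder messages are hypothetical and are not taken from the OcWikiDisc documentation, which should be consulted for the actual strategies and file layout. The only external fact used is that `oc` is the ISO 639-1 code for Occitan.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical record for one talk-page message."""
    username: str
    text: str
    lang_labels: list[str]  # hypothetical: one label per language-ID tool


def filter_messages(messages: list[Message], min_votes: int,
                    target: str = "oc") -> list[Message]:
    """Keep messages that at least `min_votes` tools labelled as `target`."""
    return [
        m for m in messages
        if sum(label == target for label in m.lang_labels) >= min_votes
    ]


# Placeholder data, invented for this sketch.
MESSAGES = [
    Message("user_a", "placeholder message 1", ["oc", "oc", "oc"]),
    Message("user_b", "placeholder message 2", ["oc", "ca", "oc"]),
    Message("user_c", "placeholder message 3", ["fr", "fr", "oc"]),
    Message("user_d", "placeholder message 4", ["fr", "en", "fr"]),
]

if __name__ == "__main__":
    # A stricter threshold yields a smaller, cleaner version of the corpus.
    for name, votes in [("strict", 3), ("majority", 2), ("lenient", 1)]:
        kept = filter_messages(MESSAGES, votes)
        print(name, [m.username for m in kept])
```

A stricter threshold trades coverage for purity, which is the general trade-off behind releasing more than one filtered version.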

Research data: Dataset (2022)
Language: Occitan (post 1500); Provençal
Publisher: Institut für Sprachwissenschaft, Karl-Franzens-Universität Graz
Authors: Melchior, Luca
handle: 11471/518.10.1.1909

Research data: Dataset (2020)
Language: Occitan (post 1500); Provençal
Publisher: Zenodo
Authors: Miletic, Aleksandra; Bras, Myriam; Esher, Louise; Poujade, Clamença; Sibille, Jean; Vergez-Couret, Marianne
DOI: 10.5281/zenodo.3708268

Linguatec Tolosa Treebank for Occitan

Linguatec Tolosa Treebank is the first dependency treebank for Occitan, developed as part of the EFA 227/16 LINGUATEC Project, financed by the POCTEFA Interreg European funds. The current version of the treebank contains 13K tokens annotated for PoS tags, lemmas and syntactic dependencies. Linguistic annotation follows Universal Dependencies guidelines (https://universaldependencies.org/#language-u). A detailed corpus description is provided in the description file.

A subset of texts was annotated independently by two annotators, and the two annotations were adjudicated to produce the final annotation. These texts are therefore the best suited for use as test files in NLP experiments.

The corpus files are stored in the CoNLL-U format. Each sentence is preceded by a sentence ID and the original, non-tokenized text of the sentence. The annotation is provided in a column-based format defined as follows:

1. ID: Word index, integer starting at 1 for each new sentence; may be a range for multiword tokens.
2. FORM: Word form or punctuation symbol.
3. LEMMA: Lemma or stem of word form.
4. UPOS: Universal part-of-speech tag.
5. XPOS: Language-specific part-of-speech tag; underscore if not available.
6. FEATS: List of morphological features from the universal feature inventory or from a defined language-specific extension; underscore if not available.
7. HEAD: Head of the current word, which is either a value of ID or zero (0).
8. DEPREL: Universal dependency relation to the HEAD.
9. DEPS: Enhanced dependency graph in the form of a list of head-deprel pairs.
10. MISC: Any other annotation.

The texts are distributed under the Creative Commons BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en).
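
The ten-column layout described above can be read with a few lines of Python. This is a minimal sketch, not the treebank's own tooling: the `Token` class, the `read_conllu` function and the sample sentence are invented for illustration, and the sketch assumes that comment lines take the `# key = value` form and that sentences are separated by blank lines, as is usual for CoNLL-U files.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator


@dataclass
class Token:
    """One annotation line; fields follow the ten columns in order."""
    id: str       # kept as a string because it may be a range such as "1-2"
    form: str
    lemma: str
    upos: str
    xpos: str
    feats: str
    head: str
    deprel: str
    deps: str
    misc: str


def read_conllu(text: str) -> Iterator[tuple[dict[str, str], list[Token]]]:
    """Yield (comments, tokens) for each sentence in a CoNLL-U string."""
    comments: dict[str, str] = {}
    tokens: list[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                yield comments, tokens
            comments, tokens = {}, []
        elif line.startswith("#"):
            key, _, value = line[1:].partition("=")
            comments[key.strip()] = value.strip()
        else:
            cols = line.split("\t")
            if len(cols) != 10:
                raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
            tokens.append(Token(*cols))
    if tokens:  # last sentence if the input lacks a trailing blank line
        yield comments, tokens


# Invented sample sentence, not taken from the treebank.
SAMPLE = "\n".join([
    "# sent_id = example-1",
    "# text = Lo gat dormís.",
    "\t".join(["1", "Lo", "lo", "DET", "_", "_", "2", "det", "_", "_"]),
    "\t".join(["2", "gat", "gat", "NOUN", "_", "_", "3", "nsubj", "_", "_"]),
    "\t".join(["3", "dormís", "dormir", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"]),
    "",
])

if __name__ == "__main__":
    for comments, tokens in read_conllu(SAMPLE):
        print(comments.get("sent_id"), comments.get("text"))
        for tok in tokens:
            print(tok.id, tok.form, tok.lemma, tok.upos, tok.head, tok.deprel)
```

Keeping ID and HEAD as strings avoids failing on multiword-token ranges; callers that need integers can convert the lines where `tok.id.isdigit()` holds.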

Research data: Dataset (2018)
Language: Occitan (post 1500); Provençal
Publisher: Zenodo
Project: EC | EXPRESSIONARRATION
Authors: Vergez-Couret, Marianne
DOI: 10.5281/zenodo.1456563

This resource contains 5 extracts of texts in Occitan which were manually annotated with lemmas and parts-of-speech, following the Grace standard. It was produced during the ExpressioNarration project, funded by a Marie Curie Individual Fellowship, in order to evaluate the performance of an Occitan part-of-speech tagger, Talismane, on the specificities of the project's corpus, Oral Occitan (OcOr), which is also available at https://zenodo.org/record/1451753#.W78FJWOYSpo.

Each extract contains around 1500 words. The extracts are taken from 'Contes et proverbes populaires recueillis en armagnac' and 'Contes populaires recueillis en agenais' by J.-F. Bladé, 'Coundes biarnés, couéilhuts aüs parsàas miéytadès dou péys dé Biarn' by J.-V. Lalanne, 'Contes populaires du Languedoc' by L. Lambert and 'Contes populaires recueillis dans la Grande-Lande' by F. Arnaudin. The annotation process is described in an article available at https://www.openscience.fr/IMG/pdf/iste_modocv1n1_2.pdf.

Reference: Vergez-Couret M. (2017). « Constitution et annotation d'un corpus écrit de contes et récits en occitan », Analyses et méthodes formelles pour les humanités numériques, ISTE OpenScience, 1-1, published online: https://www.openscience.fr/Constitution-et-annotation-d-un-corpus-ecrit-de-contes-et-recits-en-occitan.

Research data: Dataset (2018)
Language: Occitan (post 1500); Provençal
Publisher: Zenodo
Project: EC | EXPRESSIONARRATION
Authors: Vergez-Couret, Marianne; Carruthers, Janice
DOI: 10.5281/zenodo.4740659

OcOr is a corpus of Occitan oral narratives. This corpus is one of the outputs of the project ExpressioNarration, financed by a Marie Skłodowska-Curie Fellowship (2016-2018, n°655034). It includes three sub-corpora, constituted as follows:

• OOT (Occitan, oral, traditional): stories drawn from fieldwork among native speakers in the Occitan domain, recorded by the COMDT (Conservatoire Occitan des Musiques et Danses Traditionnelles - http://www.comdt.org/), transcribed and digitised for the project by the researchers.
• OWT (Occitan, written, traditional): published literary stories, digitised for the project by the researchers. These are stories collected from oral sources and produced in a publishable written version.
• OOC (Occitan, oral, contemporary): stories recounted by contemporary artists, taken from existing recordings and from two Toulouse storytelling events organised by the project in collaboration with the Institut d'Etudes Occitanes (IEO) in 2016. The stories were recorded during the events and subsequently transcribed and digitised by the researchers.

The overall aim of the ExpressioNarration project was to use contemporary linguistic theory to explore the relationship between language and orality, with a specific focus on key temporal features of oral narrative in Occitan, including 'tenses', 'connectives' and 'frame introducers'. These features were thus annotated in the three sub-corpora.

All the sub-corpora are disseminated in XML format (TEI-P5) and PDF. Each story is available as an annotated XML document, an annotated PDF and a stripped PDF document. Full metadata appears in the Header of each XML document, with information on speakers (e.g. gender, age, place of origin, education, languages spoken), variety of Occitan (or dialect), authors/editorial information (in the case of OWT) and story-type when relevant (i.e. the Aarne-Thompson category). For each sub-corpus, a user-friendly summary of this metadata is also available in an Excel spreadsheet: these are contained in the OcOr zipfile. The annotation system was designed by the researchers and is given in full in the Header of each XML document.

For further information on the constitution of the corpus and discussion of the theoretical and methodological issues relating to data collection, digitisation and annotation, please read the following article in the journal Corpus, written by the researchers and entitled 'Méthodologie pour la constitution d'un corpus comparatif de narration orale en Occitan : objectifs, défis, solutions', available at: https://journals.openedition.org/corpus/3490.

References:
Janice Carruthers and Marianne Vergez-Couret, « Méthodologie pour la constitution d'un corpus comparatif de narration orale en Occitan : objectifs, défis, solutions », Corpus [online], 18 | 2018, published online 9 July 2018, accessed 8 October 2018. URL: http://journals.openedition.org/corpus/3490
Vergez-Couret M. (2017). « Constitution et annotation d'un corpus écrit de contes et récits en occitan », Analyses et méthodes formelles pour les humanités numériques, ISTE OpenScience, 1-1, published online: https://www.openscience.fr/Constitution-et-annotation-d-un-corpus-ecrit-de-contes-et-recits-en-occitan.
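
Since the stories are distributed as TEI-P5 XML with the metadata and annotation scheme in the header, a first step is usually to open a file, read the header, and see which elements occur in the text. The sketch below does this with Python's standard library. It assumes the files use the standard TEI namespace (`http://www.tei-c.org/ns/1.0`); the function names and the tiny sample document are invented for illustration, and the actual element names used for tenses, connectives and frame introducers are defined in each OcOr header, not here.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import Counter

# Standard TEI P5 namespace; assumed to be the one used by the corpus files.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def header_title(root: ET.Element) -> str:
    """Return the title recorded in the TEI header, or '' if absent."""
    return root.findtext(
        "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title",
        default="",
        namespaces=TEI_NS,
    )


def element_counts(root: ET.Element) -> Counter:
    """Count element names inside <text>, to discover the annotation tags."""
    text_el = root.find("tei:text", TEI_NS)
    if text_el is None:
        return Counter()
    return Counter(
        local_name(el.tag) for el in text_el.iter() if isinstance(el.tag, str)
    )


# Invented minimal TEI document, not taken from OcOr.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample story</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>First placeholder paragraph.</p>
      <p>Second placeholder paragraph.</p>
    </body>
  </text>
</TEI>"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)  # for a file on disk: ET.parse(path).getroot()
    print(header_title(root))
    print(element_counts(root).most_common())
```

Listing the element names first avoids hard-coding tag names before the annotation scheme in the header has been read.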

Research data: Dataset (2018)
Language: Occitan (post 1500); Provençal
Publisher: Zenodo
Authors: Bras, Myriam; Esher, Louise; Sibille, Jean; Vergez-Couret, Marianne
DOI: 10.5281/zenodo.1182949

This corpus contains a collection of texts in Occitan which were manually annotated with parts-of-speech and lemmas. The corpus was produced in the context of the RESTAURE project, funded by the French ANR. The current version of the corpus contains 28 documents and 12,425 tokens. The annotation process is detailed in the following article: http://hal.archives-ouvertes.fr/hal-01704806. The annotated versions are provided in a tab-separated (TSV) CoNLL-U format.
Loading
Research data keyboard_double_arrow_right Dataset 2023 Occitan (post 1500); ProvençalZenodo AKA | CorCoDial - Corpus-based ...Authors: Miletic, Aleksandra;Miletic, Aleksandra;{"references": ["Aleksandra Mileti\u0107 and Janine Siewert. 2023. Lemmatization Experiments on Two Low-Resourced Languages: Occitan and Low Saxon. In Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (to appear). Association for Computational Linguistics."]} OcWikiAnnot is a corpus of Wikipedia content in Occitan that is tokenized, PoS-tagged and lemmatized. The corpus contains 100 000 sentences for a total of 2 037 723 tokens. It is based on the Wikipedia corpus in Occitan that is part of the Leipzig Corpora Collection.
ZENODO arrow_drop_down add ClaimPlease grant OpenAIRE to access and update your ORCID works.This Research product is the result of merged Research products in OpenAIRE.
You have already added works in your ORCID record related to the merged Research product.This Research product is the result of merged Research products in OpenAIRE.
You have already added works in your ORCID record related to the merged Research product.All Research productsarrow_drop_down <script type="text/javascript"> <!-- document.write('<div id="oa_widget"></div>'); document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=10.5281/zenodo.7777339&type=result"></script>'); --> </script>
For further information contact us at helpdesk@openaire.eu0 citations 0 popularity Average influence Average impulse Average Powered by BIP!
visibility 28visibility views 28 download downloads 3 Powered bymore_vert ZENODO arrow_drop_down add ClaimPlease grant OpenAIRE to access and update your ORCID works.This Research product is the result of merged Research products in OpenAIRE.
You have already added works in your ORCID record related to the merged Research product.This Research product is the result of merged Research products in OpenAIRE.
You have already added works in your ORCID record related to the merged Research product.All Research productsarrow_drop_down <script type="text/javascript"> <!-- document.write('<div id="oa_widget"></div>'); document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=10.5281/zenodo.7777339&type=result"></script>'); --> </script>
For further information contact us at helpdesk@openaire.euResearch data keyboard_double_arrow_right Dataset 2022 Occitan (post 1500); ProvençalZenodo AKA | CorCoDial - Corpus-based ...Authors: Miletić, Aleksandra; Scherrer, Yves;Miletić, Aleksandra; Scherrer, Yves;OcWikiDisc is a freely available corpus in Occitan, extracted from the talk pages associated with the Occitan Wikipedia. The corpus contains messages posted by users in direct user-to-user interactions as part of the discussions about the content and the editing policies on Wikipedia. The messages are associated with metadata, such as the username, the date and time of the posting, the discussion title, etc. The corpus has also been annotated with tools for automatic language identification, allowing to filter out content in languages other than Occitan. Using different filtering strategies, four versions of the corpus are published (see documentation for more details). The version with the most restrictive filtering contains 8,000 messages for a total of 618,000 tokens, produced by 520 different users. {"references": ["Aleksandra Mileti\u0107 and Yves Scherrer. 2022. OcWikiDisc: a Corpus of Wikipedia Talk Pages in Occitan. In Proceedings of VarDial - Ninth Workshop on NLP for Similar Languages, Varieties and Dialects. (forthcoming)"]}
ZENODO arrow_drop_down add ClaimPlease grant OpenAIRE to access and update your ORCID works.This Research product is the result of merged Research products in OpenAIRE.
You have already added works in your ORCID record related to the merged Research product.This Research product is the result of merged Research products in OpenAIRE.
You have already added works in your ORCID record related to the merged Research product.All Research productsarrow_drop_down <script type="text/javascript"> <!-- document.write('<div id="oa_widget"></div>'); document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=10.5281/zenodo.7079579&type=result"></script>'); --> </script>
For further information contact us at helpdesk@openaire.eu0 citations 0 popularity Average influence Average impulse Average Powered by BIP!
visibility 56visibility views 56 download downloads 6 Powered bymore_vert ZENODO arrow_drop_down add ClaimPlease grant OpenAIRE to access and update your ORCID works.This Research product is the result of merged Research products in OpenAIRE.
You have already added works in your ORCID record related to the merged Research product.This Research product is the result of merged Research products in OpenAIRE.
You have already added works in your ORCID record related to the merged Research product.All Research productsarrow_drop_down <script type="text/javascript"> <!-- document.write('<div id="oa_widget"></div>'); document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=10.5281/zenodo.7079579&type=result"></script>'); --> </script>

Research data > Dataset, 2022. Occitan (post 1500); Provençal. Institut für Sprachwissenschaft, Karl-Franzens-Universität Graz.
Authors: Melchior, Luca
Handle: 11471/518.10.1.1909
Citations: 0; popularity, influence and impulse: Average (BIP!).

Research data > Dataset, 2020. Occitan (post 1500); Provençal. Zenodo.
Title: Linguatec Tolosa Treebank for Occitan
Authors: Miletic, Aleksandra; Bras, Myriam; Esher, Louise; Poujade, Clamença; Sibille, Jean; Vergez-Couret, Marianne
Linguatec Tolosa Treebank is the first dependency treebank for Occitan, developed as part of the EFA 227/16 LINGUATEC Project, financed by the POCTEFA Interreg European funds. The current version of the treebank contains 13K tokens annotated for PoS tags, lemmas and syntactic dependencies. Linguistic annotation follows the Universal Dependencies guidelines (https://universaldependencies.org/#language-u). A detailed corpus description is provided in the description file. A subset of texts was doubly annotated, and these annotations were adjudicated to produce the final annotation. These texts are therefore the best suited for use as test files in NLP experiments.
The corpus files are stored in the CoNLL-U format. Each sentence is preceded by a sentence ID and the original, non-tokenized text of the sentence. The annotation is provided in a column-based format defined as follows:
1. ID: Word index, integer starting at 1 for each new sentence; may be a range for multiword tokens.
2. FORM: Word form or punctuation symbol.
3. LEMMA: Lemma or stem of word form.
4. UPOS: Universal part-of-speech tag.
5. XPOS: Language-specific part-of-speech tag; underscore if not available.
6. FEATS: List of morphological features from the universal feature inventory or from a defined language-specific extension; underscore if not available.
7. HEAD: Head of the current word, which is either a value of ID or zero (0).
8. DEPREL: Universal dependency relation to the HEAD.
9. DEPS: Enhanced dependency graph in the form of a list of head-deprel pairs.
10. MISC: Any other annotation.
The texts are distributed under the Creative Commons BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en).
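The ten-column layout described above can be read with a few lines of Python. The following is a minimal sketch, not the treebank authors' tooling: the names `Token`, `parse_conllu` and `SAMPLE` are hypothetical, and the sample sentence is invented for illustration rather than taken from the corpus. It assumes tab-separated columns, `#` comment lines for the sentence ID and raw text, and a blank line between sentences.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """One CoNLL-U row; field order follows the ten columns listed above."""
    id: str
    form: str
    lemma: str
    upos: str
    xpos: str
    feats: str
    head: str
    deprel: str
    deps: str
    misc: str


def parse_conllu(text: str) -> list[dict]:
    """Split CoNLL-U text into sentences, each with its comments and tokens."""
    sentences: list[dict] = []
    comments: list[str] = []
    tokens: list[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                sentences.append({"comments": comments, "tokens": tokens})
            comments, tokens = [], []
        elif line.startswith("#"):
            # Comment lines carry the sentence ID and the original text.
            comments.append(line[1:].strip())
        else:
            fields = line.split("\t")
            if len(fields) != 10:
                raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
            tokens.append(Token(*fields))
    if tokens:
        # The last sentence may not be followed by a blank line.
        sentences.append({"comments": comments, "tokens": tokens})
    return sentences


# Invented example sentence (an assumption, not a line from the treebank).
SAMPLE = (
    "# sent_id = demo-1\n"
    "# text = Lo gat dormís.\n"
    "1\tLo\tlo\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tgat\tgat\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tdormís\tdormir\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
)

if __name__ == "__main__":
    for sentence in parse_conllu(SAMPLE):
        print(sentence["comments"])
        for tok in sentence["tokens"]:
            print(tok.id, tok.form, tok.lemma, tok.upos, tok.head, tok.deprel)
```

Keeping every field as a string avoids guessing how range IDs such as multiword tokens should be converted; callers can cast `id` and `head` to integers where that is safe.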
DOI: 10.5281/zenodo.3708268. Citations: 0; popularity, influence and impulse: Average (BIP!). Views: 224; downloads: 162.

Research data > Dataset, 2018. Occitan (post 1500); Provençal. Zenodo. EC | EXPRESSIONARRATION
Authors: Marianne Vergez-Couret
This resource contains 5 extracts of texts in Occitan which were manually annotated with lemmas and parts of speech, following the Grace standard. It was produced during the ExpressioNarration project, funded by a Marie Curie Individual Fellowship, in order to evaluate how well Talismane, a part-of-speech tagger for Occitan, handles the specificities of the project's corpus, Oral Occitan (OcOr), which is also available at https://zenodo.org/record/1451753#.W78FJWOYSpo. Each extract contains around 1500 words. The extracts are taken from 'Contes et proverbes populaires recueillis en armagnac et Contes populaires recueillis en agenais' by J.-F. Bladé, 'Coundes biarnés, couéilhuts aüs parsàas miéytadès dou péys dé Biarn' by J.-V. Lalanne, 'Contes populaires du Languedoc' by L. Lambert and 'Contes populaires recueillis dans la Grande-Lande' by F. Arnaudin. The annotation process is described in an article available at https://www.openscience.fr/IMG/pdf/iste_modocv1n1_2.pdf.
Reference: Vergez-Couret M. (2017). « Constitution et annotation d'un corpus écrit de contes et récits en occitan », Analyses et méthodes formelles pour les humanités numériques, ISTE OpenScience, 1-1, online publication: https://www.openscience.fr/Constitution-et-annotation-d-un-corpus-ecrit-de-contes-et-recits-en-occitan.
DOI: 10.5281/zenodo.1456563. Citations: 0; popularity, influence and impulse: Average (BIP!). Views: 198; downloads: 8.

Research data > Dataset, 2018. Occitan (post 1500); Provençal. Zenodo. EC | EXPRESSIONARRATION
Authors: Vergez-Couret, Marianne; Carruthers, Janice
OcOr is a corpus of Occitan oral narratives. This corpus is one of the outputs of the project ExpressioNarration, financed by a Marie Skłodowska-Curie Fellowship (2016-2018, n°655034). It includes three sub-corpora, constituted as follows:
• OOT (Occitan, oral, traditional): stories drawn from fieldwork among native speakers in the Occitan domain, recorded by the COMDT (Conservatoire Occitan des Musiques et Danses Traditionnelles - http://www.comdt.org/), transcribed and digitised for the project by the researchers.
• OWT (Occitan, written, traditional): published literary stories, digitised for the project by the researchers. These are stories collected from oral sources and produced in a publishable written version.
• OOC (Occitan, oral, contemporary): stories recounted by contemporary artists, taken from existing recordings and from two Toulouse storytelling events organised by the project in collaboration with the Institut d'Etudes Occitanes (IEO) in 2016. The stories were recorded during the events and subsequently transcribed and digitised by the researchers.
The overall aim of the ExpressioNarration project was to use contemporary linguistic theory to explore the relationship between language and orality, with a specific focus on key temporal features of oral narrative in Occitan, including 'tenses', 'connectives' and 'frame introducers'. These features were therefore annotated in the three sub-corpora.
All the sub-corpora are disseminated in XML format (TEI-P5) and PDF. Each story is available as an annotated XML document, an annotated PDF and a stripped PDF document. Full metadata appears in the Header of each XML document, with information on speakers (e.g. gender, age, place of origin, education, languages spoken), variety of Occitan (or dialect), authors/editorial information (in the case of OWT) and story type when relevant (i.e. the Aarne-Thompson category). For each sub-corpus, a user-friendly summary of this metadata is also available in an Excel spreadsheet; these spreadsheets are contained in the OcOr zipfile. The annotation system was designed by the researchers and is given in full in the Header of each XML document.
For further information on the constitution of the corpus and discussion of the theoretical and methodological issues relating to data collection, digitisation and annotation, see the following article in the journal Corpus, written by the researchers: 'Méthodologie pour la constitution d'un corpus comparatif de narration orale en Occitan : objectifs, défis, solutions', available at https://journals.openedition.org/corpus/3490.
References:
Janice Carruthers and Marianne Vergez-Couret, « Méthodologie pour la constitution d'un corpus comparatif de narration orale en Occitan : objectifs, défis, solutions », Corpus [Online], 18 | 2018, published online 9 July 2018, accessed 8 October 2018. URL: http://journals.openedition.org/corpus/3490
Vergez-Couret M. (2017). « Constitution et annotation d'un corpus écrit de contes et récits en occitan », Analyses et méthodes formelles pour les humanités numériques, ISTE OpenScience, 1-1, online publication: https://www.openscience.fr/Constitution-et-annotation-d-un-corpus-ecrit-de-contes-et-recits-en-occitan.
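Since the metadata is said to sit in the Header of each TEI-P5 XML document, a first look at a file might simply list the non-empty elements of its `teiHeader`. The sketch below is only an illustration under stated assumptions: `header_fields` and `SAMPLE` are hypothetical names, the sample document is invented and does not reproduce the real OcOr header or annotation scheme, and the files are assumed to use the standard TEI namespace `http://www.tei-c.org/ns/1.0`.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; assumed to be the one used by the corpus files.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def header_fields(xml_text: str) -> list[tuple[str, str]]:
    """Return (element name, text) pairs for non-empty elements in teiHeader."""
    root = ET.fromstring(xml_text)
    header = root.find(f"{{{TEI_NS}}}teiHeader")
    if header is None:
        return []
    fields = []
    for elem in header.iter():
        text = (elem.text or "").strip()
        if text:
            # Drop the "{namespace}" prefix that ElementTree adds to tag names.
            local_name = elem.tag.split("}", 1)[-1]
            fields.append((local_name, text))
    return fields


# Invented minimal TEI document (an assumption, not an OcOr file).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Demo story</title></titleStmt>
      <publicationStmt><p>Unpublished demo</p></publicationStmt>
      <sourceDesc><p>Invented sample</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>...</p></body></text>
</TEI>"""

if __name__ == "__main__":
    for name, text in header_fields(SAMPLE):
        print(f"{name}: {text}")
```

Listing element names first, rather than hard-coding paths, is a cautious choice here because the header layout is defined by the researchers' own scheme and is not described in this record.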
DOI: 10.5281/zenodo.4740659. Citations: 1; popularity, influence and impulse: Average (BIP!). Views: 905; downloads: 64.

Research data > Dataset, 2018. Occitan (post 1500); Provençal. Zenodo.
Authors: Bras, Myriam; Esher, Louise; Sibille, Jean; Vergez-Couret, Marianne
This corpus contains a collection of texts in Occitan which were manually annotated with parts of speech and lemmas. The corpus was produced in the context of the RESTAURE project, funded by the French ANR. The current version of the corpus contains 28 documents and 12,425 tokens. The annotation process is detailed in the following article: http://hal.archives-ouvertes.fr/hal-01704806. The annotated versions are provided in a tab-separated (TSV) CoNLL-U format.
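A size figure such as the one quoted above can be checked against CoNLL-U files with a short count. This is a sketch under assumptions: `count_conllu` and `SAMPLE` are hypothetical names, the sample text is invented, and the record does not say how its token total was computed, so counting only rows with a plain integer ID (skipping multiword-token ranges such as `1-2`) is one possible convention rather than the authors' method.

```python
def count_conllu(text: str) -> tuple[int, int]:
    """Count sentences and word rows in CoNLL-U text.

    Rows whose ID is not a plain integer (multiword ranges like "1-2",
    empty nodes like "1.1") are not counted as words.
    """
    sentences = 0
    words = 0
    in_sentence = False
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        token_id = line.split("\t", 1)[0]
        if token_id.isdigit():
            words += 1
    return sentences, words


# Invented two-sentence sample (an assumption, not corpus data).
SAMPLE = (
    "# text = del gat\n"
    "1-2\tdel\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "1\tde\tde\tADP\t_\t_\t3\tcase\t_\t_\n"
    "2\tlo\tlo\tDET\t_\t_\t3\tdet\t_\t_\n"
    "3\tgat\tgat\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# text = Dormís.\n"
    "1\tDormís\tdormir\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
)

if __name__ == "__main__":
    n_sentences, n_words = count_conllu(SAMPLE)
    print(n_sentences, n_words)
```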
DOI: 10.5281/zenodo.1182949. Citations: 0; popularity, influence and impulse: Average (BIP!). Views: 396; downloads: 52.