The Middle Hungarian period, i.e. the period between the second third of the sixteenth century and the second third of the eighteenth century, has so far been less intensively explored. It is also the earliest period in the history of Hungarian for which enough text material survives to study the language use of everyday private life with the necessary thoroughness (cf. Dömötör–Gugán–Varga 2021). The present proposal focuses on two databases designed by the presenters and their team: The Old and Middle Hungarian corpus of informal language use (Történeti Magánéleti Korpusz, TMK) and The corpus of memoirs and dramas (Középmagyar emlékirat- és drámakorpusz). Both corpora contain texts that are important sources of Hungarian cultural heritage: ego-documents from noblemen and noblewomen, genres related to everyday language use that also involve speakers of lower social status, and constructed dialogs imitating everyday language use in fiction. The Old and Middle Hungarian corpus of informal language use (tmk.nytud.hu) consists of private letters and records of witch trials dating from the fifteenth century to 1772, a total of 8 million characters. This presentation highlights some of the requirements and steps of the corpus building, which was carried out by historical linguists in collaboration with a computational linguist: manual normalization and disambiguation for diachronic adequacy, morphological analysis, and the query interface. This database is the first fully normalized and annotated historical corpus of Hungarian supplemented with sociolinguistic information (Novák–Gugán–Varga–Dömötör 2018). The other topic of the presentation is The corpus of memoirs and dramas, which is being built following the guidelines developed for the previous corpus (cf. Gugán 2020).
The language use of memoirs and dramas in Middle Hungarian proved suitable as an extension to the more directly speech-related sources of TMK. Memoirs are ego-documents, yet they are farther from informal language use than private letters. Dramas are constructed texts, but they are also speech-purposed. The four registers to be included (private letters, witch trial records, memoirs, and dramas) therefore all share certain characteristics, but each differs in at least one feature. In both corpora, all records are normalized and morphologically annotated. The new corpus is also planned to have a freely available, user-friendly query interface, providing a valuable source of information for historical linguists and for specialists and students in related fields.
Green | citations: 0 | popularity: Average | influence: Average | impulse: Average
pmid: 35125984
pmc: PMC8807381
Abstract: This paper presents the ParlaMint corpora, which contain transcriptions of the sessions of 17 European national parliaments and comprise half a billion words. The corpora are uniformly encoded, contain rich metadata about 11 thousand speakers, and are linguistically annotated following the Universal Dependencies formalism and with named entities. Samples of the corpora and conversion scripts are available from the project’s GitHub repository, and the complete corpora are openly available via the CLARIN.SI repository for download, as well as through the NoSketch Engine and KonText concordancers and the Parlameter interface for online exploration and analysis.
Green, hybrid | citations: 20 | popularity: Top 10% | influence: Top 10% | impulse: Top 10%
Green | citations: 0 | popularity: Average | influence: Average | impulse: Average
handle: 20.500.14243/483041
Abstract: The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022 and containing over 1 billion words. The corpora are uniformly encoded, contain rich metadata about their 24 thousand speakers, and are linguistically annotated up to the level of Universal Dependencies syntax and named entities. The paper focuses on the enhancements made since the ParlaMint I project and presents the compilation of the corpora, including the encoding infrastructure, the use of GitHub, the production of the individual corpora, the common pipeline for producing their distribution, and the use of CLARIN services for dissemination. It then gives a quantitative overview of the produced corpora, followed by the qualitative additions made within the ParlaMint II project: metadata localisation; the addition of new metadata, such as the political orientation of political parties; the machine translation of the corpora into English and the tagging of the translations with semantic classes; and the production of pilot speech corpora. Finally, outreach activities and further work are discussed.
citations: 0 | popularity: Average | influence: Average | impulse: Average
In our workshop, we invite Digital Humanists to explore the OpenMethods metablog as an innovative publication forum. We also aim to strengthen the representation of traditionally underrepresented languages and actors in Digital Humanities, particularly non-Anglophone, under-resourced languages (such as languages with non-Latin scripts) and female tool-makers, both on the platform and in Digital Humanities discourse in general.
Green | citations: 0 | popularity: Average | influence: Average | impulse: Average
hybrid | citations: 8 | popularity: Top 10% | influence: Average | impulse: Average
handle: 2117/97340, 20.500.14243/317422, 10138/176911, 10067/1277890151162165141
This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014. We describe its impact at the regional, national and international levels, mainly with regard to politics and the funding situation for LT topics. The paper documents the initiative’s work throughout Europe to boost progress and innovation in our field.
Green, hybrid | citations: 4 | popularity: Average | influence: Top 10% | impulse: Average
OpenAIRE research community dashboard for DARIAH-EU
Green | citations: 0 | popularity: Average | influence: Average | impulse: Average
Eötvös Loránd University
Green | citations: 0 | popularity: Average | influence: Average | impulse: Average
handle: 10831/35075
Green | citations: 0 | popularity: Average | influence: Average | impulse: Average