The Middle Hungarian period, i.e. the interval between the second third of the sixteenth century and the second third of the eighteenth century, has so far been explored less intensively. It is also the earliest period in the history of Hungarian for which enough text material survives to study the language use of everyday private life with the necessary thoroughness (cf. Dömötör–Gugán–Varga 2021). The present proposal focuses on two databases designed by the presenters and their team: the Old and Middle Hungarian corpus of informal language use (Történeti Magánéleti Korpusz, TMK) and the corpus of memoirs and dramas (Középmagyar emlékirat- és drámakorpusz). Both corpora contain texts that are important sources of Hungarian cultural heritage: ego-documents from noblemen and noblewomen, genres related to everyday language use that also involve speakers of lower social status, and constructed dialogues imitating everyday language use in fiction. The Old and Middle Hungarian corpus of informal language use (tmk.nytud.hu) consists of private letters and records of witch trials dating from the fifteenth century to 1772, a total of 8 million characters. This presentation highlights some requirements and steps of the corpus building, carried out by historical linguists in collaboration with a computational linguist. These include manual normalization and disambiguation for diachronic adequacy, morphological analysis, and the query interface. This database is the first fully normalized and annotated historical corpus of Hungarian supplemented with sociolinguistic information (Novák–Gugán–Varga–Dömötör 2018). The other topic of the presentation is the corpus of memoirs and dramas, which is being built following the guidelines developed for the previous corpus (cf. Gugán 2020).
The language of Middle Hungarian memoirs and dramas proved to be a suitable extension to the more directly speech-related sources of TMK. Memoirs are ego-documents, yet they are further from informal language use than private letters. Dramas are constructed texts, but they are also speech-purposed. The four registers to be included therefore all share certain characteristics, while each differs from the others in at least one feature. In both corpora, all records are normalized and morphologically annotated. The new corpus is also planned to have a freely available, user-friendly query interface, providing a valuable source of information for historical linguists and for specialists and students in related fields.
pmid: 35125984
pmc: PMC8807381
Abstract: This paper presents the ParlaMint corpora, which contain transcriptions of the sessions of 17 European national parliaments, totalling half a billion words. The corpora are uniformly encoded, contain rich metadata about 11 thousand speakers, and are linguistically annotated following the Universal Dependencies formalism and with named entities. Samples of the corpora and conversion scripts are available from the project’s GitHub repository, and the complete corpora are openly available via the CLARIN.SI repository for download, as well as through the NoSketch Engine and KonText concordancers and the Parlameter interface for online exploration and analysis.
handle: 20.500.14243/483041
Abstract: The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022, and containing over 1 billion words. The corpora are uniformly encoded, contain rich metadata about their 24 thousand speakers, and are linguistically annotated up to the level of Universal Dependencies syntax and named entities. The paper focuses on the enhancements made since the ParlaMint I project and presents the compilation of the corpora, including the encoding infrastructure, the use of GitHub, the production of the individual corpora, the common pipeline for producing their distribution, and the use of CLARIN services for dissemination. It then gives a quantitative overview of the produced corpora, followed by the qualitative additions made within the ParlaMint II project, namely metadata localisation, the addition of new metadata, such as the political orientation of political parties, the machine translation of the corpora into English and its tagging with semantic classes, and the production of pilot speech corpora. Finally, outreach activities and further work are discussed.
New software paradigm for linguistic/phonetic tools: webservices. Webservices as building blocks for complex systems. BAS CLARIN webservices: a free service to the scientific community. Multilingual automatic segmentation and labelling of speech into words and phones. Multilingual automatic text-to-phoneme conversion webservice.
A new software paradigm, 'Software as a Service' based on web services, is proposed for multilingual linguistic tools and exemplified with the BAS CLARIN web services. Instead of traditional tool development and distribution, the tool functionality is implemented on a highly available server that users or applications access via HTTP requests. As examples we describe in detail five multilingual web services for speech science operational since 2012 and discuss the benefits and drawbacks of the new paradigm as well as our experiences with user acceptance and implementation problems. The services include automatic segmentation of speech, grapheme-to-phoneme conversion, syllabification, speech synthesis, and optimal symbol sequence alignment.
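To make the idea of accessing tool functionality via HTTP requests concrete, the following is a minimal Python sketch of how a client might prepare a call to a grapheme-to-phoneme service. It is an illustration under stated assumptions, not the BAS CLARIN API: the endpoint URL and the parameter names `text` and `lng` are hypothetical placeholders, and the request is only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: a placeholder, not a real BAS CLARIN URL.
SERVICE_URL = "https://example.org/services/g2p"


def build_g2p_request(text: str, language: str) -> Request:
    """Build (but do not send) a POST request for a hypothetical G2P service.

    The form field names "text" and "lng" are assumptions made for this
    sketch; a real service defines its own parameters.
    """
    body = urlencode({"text": text, "lng": language}).encode("utf-8")
    return Request(
        SERVICE_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Prepare a request; passing it to urllib.request.urlopen would send it.
request = build_g2p_request("guten Morgen", "deu")
print(request.get_method(), request.full_url)
```

The point of the sketch is the division of labour: the client only encodes its input and parameters, while the processing itself would run on the server, so tool updates would not require any redistribution to users.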
This document expands on and adds nuance to some of the concepts defined in the EOSC Interoperability Framework report, published in 2021 by the EOSC Executive Board Working Groups (WG) FAIR and Architecture, and to the conceptualisation of an EOSC Interoperability Framework (EOSC-IF) that it embodies. It reports on a deep dive into the landscape of semantic interoperability implementations and a wide range of interoperability scenarios centred on the Semantic Interoperability Specification, some subtypes of Semantic Business Objects, and the Semantic Artefact Catalogue and Mapping Repository. A small set of new concepts relevant to this work and to EOSC at large has also been added. The introduction provides context for the creation of this report, the basic concepts section provides an overview of the related components of the EOSC-IF, and the following four sections summarise explorations that frame the concluding set of recommendations to the EOSC community at large. The explorations that frame the recommendations are titled as follows: (1) The Semantic Interoperability Specification: Implementation profiles for communities; (2) The Semantic Artefact Catalogue: Twelve maturity dimensions; (3) The Mapping Repository: Making a case for FAIR mappings and crosswalks; (4) Implementation examples: Common use cases and real-world case studies. The recommendations themselves are organised under the following five broad categories: (1) Align emerging adaptations and implementations to the Semantic View of the EOSC-IF (pp. 39–42) reference architecture. (2) Identify and consolidate different approaches to representing and exchanging (meta)data with the FAIR Digital Objects model described in the EOSC-IF (pp. 29–34). (3) Extend the EOSC-IF to include a research process perspective that can support convergence on solutions for common use cases. (4) Extend the set of Semantic Business Objects described in the EOSC-IF (pp. 40–41) to include artefacts such as mappings and crosswalks. (5) Recognise the Semantic Artefact Catalogue component described in the EOSC-IF (p. 42) as a critical part of the long-term viability of any research data infrastructure.
This is a report of the EOSC Association’s Task Force Semantic Interoperability (2021–2023). The document was developed in continuous consultation with the task force membership (September 2023–March 2024), with a subgroup of the membership actively contributing to authoring the text. The document was submitted to the EOSC Association’s Quality Review Committee (QRC) and an open community consultation on 18 January. The response to the reviewer’s comments was submitted on 12 March and the current version was approved with minor revisions on 27 March. Read more about the EOSC Association, the role of its task forces and the task forces’ membership on the eosc.eu website.
doi: 10.32458/ei.24.4
The library of the Ethnographic Museum (Etnografski muzej) is a special museum library which, like most special and museum libraries, is of the semi-open type and intended primarily for the Museum's staff: curators, restorers and preparators. Established shortly after the museum was founded, it grew together with the museum's holdings. The paper describes its history, its more important holdings and the processing of library materials, and presents some problems
doi: 10.11647/obp.0192.10
This chapter explores some of the aspects underlying the domain-specific, epistemic processes that pose challenges to the FAIRification of knowledge creation in arts and humanities. Tóth-Czifra argues that the FAIR principles (findability, accessibility, interoperability, and reusability) have been designed according to underlying assumptions about how knowledge creation operates and communicates. This causes issues in productive reuse of digitised cultural heritage resources and legal barriers can prevent institutions from sharing metadata online, which can further skew research towards what is easily available and free to find online. However, standardisation of shared metadata can also have epistemological challenges and affect the systems of discovery and knowledge creation — a price which Tóth-Czifra argues is too high. She argues that in order to be truly reusable, data should achieve autonomy from their curator, and by bringing scholarly communication, data sharing and academic publishing together, we can reach a more sustainable research data management ecosystem. Relying on domain-relevant community standards as well as increasing the social life of data is critical to avoid having deposited datasets being buried in isolated ‘data tombs’.
In our workshop, we invite Digital Humanists to explore the OpenMethods metablog as an innovative publication forum and to strengthen the representation of traditionally underrepresented languages and actors in Digital Humanities - particularly non-Anglophone, under-resourced languages (such as languages with non-Latin scripts) or female tool-makers - on the platform in particular and in the Digital Humanities discourse in general.
handle: 2117/97340, 20.500.14243/317422, 10138/176911, 10067/1277890151162165141
This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact on the regional, national and international level, mainly with regard to politics and the situation of funding for LT topics. This paper documents the initiative’s work throughout Europe in order to boost progress and innovation in our field.