Powered by OpenAIRE graph
ZENODO
Other literature type . 2024
License: CC BY
Data sources: ZENODO
ZENODO
Conference object . 2024
License: CC BY
Data sources: Datacite

This Research product is the result of merged Research products in OpenAIRE.


Preserving Humanities Research Data: Data Depositing in the TextGrid Repository aka The Fluffy Import

Authors: Buddenbohm, Stefan; Calvo Tello, José; Funk, Stefan E.; Dogaru, George; Klammer, Ralf; Steckel, Alexander; Weimer, Lukas;

Abstract

If preservation is use, what are the implications for a humanities research data infrastructure? Preserving research data long-term and making it accessible and reusable for the scientific community is a fundamental concern for research infrastructures. This requires aligning the expectations of the data depositor and the data recipient. For example, the research data infrastructure has requirements relating to formats, metadata, responsibilities and licenses. Once research data has been successfully deposited and published in a repository, the next pitfall looms: data are often presented unappealingly or are hard to find. Applying the famous words of John Cotton Dana, if "preservation is use", the research data infrastructure has to emphasize the potential and reusability of data. The paper introduces the data depositing workflow of the TextGrid Repository (TGRep).

The TextGrid Repository

TGRep is a pioneer of the Digital Humanities in the German-speaking area. Today, TGRep is part of the portfolio of Text+, the NFDI consortium for language- and text-based research data in Germany. Each Text+ data center offers a workflow for incorporating research data that falls within its scope and making it available for reuse.

ELTeC

In TGRep, ELTeC, the European Literary Text Collection, serves as an example of the identification, consultation, ingest, transformation, enrichment, publication and integration of data into the Text+ portfolio, with a focus on reusability and interoperability. ELTeC is a state-of-the-art, open access, multilingual collection of corpora containing novels from several European traditions. It was developed for several purposes, among them the development of tools and methods in Computational Literary Studies. Currently, ELTeC contains more than 2,000 full-text novels in XML-TEI in 21 languages, distributed via multiple platforms (such as GitHub and Zenodo). 1,365 full texts in 15 languages are also published in TGRep.

The Data Depositing Workflow in TGRep

Because TGRep is highly relevant to Text+ and its community, it is equally important to minimize the effort of publishing data there. The solution implemented in Text+ is a workflow that automates the creation of the technical files required for importing into the repository, while allowing as much manual intervention as needed. Users interact with the system through a web-based user interface running inside a Jupyter notebook. After the user specifies the location of the TEI files to be imported, an automated step analyzes the data, finds and extracts the metadata common to all files, and makes it available for verification and, if necessary, manual improvement. In a subsequent manual step, the user can check and edit the extracted metadata. The user can also change how the metadata is identified, in which case the automated step can be run again. In the last step, the technical TGRep metadata files are generated. The new workflow not only improves the data import process but also serves as a blueprint for further easy-to-build applications that combine libraries and notebooks and rely on the versatile JupyterLab environment, which can be deployed both locally and in the cloud.
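The automated analysis step and the option to change how metadata is identified can be illustrated with a short Python sketch. It is not the Text+ implementation. The following are all hypothetical:

- the field names, XPath rules and `DEFAULT_RULES`;
- the functions `extract`, `common_metadata` and `apply_overrides`;
- the toy `make_tei` documents.

The sketch assumes TEI P5 files in the standard TEI namespace and uses only the standard library's `xml.etree.ElementTree`. Generating the actual technical TGRep files is deliberately left out, because their format is not described here.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace of TEI P5 documents (e.g. XML-TEI novels).
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical identification rules: field name -> ElementTree path.
# Editing these and re-running the analysis corresponds to changing
# how the metadata is identified.
DEFAULT_RULES = {
    "title": ".//tei:titleStmt/tei:title",
    "author": ".//tei:titleStmt/tei:author",
    "publisher": ".//tei:publicationStmt/tei:publisher",
}


def extract(tei_xml: str, rules: dict[str, str]) -> dict[str, str | None]:
    """Apply every rule to one TEI document; missing or empty matches give None."""
    root = ET.fromstring(tei_xml)
    values = {}
    for field, path in rules.items():
        node = root.find(path, TEI_NS)
        text = node.text.strip() if node is not None and node.text else ""
        values[field] = text or None
    return values


def common_metadata(documents: list[str], rules: dict[str, str] = DEFAULT_RULES):
    """Return (fields with the same non-empty value in every file, per-file values)."""
    per_file = [extract(doc, rules) for doc in documents]
    common = {}
    for field in rules:
        found = {meta[field] for meta in per_file}
        # Keep a field only if all files agree on exactly one non-missing value.
        if len(found) == 1 and None not in found:
            common[field] = found.pop()
    return common, per_file


def apply_overrides(common: dict[str, str], overrides: dict[str, str]) -> dict[str, str]:
    """Manual step: user-verified values replace or extend the automatic ones."""
    merged = dict(common)
    merged.update(overrides)
    return merged


def make_tei(title: str, author: str, publisher: str) -> str:
    """Build a tiny, made-up TEI document for demonstration only."""
    return (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc>'
        f"<titleStmt><title>{title}</title><author>{author}</author></titleStmt>"
        f"<publicationStmt><publisher>{publisher}</publisher></publicationStmt>"
        "</fileDesc></teiHeader><text><body><p/></body></text></TEI>"
    )


if __name__ == "__main__":
    docs = [make_tei("Novel A", "Author X", "Demo Publisher"),
            make_tei("Novel B", "Author X", "Demo Publisher")]
    common, per_file = common_metadata(docs)
    print(common)  # {'author': 'Author X', 'publisher': 'Demo Publisher'}
```

To mirror the option of changing how metadata is identified, edit the rules dictionary and call `common_metadata` again. For instance, a rule set containing only `title` yields no common values for two novels with different titles.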

Keywords

NFDI, Humanities, Research Infrastructure, Data Ingest, Repository, Text+, FOS: Humanities, Data Depositing, TextGrid, Fluffy Import, Digital humanities
