A pilot study for "Linked Open Research Data" (LORDpilot): a LOD-based Concept Registry for social science research data

Andreas, Daniel; Goebel, Jan; Kern, Dagmar; Klein, Daniel; May, Antonia; Momeni, Fakhri; Nebelin, Jana; Saalbach, Claudia; Siegers, Pascal; Wenzig, Knut; Zapilko, Benjamin


ZENODO

Report . 2024

License: CC BY

Data sources: ZENODO, Datacite

JOINT FINAL REPORT TO THE PROJECT (public part)

Publication . Report . 23 Apr 2024 . English . Publisher: Zenodo . Funded by: DFG | unidentified

doi: 10.5281/zenodo.11047523, 10.5281/zenodo.11047522

Abstract

Reusing research data is an important part of research practice in the social and economic sciences. To find suitable data, researchers need functional search options. However, a comprehensive search for data is hampered by inconsistent or missing semantic indexing, because different survey programs use their own terminology for documentation. In most cases, there is no link between the measured theoretical concepts and the variables. From the user's perspective, this fragmentation of data documentation impedes data retrieval and thus limits the research potential of existing data. The challenge, therefore, lies in the concept-oriented indexing of data. Since semantic modelling for content indexing is still lacking, a process and a technology for uniform semantic indexing of research data are needed. The LORD infrastructure aims to close this gap. The LORDpilot project aimed to test the feasibility of a concept registry for the social sciences. To this end, the pilot project developed a data model and a user-friendly interface to link (i.e., annotate) questions and variables with theoretical concepts for a selection of measurement instruments from three large surveys (ALLBUS, Nacaps, SOEP). We used Semantic Web standards for the technical implementation. Linking the concepts with descriptors from the SKOS-compliant "Thesaurus Social Sciences" (TheSoz) supports the search in the concept database and connects the concept vocabulary to the Linked Open Data (LOD) Cloud. The links were created as RDF triples and made available in a triple store with a SPARQL endpoint. To evaluate our approach, each of the project partners involved annotated selected measurement instruments of the three surveys (i.e., described questions and variables with concepts), and domain experts then assessed the fit between the measurement and the concept.
The evaluation of these test annotations shows that (1) the annotations by different annotators agree to a high degree, (2) the topical experts predominantly rate the concepts as matching the measurement intention, and (3) conceptual correlations across the data sets become visible via the assigned concepts. However, the analysis also shows non-substantive heterogeneity in the concept vocabulary across annotators. The pilot study has shown that the infrastructure outlined in the application is feasible if the redundancy in the concept vocabulary is limited, e.g., by suggesting appropriate existing terms through algorithmic support during annotation.
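
The linking idea described in the abstract (variables annotated with concepts, concepts mapped to thesaurus descriptors, all stored as triples) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `http://example.org/lord/` namespace, the `measures` predicate, the variable and concept identifiers, and the choice of `skos:closeMatch` as the mapping property are hypothetical and are not taken from the report. An in-memory set of tuples stands in for the triple store, and a small pattern matcher stands in for a SPARQL query.

```python
from typing import Optional

# A triple is (subject, predicate, object), each given as a URI string.
Triple = tuple[str, str, str]

SKOS = "http://www.w3.org/2004/02/skos/core#"   # real SKOS namespace
EX = "http://example.org/lord/"                 # hypothetical namespace
MEASURES = EX + "measures"                      # hypothetical predicate
CLOSE_MATCH = SKOS + "closeMatch"               # assumed mapping property

# Hypothetical annotations: variables from different surveys linked to
# concepts, and one concept linked to a placeholder thesaurus descriptor.
triples: set[Triple] = {
    (EX + "variable/survey_a_v1", MEASURES, EX + "concept/life_satisfaction"),
    (EX + "variable/survey_b_v7", MEASURES, EX + "concept/life_satisfaction"),
    (EX + "variable/survey_c_v3", MEASURES, EX + "concept/political_interest"),
    (EX + "concept/life_satisfaction", CLOSE_MATCH, EX + "thesoz/descriptor_1"),
}


def match(store: set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def variables_for_descriptor(store: set[Triple], descriptor: str) -> list[str]:
    """Find variables whose concept is mapped to the given descriptor."""
    concepts = [subj for subj, _, _ in match(store, p=CLOSE_MATCH, o=descriptor)]
    return sorted(
        var
        for concept in concepts
        for var, _, _ in match(store, p=MEASURES, o=concept)
    )


if __name__ == "__main__":
    for var in variables_for_descriptor(triples, EX + "thesoz/descriptor_1"):
        print(var)
```

In this sketch, a search that starts from a single descriptor reaches variables from two different placeholder surveys, which is the kind of cross-dataset connection the abstract reports becoming visible through shared concepts.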

Related Organizations

Leibniz Association (Germany)
Leibniz Institute for the Social Sciences (Germany)
German Institute for Economic Research (Germany)

Keywords

Survey Data, Variables, Semantic indexing, controlled vocabularies, Linked Open Data

Impact by BIP!

selected citations (0): These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity (Average): This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence (Average): This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse (Average): This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Funded by

DFG | unidentified