
Reusing research data is an important part of research practice in the social and economic sciences. To find suitable data, researchers need functional search options. However, a comprehensive search for data is hampered by inconsistent or missing semantic indexing because different survey programs use their own terminology for documentation. In most cases, there is no link between the measured theoretical concepts and the variables. From the user's perspective, the fragmentation of data documentation hampers data retrieval and thus limits the research potential of existing data. The challenge, therefore, lies in the concept-oriented indexing of data. Since semantic modelling for content indexing is still lacking, a process and a technology for a uniform semantic indexing of research data are needed. The LORD infrastructure aims to close this gap. The LORDpilot project aimed to test the feasibility of a concept registry for the social sciences. To this end, the pilot project developed a data model and a user-friendly interface to link (i.e., annotate) questions and variables with theoretical concepts for a selection of measurement instruments from three large surveys (ALLBUS, Nacaps, SOEP). We used Semantic Web standards for the technical implementation. By linking the concepts with descriptors from the SKOS-compliant "Thesaurus Social Sciences" (TheSoz), the search in the concept database is supported, and the concept vocabulary is linked to the Linked Open Data (LOD) Cloud. The links were created as RDF triples and made available in a triple store with a SPARQL endpoint. To evaluate our approach, selected measurement instruments of the three surveys were annotated (i.e., questions and variables were described with concepts) by each of the project partners involved, and then the fit between the measurement and the concept was assessed by domain experts. 
The evaluation of these test annotations shows that (1) different annotators reach a high degree of agreement, (2) the topical experts predominantly rate the concepts as matching the measurement intention, and (3) conceptual correlations across the data sets become visible via the assigned concepts. However, the analysis also reveals non-substantive heterogeneity in the concept vocabulary across annotators. The pilot study has shown that the infrastructure outlined in the application is feasible if the redundancy in the concept vocabulary is limited, e.g., by suggesting appropriate existing terms through algorithmic support during annotation.
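To illustrate how variable-to-concept links expressed as triples can make conceptual correlations across data sets visible, the following minimal sketch models a handful of links as plain (subject, predicate, object) tuples. It is not the project's data model or implementation. The `example.org` namespace, the `measuresConcept` predicate, and the variable, concept, and thesaurus identifiers are all hypothetical and chosen only for illustration. Only the `skos:exactMatch` URI is a real SKOS term. A simple in-memory pattern match stands in for a triple store with a SPARQL endpoint.

```python
from collections import defaultdict

# Hypothetical namespace and predicate; neither is taken from the LORD project.
LORD = "https://example.org/lord/"
MEASURES = LORD + "measuresConcept"
# Real SKOS mapping property, used here to point at an invented thesaurus entry.
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

# Illustrative triples: variables from three surveys annotated with concepts,
# plus one concept mapped to a (made-up) thesaurus descriptor.
triples = {
    (LORD + "allbus/var/v1", MEASURES, LORD + "concept/life-satisfaction"),
    (LORD + "soep/var/v2", MEASURES, LORD + "concept/life-satisfaction"),
    (LORD + "nacaps/var/v3", MEASURES, LORD + "concept/career-intention"),
    (LORD + "concept/life-satisfaction", SKOS_EXACT_MATCH,
     "https://example.org/thesaurus/descriptor/0001"),
}


def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def variables_by_concept(graph):
    """Group annotated variables under the concept they are linked to."""
    index = defaultdict(set)
    for variable, _, concept in match(graph, p=MEASURES):
        index[concept].add(variable)
    return dict(index)


def shared_concepts(graph):
    """Return concepts that are measured in more than one survey."""
    result = {}
    for concept, variables in variables_by_concept(graph).items():
        # The survey name is the first path segment after the namespace.
        surveys = {v[len(LORD):].split("/")[0] for v in variables}
        if len(surveys) > 1:
            result[concept] = sorted(variables)
    return result


if __name__ == "__main__":
    for concept, variables in shared_concepts(triples).items():
        print(concept, "->", variables)
```

In this toy graph, only the life-satisfaction concept is shared, because it is attached to one variable each from two different surveys. Against a real triple store, the same lookup would typically be written as a SPARQL query that groups variables by their linked concept.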
Survey Data, Variables, Semantic indexing, controlled vocabularies, Linked Open Data
