Powered by OpenAIRE graph
ZENODO
Software . 2026
Data sources: ZENODO
ZENODO
Software . 2026
Data sources: Datacite

Scripts and mappings to harmonise dementia cohorts into the OMOP Common Data Model

Authors: da Costa Mateus, Pedro; Moonen, Justine; Beran, Magdalena; Jaarsma, Eva; Van der Landen, Sophie; Heuvelink, Joost; Mahlet, Birhanu; +10 Authors

Abstract

This resource contains the scripts, mappings, and other materials used to harmonise data from nine Dutch cohorts relevant to dementia research to the OMOP (Observational Medical Outcomes Partnership) Common Data Model (CDM). These materials were developed within the Netherlands Consortium of Dementia Cohorts (NCDC) project. NCDC included the following nine cohorts: Amsterdam Dementia Cohort, Doetinchem Cohort Study, EMIF-AD 90+ study, EMIF-AD PreclinAD study, Leiden Longevity Study, Longitudinal Aging Study Amsterdam, The Maastricht Study, Rotterdam Study, and SMART.

Scope

The materials in this publication were used to transform the data from the NCDC cohorts into the OMOP CDM. However, the overall approach and workflow are transferable to other cohort harmonisation efforts. As such, this work can serve as a reference implementation or starting point for harmonising cohort data to any common data model. Harmonising data to a common data model is crucial in (dementia) research because it makes it possible to combine studies, increase statistical power, and enhance the reliability of findings. Reuse of this resource is governed by the license specified in this Zenodo record and the associated GitHub repository.

Who is it for?

This work is primarily intended for researchers and (research) software engineers looking to harmonise multi-centre cohort data into a common data model. Effective reuse of this resource requires technical expertise in data transformation, familiarity with common data models, and in-depth domain knowledge of the underlying cohort data. It is also useful for researchers who want to work with data structured according to the OMOP CDM, or with data from one or more of the nine cohorts that were part of NCDC.

What does it include?

The resource includes materials covering all steps required to transform cohort data into the OMOP CDM, including the design of mappings and the execution of the ETL (Extract, Transform, Load) process.
The first step in the harmonisation process is the collection of metadata. Variables from all cohort studies are identified, after which a set of harmonised variables is defined and mapped to OMOP concepts (the destination mapping). The variables of each individual cohort are then mapped to these harmonised variables (the source mapping).

  • The folder examples/ contains an example destination mapping, source mapping, and dataset, and can be used to guide users in this initial harmonisation step.
  • The folder ncdc_mappings/ contains the destination mapping and source mappings for the nine NCDC cohorts.

In the subsequent step, the destination and source mappings are used to transform the cohort data and to set up and populate a database according to the common data model.

  • The folder cdm_parser/ contains scripts that create and populate a PostgreSQL database based on the OMOP CDM. The scripts accept file-based datasets in CSV, SPSS, or SAS (Statistical Analysis Software) formats as input.
  • The folder scripts/ contains scripts that generate summary statistics to assess the correctness and completeness of the data transformation.
  • The subfolder examples/data-retrieval/ contains examples illustrating how to extract data from and query an OMOP CDM database. These examples can be used as an educational resource or as a starting point for researchers working with OMOP-formatted data.

Further usage instructions can be found in the included README file, and software requirements are listed in the requirements file.

Associated materials

This repository was developed for the publication:

Mateus P, Moonen J, Beran M, Jaarsma E, van der Landen SM, Heuvelink J, Birhanu M, Harms AGJ, Bron E, Wolters FJ, Cats D, Mei H, Oomens J, Jansen W, Schram MT, Dekker A, Bermejo I. Data harmonization and federated learning for multi-cohort dementia research using the OMOP common data model: A Netherlands consortium of dementia cohorts case study. J Biomed Inform. 2024 Jul;155:104661. doi: 10.1016/j.jbi.2024.104661. Epub 2024 May 26. PMID: 38806105.

More information about the harmonisation process and the reasoning that guided this process can be found in the publication. Please cite the publication when using this work.

For the latest updates, issue tracking, and development history, please visit the GitHub repository: https://github.com/MaastrichtU-CDS/omop-converter.
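
The two-level mapping described in the abstract (cohort variable to harmonised variable, harmonised variable to OMOP concept) can be illustrated with a minimal Python sketch. This is not the code or file format used in the repository: the column names, harmonised variable names, the CdmRecord class, and the transform_row function are all hypothetical, and the concept IDs are placeholders rather than real vocabulary concepts.

```python
from dataclasses import dataclass
from typing import Dict, List

# Destination mapping (hypothetical): harmonised variable -> OMOP target.
# The concept IDs are placeholders, not real OMOP vocabulary concepts.
DESTINATION_MAPPING: Dict[str, Dict[str, object]] = {
    "mmse_total": {"domain": "measurement", "concept_id": 2_000_000_001},
    "education_years": {"domain": "observation", "concept_id": 2_000_000_002},
}

# Source mapping for one hypothetical cohort: cohort column -> harmonised variable.
SOURCE_MAPPING: Dict[str, str] = {
    "MMSE_SCORE": "mmse_total",
    "EDU_YRS": "education_years",
}


@dataclass
class CdmRecord:
    """One value ready to be loaded into a CDM table (simplified)."""
    person_id: int
    domain: str
    concept_id: int
    value: float


def transform_row(row: Dict[str, str], person_key: str = "ID") -> List[CdmRecord]:
    """Apply the source mapping, then the destination mapping, to one row."""
    records: List[CdmRecord] = []
    for source_column, harmonised in SOURCE_MAPPING.items():
        raw = row.get(source_column)
        if raw is None or raw == "":
            continue  # skip missing values instead of inventing a default
        target = DESTINATION_MAPPING[harmonised]
        records.append(
            CdmRecord(
                person_id=int(row[person_key]),
                domain=str(target["domain"]),
                concept_id=int(target["concept_id"]),
                value=float(raw),
            )
        )
    return records


# A hypothetical cohort row, as it might be read from a CSV file.
example_row = {"ID": "17", "MMSE_SCORE": "28", "EDU_YRS": ""}
records = transform_row(example_row)
```

Keeping the two mappings separate means that adding a cohort only requires a new source mapping, while the destination mapping is defined once for all cohorts. For the actual mapping formats, see the examples/ and ncdc_mappings/ folders.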

Keywords

Medical and health sciences, FOS: Computer and information sciences, Cohort Studies, Data harmonisation, FAIR data, Computer and information sciences, OMOP Common Data Model, Common Data Model, Data interoperability, Dementia, FOS: Medical and health sciences

  • BIP!
    Impact by BIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average