Multi-scale variational autoencoder for imputation of missing values in untargeted metabolomics using whole-genome sequencing data

Name: Multi-scale variational autoencoder for imputation of missing values in untargeted metabolomics using whole-genome sequencing data
Keywords: Whole Genome Sequencing, Humans, Metabolomics, Polymorphism, Single Nucleotide, Article, Linkage Disequilibrium

Chen Zhao; Kuan-Jui Su; Chong Wu; Xuewei Cao; Qiuying Sha; Wu Li; Zhe Luo; Tian Qing; Chuan Qiu; Lan Juan Zhao; Anqi Liu; Lindong Jiang; Xiao Zhang; Hui Shen; Weihua Zhou; Hong-Wen Deng

PubMed Central

Other literature type . 2024

Data sources: PubMed Central

Computers in Biology and Medicine

Article . 2024 . Peer-reviewed

License: Elsevier TDM

Data sources: Crossref

https://pubmed.ncbi.nlm.nih.go...

Article . 2024

Data sources: Europe PubMed Central

Publication: Article, Other literature type . 01 Sep 2024 . English
Publisher: Elsevier BV
Journal: Computers in Biology and Medicine, volume 179, page 108813 (ISSN: 0010-4825)

doi: 10.1016/j.compbiomed.2024.108813

pmid: 38955127

pmc: PMC11324385

Abstract

Missing data is a common challenge in mass spectrometry-based metabolomics and can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. In this study, we propose a novel method that leverages the information from WGS data and reference metabolites to impute unknown metabolites. Our approach utilizes a multi-scale variational autoencoder to jointly model the burden score, polygenic risk score (PGS), and linkage disequilibrium (LD)-pruned single nucleotide polymorphisms (SNPs) for feature extraction and missing metabolomics data imputation. By learning the latent representations of both omics data types, our method can effectively impute missing metabolomics values based on genomic information. We evaluate the performance of our method on empirical metabolomics datasets with missing values and demonstrate its superiority compared to conventional imputation techniques. Using 35 template metabolite-derived burden scores, PGS, and LD-pruned SNPs, the proposed method achieved R² scores > 0.01 for 71.55% of metabolites. The integration of WGS data in metabolomics imputation not only improves data completeness but also enhances downstream analyses, paving the way for more comprehensive and accurate investigations of metabolic pathways and disease associations. Our findings offer valuable insights into the potential benefits of utilizing WGS data for metabolomics data imputation and underscore the importance of leveraging multi-modal data integration in precision medicine research.
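The headline result in the abstract is a share of metabolites whose R² score exceeds 0.01. The sketch below illustrates how such a summary could be computed. It assumes that R² means the ordinary coefficient of determination, calculated separately for each metabolite on held-out values; the abstract does not state the exact definition. The function names `per_metabolite_r2` and `fraction_above` are hypothetical and do not come from the paper.

```python
import numpy as np


def per_metabolite_r2(y_true, y_pred):
    """Coefficient of determination for each column (one column per metabolite).

    Assumption: rows are samples, columns are metabolites, no missing entries.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Residual and total sums of squares, computed column by column.
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    # A constant column has ss_tot == 0; report NaN instead of dividing by zero.
    with np.errstate(divide="ignore", invalid="ignore"):
        r2 = np.where(ss_tot > 0, 1.0 - ss_res / ss_tot, np.nan)
    return r2


def fraction_above(r2, threshold=0.01):
    """Share of metabolites with a defined R² strictly above the threshold."""
    r2 = np.asarray(r2, dtype=float)
    valid = ~np.isnan(r2)
    return float(np.mean(r2[valid] > threshold))
```

Given a matrix of observed values and a matrix of imputed values with the same shape, `fraction_above(per_metabolite_r2(observed, imputed))` returns a proportion that can be compared with the reported 71.55%. An R² of 0 corresponds to predicting each metabolite's mean, so a threshold of 0.01 marks only a small improvement over that baseline.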

Related Organizations

Michigan Technological University
United States
Kennesaw State University
United States
The University of Texas MD Anderson Cancer Center
United States
Tulane University
United States

Impact by BIP!

	selected citations: 5. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
	influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Green