
pmid: 32096630
Identification of unknowns is a bottleneck for large-scale untargeted analyses such as metabolomics or drug metabolite identification. Ion mobility-mass spectrometry (IM-MS) provides rapid two-dimensional separation of ions based on their mobility through a neutral buffer gas. The mobility of an ion is related to its collision cross section (CCS) with the buffer gas, a physical property that is determined by the size and shape of the ion. This structural dependency makes CCS a promising characteristic for compound identification, but its utility is limited by the availability of high-quality reference CCS values. CCS prediction using machine learning (ML) has recently shown promise, but accurate and broadly applicable models are still lacking. Here we present a novel ML approach that employs a comprehensive collection of CCS values covering a wide range of chemical space. Using this diverse database, we identified the structural characteristics, represented by molecular quantum numbers (MQNs), that contribute to variance in CCS and assessed the performance of a variety of ML algorithms in predicting CCS. We found that by breaking down the chemical structural diversity using unsupervised clustering based on the MQNs, specific and accurate prediction models can be trained for each cluster, and these outperformed a single model trained with all data. Using this approach, we have robustly trained and characterized a CCS prediction model with high accuracy on diverse chemical structures. An all-in-one web interface (https://CCSbase.net) was built for querying the CCS database and accessing the predictive model to support the identification of unknown compounds.
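The cluster-then-predict idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the two descriptors (scaled mass and ring count) are hypothetical stand-ins for MQNs, k-means is assumed as the unsupervised clustering step, and a one-feature linear fit is assumed as the per-cluster model. Errors are computed on the training data only, purely to show the effect.

```python
# Toy sketch: cluster compounds on descriptors, then fit one CCS model per cluster.
# All values are synthetic; the descriptors only stand in for MQNs (assumption).

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def kmeans(points, centers, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [min(range(len(centers)),
                      key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centers[k])))
                  for pt in points]
        # Move each center to the mean of its members.
        for k in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def mean_abs_error(pred, truth):
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)

# Two synthetic structural classes with different mass-to-CCS trends.
masses = list(range(100, 301, 20))
data = ([(m, 0, 0.40 * m + 100.0) for m in masses]    # hypothetical acyclic class
        + [(m, 4, 0.25 * m + 110.0) for m in masses])  # hypothetical ring-rich class
features = [[m / 100.0, float(rings)] for m, rings, _ in data]
xs = [float(m) for m, _, _ in data]
ccs = [c for _, _, c in data]

# Baseline: a single model trained on all data.
a, b = fit_line(xs, ccs)
global_mae = mean_abs_error([a * x + b for x in xs], ccs)

# Cluster on the descriptors, then train one model per cluster.
labels = kmeans(features, centers=[list(features[0]), list(features[-1])])
per_cluster_pred = [0.0] * len(xs)
for k in set(labels):
    idx = [i for i, lab in enumerate(labels) if lab == k]
    a_k, b_k = fit_line([xs[i] for i in idx], [ccs[i] for i in idx])
    for i in idx:
        per_cluster_pred[i] = a_k * xs[i] + b_k
cluster_mae = mean_abs_error(per_cluster_pred, ccs)

print(f"single model MAE: {global_mae:.2f}")
print(f"per-cluster MAE:  {cluster_mae:.2f}")
```

In this toy setting the single model has to average over two different trends, while each per-cluster model only has to describe one. A realistic evaluation would use held-out compounds and measured CCS values rather than training-set error.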
Ions, Machine Learning, Databases, Factual, Surface Properties, Ion Mobility Spectrometry, Particle Size, Algorithms
| selected citations | 121 |
| popularity | Top 1% |
| influence | Top 10% |
| impulse | Top 1% |
