Introduction

Deep learning has revolutionized the analysis of mass spectra: from predicting the tandem mass spectrum generated by an analyte to sequencing peptides from mass spectra de novo, the neural network models that underpin deep learning are now ubiquitous. In recent years, a neural network architecture called the transformer has become the architecture of choice for state-of-the-art deep learning models in domains including natural language processing, protein structure prediction, and, importantly, the analysis of mass spectra. However, every new model developed for mass spectra has essentially been forced to start from scratch. Here, we introduce depthcharge, an open-source deep learning framework that provides the building blocks for transformer models of mass spectra and the analytes that generate them.

Methods

A tandem mass spectrum can be described as a bag of peaks, where each peak is a pair of m/z and intensity values. The distances between m/z values, the m/z values themselves, and their associated intensities provide structural information about the analyte; hence, we hypothesize that the self-attention mechanism which characterizes the transformer architecture is well suited to learning the relationships among peaks within a mass spectrum, much as it learns the relationships among words within a sentence. Additionally, peptides and small molecules can be represented as sequences of tokens (a peptide sequence or a SMILES string, respectively). Depthcharge provides PyTorch modules to parse, batch, and encode these data structures and to use them to build transformer models.

Results

Depthcharge provides the building blocks for transformer models of mass spectra and common analytes, such as peptides and small molecules.
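The "parse, batch, and encode" step can be illustrated with a minimal sketch in plain Python: spectra of varying length (lists of (m/z, intensity) peak pairs) are padded into rectangular batches with a mask marking real peaks, as a transformer model requires. The function name and padding scheme here are illustrative assumptions, not the depthcharge API, which operates on PyTorch tensors.

```python
def pad_spectra(spectra, pad_value=0.0):
    """Pad variable-length spectra into rectangular batch arrays.

    Each spectrum is a list of (mz, intensity) peak pairs. Transformers
    need rectangular inputs, so shorter spectra are padded with
    ``pad_value`` and a boolean mask marks the real peaks.
    Illustrative sketch only; not the depthcharge API.
    """
    max_len = max(len(peaks) for peaks in spectra)
    mz_batch, intensity_batch, mask_batch = [], [], []
    for peaks in spectra:
        n_pad = max_len - len(peaks)
        mz_batch.append([mz for mz, _ in peaks] + [pad_value] * n_pad)
        intensity_batch.append([it for _, it in peaks] + [pad_value] * n_pad)
        mask_batch.append([True] * len(peaks) + [False] * n_pad)
    return mz_batch, intensity_batch, mask_batch
```

In a real model the padded arrays would be converted to tensors and the mask passed to the transformer's attention so padded positions are ignored.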
Unlike previous architectures, such as recurrent neural networks, transformers lack a built-in representation of the order of elements in the input sequence; position is generally encoded as a set of sinusoids summed with the representation of each element. We use this property of transformers to our advantage when modeling mass spectra: the m/z values are encoded as a series of sinusoids and summed with a learned representation of the intensity. We illustrate how this encoding works and demonstrate that it provides a high-fidelity representation of a mass spectrum. We then present a series of case studies on the ways depthcharge can be used, demonstrating the configurations required for predicting peptide properties such as collisional cross section, predicting the b and y ion intensities generated from a peptide precursor, and co-embedding peptides and mass spectra in the same latent space. In each case, we build a minimal model atop depthcharge and outline the components required to build it. We then compare each model against current tools in the field, demonstrating that even these minimal models achieve high-quality results. Finally, we show that these models require relatively few lines of code to implement, thanks to the tools provided by depthcharge. We aim for depthcharge to provide a user-friendly, foundational framework that will propel biological discovery through new models of mass spectrometry data. Depthcharge is open source and available under the permissive Apache 2.0 license: https://github.com/wfondrie/depthcharge
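The sinusoidal m/z encoding described above can be sketched as follows: like a transformer positional encoding, but applied to a continuous m/z value rather than an integer position, with wavelengths spaced geometrically across the encoding dimensions. The function name, default dimensions, and wavelength bounds below are illustrative assumptions, not the depthcharge API.

```python
import math

def encode_mz(mz, d_model=8, min_wavelength=0.001, max_wavelength=10000.0):
    """Sinusoidal encoding of a single m/z value (illustrative sketch).

    Analogous to transformer positional encodings, but over a continuous
    m/z value instead of an integer position. Each sin/cos pair uses a
    wavelength interpolated geometrically between ``min_wavelength`` and
    ``max_wavelength``. Not the depthcharge API.
    """
    assert d_model % 2 == 0 and d_model >= 4
    n_pairs = d_model // 2
    encoding = []
    for i in range(n_pairs):
        # Geometric interpolation of this pair's wavelength.
        wavelength = min_wavelength * (max_wavelength / min_wavelength) ** (
            i / (n_pairs - 1)
        )
        angle = 2 * math.pi * mz / wavelength
        encoding.append(math.sin(angle))
        encoding.append(math.cos(angle))
    return encoding
```

Because each dimension oscillates at a different wavelength, nearby m/z values map to nearby vectors while the full vector still resolves fine m/z differences; the learned intensity representation is then summed with this encoding to form each peak's input embedding.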
Proteomics, FOS: Computer and information sciences, AI, Bioinformatics, Deep learning, Mass Spectrometry
| Indicator | Description | Value |
| --- | --- | --- |
| selected citations | Citations derived from selected sources; an alternative to the "influence" indicator, which reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| views | | 311 |
| downloads | | 113 |
Views and downloads provided by UsageCounts