ZENODO
Presentation . 2023
License: CC BY
Data sources: Datacite

Dive deeper with Depthcharge: A Transformer Toolkit for Modeling Mass Spectrometry Data

Authors: Fondrie, William E; Bittremieux, Wout; Yilmaz, Melih; Noble, William S

Abstract

Introduction

Deep learning has revolutionized the analysis of mass spectra; from predicting the tandem mass spectrum generated by an analyte, to sequencing peptides from mass spectra de novo, the neural network models that underpin deep learning are now ubiquitous. In recent years, a neural network architecture called the transformer has become the architecture of choice for developing state-of-the-art deep learning models, in domains including natural language processing, protein structure prediction, and importantly, the analysis of mass spectra. However, every new model developed for mass spectra has essentially been forced to start from scratch. Here, we introduce depthcharge, an open-source deep learning framework that provides the building blocks for transformer models of mass spectra and the analytes that generate them.

Methods

A tandem mass spectrum can be described as a bag of peaks where each peak is defined as a pair of m/z and intensity values. The distances between m/z values, the m/z values themselves, and their associated intensities provide structural information about the analyte; hence, we hypothesize that the self-attention mechanism which characterizes the transformer architecture would be ideal for learning the relationships among peaks within a mass spectrum, similar to the relationships among words within a sentence. Additionally, peptides and small molecules can be represented as sequences of tokens (either a peptide sequence or SMILES string). Depthcharge provides PyTorch modules to parse, batch, and encode these data structures and use them to build transformer models.

Results

Depthcharge provides the building blocks to build transformer models for mass spectra and common analytes, such as peptides and small molecules.
Unlike earlier architectures, such as recurrent neural networks, transformers lack a built-in representation for the order of elements in the input sequence; position in the sequence is generally encoded as a series of sinusoids that is summed with a representation of each element. We use this quality of transformers to our advantage to model mass spectra: the m/z values are encoded as a series of sinusoids and summed with a learned representation of the intensity. We both illustrate how this process takes place and demonstrate that this method provides a high-fidelity representation of a mass spectrum.

We then present a series of case studies on the various ways that depthcharge can be used, demonstrating the configurations required for predicting peptide properties such as collisional cross section, predicting the b and y ion intensities generated from a peptide precursor, and co-embedding peptides and mass spectra into the same latent space. In each case, we build a minimal model atop depthcharge and outline the components required to build it. We then compare each against current tools in the field, demonstrating that even these minimal models are capable of achieving high-quality results. Finally, we show that these models require relatively few lines of code to implement due to the tools provided by depthcharge.

We aim for depthcharge to provide a user-friendly, foundational framework that will propel biological discovery through new models of mass spectrometry data. Depthcharge is open-source and available under the permissive Apache 2.0 license: https://github.com/wfondrie/depthcharge
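
The peak encoding described above (m/z values encoded as sinusoids and summed with a representation of the intensity) can be illustrated with a small sketch. This is not depthcharge's API: it is plain Python with no PyTorch, the function names `encode_mz` and `encode_spectrum` are hypothetical, the wavelength bounds and `d_model=8` are assumed values chosen for illustration, and a fixed random linear projection stands in for the intensity representation that a real model would learn. The peak values are made up.

```python
import math
import random


def encode_mz(mz, d_model=8, min_wavelength=0.001, max_wavelength=10000.0):
    """Map one m/z value to d_model sinusoidal features (half sine, half cosine).

    The wavelength bounds are illustrative assumptions, not values from the abstract.
    """
    if d_model % 2:
        raise ValueError("d_model must be even")
    half = d_model // 2
    ratio = max_wavelength / min_wavelength
    # Wavelengths are spaced geometrically between the two bounds.
    wavelengths = [
        min_wavelength * ratio ** (i / max(half - 1, 1)) for i in range(half)
    ]
    angles = [2.0 * math.pi * mz / w for w in wavelengths]
    return [math.sin(a) for a in angles] + [math.cos(a) for a in angles]


def encode_spectrum(peaks, intensity_weights):
    """Encode a bag of (m/z, intensity) peaks as one vector per peak."""
    d_model = len(intensity_weights)
    encoded = []
    for mz, intensity in peaks:
        mz_part = encode_mz(mz, d_model)
        # Stand-in for a learned intensity representation: a linear projection
        # whose weights would be trained in a real model (assumption).
        intensity_part = [w * intensity for w in intensity_weights]
        # The two representations are summed element-wise.
        encoded.append([m + i for m, i in zip(mz_part, intensity_part)])
    return encoded


# Hypothetical spectrum: three (m/z, relative intensity) peaks.
rng = random.Random(0)
intensity_weights = [rng.uniform(-1.0, 1.0) for _ in range(8)]
spectrum = [(114.091, 0.35), (175.119, 1.0), (262.151, 0.62)]
encoded = encode_spectrum(spectrum, intensity_weights)
```

Because each peak is encoded independently of its position in the list, reordering the peaks only reorders the output vectors, which matches the bag-of-peaks view of a spectrum. In a transformer these per-peak vectors would then be passed to the self-attention layers.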

Keywords

Proteomics, FOS: Computer and information sciences, AI, Bioinformatics, Deep learning, Mass Spectrometry
