Dataset
Data sources: ZENODO

Code and Datasets for MassNet: Database Search Workflows, Retention Time Prediction, and PSM Rescoring

Authors: A, Jun

Abstract

This repository compiles the core resources used to construct the MassNet dataset, including:

1) FASTA sequence files for each species, used for database searching;
2) Standardized database search workflows based on the FragPipe and Sage engines, for unified processing of raw DDA-MS data and high-confidence peptide identification.

Additionally, the repository provides the following data resources and supporting tools for downstream AI tasks:

1) Retention time (RT) prediction task: training and validation datasets constructed from FragPipe and Sage results, along with corresponding RT prediction model outputs;
2) Peptide-spectrum match (PSM) rescoring task: PSM datasets for training and evaluation results;
3) Dataset construction tools: complete code and documentation for generating the above task-specific datasets.

For detailed model training procedures and usage instructions, please refer to the following official repositories:

DeepLC: https://github.com/CompOmics/DeepLC
DDA-BERT: https://github.com/guomics-lab/DDA-BERT

All resources provided in this repository enable full reproduction of the core experimental and analytical results reported in the manuscript "MassNet: billion-scale AI-ready mass spectrometry corpus enabling scalable deep Learning in proteomics".
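
As a rough illustration of the FASTA inputs mentioned above, the following Python sketch reads FASTA-formatted text into a dictionary of sequences. It assumes only the standard FASTA layout (a header line starting with ">" followed by one or more sequence lines). The function name parse_fasta and the two demo entries are hypothetical and are not taken from the repository, whose own tooling may handle these files differently.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (the text after '>') to its full sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[header] += line
    return records


# Made-up entries for illustration only (assumption, not repository data).
example = """>demo_protein_1 hypothetical entry
MKTAYIAK
QRQISFVK
>demo_protein_2 hypothetical entry
MSLLTEVE
""".splitlines()

proteins = parse_fasta(example)
for name, seq in proteins.items():
    print(name.split()[0], len(seq))
```

To read an actual file, the same function can be given an open file handle, for example parse_fasta(open(path)), where path points to one of the species FASTA files.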
