
usiGrabber is a scalable framework for assembling large and diverse mass-spectrometry datasets ready for machine learning applications. As a proof of concept, we used usiGrabber to construct a phosphorylation-specific training dataset of nearly 11 million spectra and used it to retrain a binary phosphorylation classifier. This dataset and the corresponding model weights are available in this record.

The publication also includes the complete database, which contains spectrum information and metadata for over 800 million spectra in the PRIDE database. Because of its size, the database had to be split across multiple uploads. To reconstruct the entire database, download all of the related records listed below, extract the archives, and follow the reassembly instructions in usiGrabber - db_export.

Related records:
- peptide_spectrum_matches table: https://zenodo.org/records/18890370
- psm_peptide_evidence table: https://zenodo.org/records/18864164
- Other, smaller tables: https://zenodo.org/records/18873214
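Fetching every file from the three related records by hand is tedious. One way to script the download step is sketched below. It is a minimal sketch and not part of usiGrabber. It assumes that Zenodo's public REST endpoint `https://zenodo.org/api/records/<id>` returns JSON with a `files` list whose entries carry a `key` (filename) and a `links.self` download URL. The helper names (`record_api_url`, `file_links`, `download_record`) and the output directory `usigrabber_db` are hypothetical. Extraction and reassembly are left to the db_export instructions.

```python
"""Sketch: download all files of the related Zenodo records.

ASSUMPTION: https://zenodo.org/api/records/<id> returns JSON containing a
"files" list whose entries have "key" and "links" -> "self" (download URL).
Helper names and output paths are hypothetical, not part of usiGrabber.
"""
import json
import pathlib
import urllib.request

# Record IDs from the "Related records" list in this description.
RELATED_RECORDS = {
    "peptide_spectrum_matches": "18890370",
    "psm_peptide_evidence": "18864164",
    "other_tables": "18873214",
}

API_BASE = "https://zenodo.org/api/records/"


def record_api_url(record_id: str) -> str:
    """Build the (assumed) metadata endpoint URL for a Zenodo record."""
    return API_BASE + record_id


def file_links(record_json: dict) -> list[tuple[str, str]]:
    """Return (filename, download URL) pairs from a record's metadata JSON."""
    return [(f["key"], f["links"]["self"]) for f in record_json.get("files", [])]


def download_record(record_id: str, out_dir: pathlib.Path) -> None:
    """Download every file of one record into out_dir (no checksum check)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(record_api_url(record_id)) as resp:
        meta = json.load(resp)
    for name, url in file_links(meta):
        target = out_dir / name
        if target.exists():
            continue  # skip files that were already downloaded
        urllib.request.urlretrieve(url, target)


if __name__ == "__main__":
    # Network access required; files are large.
    for label, rid in RELATED_RECORDS.items():
        download_record(rid, pathlib.Path("usigrabber_db") / label)
```

Skipping files that already exist lets an interrupted run resume. Verifying each file against the checksum Zenodo reports would be a sensible addition before extracting the archives.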
