AMP Project Data

# AMP Dataset Documentation

This repository contains the datasets used to develop ProteoGPT, AMPSorter, AMPGenix and BioToxiPept, as well as datasets with their prediction results. Each dataset included in this repository is described below.

## Datasets

### 1. `uniprot-compressed_true_download_true_format_fasta_includeIsoform_tr-2022.10.13-02.51.31.70.fasta`

- **Description**: 609,216 non-redundant canonical and isoform protein sequences.

### 2. `protein_seqs_1000.json`

- **Description**: Contains the training data for ProteoGPT.

### 3. `amp_unique_16062.json`

- **Description**: Contains the training data for AMPGenix.

### 4. `AMPSorter&BioToxiPept dataset.xlsx`

- **Description**: Contains the fine-tuning data, including:
  - **AMP_data split**: Data used for training, validating and evaluating AMPSorter.
  - **AMP_test Set**: Data used for testing AMPSorter.
  - **AMP_benchmarking Set**: A set of peptides used for benchmarking the AMP models.
  - **AMP_external Validation Dataset**: A separate dataset used for external validation of AMPSorter.
  - **Toxin_data split**: Data used for training, validating and evaluating BioToxiPept.
  - **Toxin_test set**: A set of peptides used for testing the toxin models.

### 5. `NRSPDs`

- **Description**: A large dataset that includes:
  - **410,192,277 non-redundant short peptides**.
  - **A candidate pool of 82,694,928 peptides**.
  - **Logits** with results predicted by AMPSorter and BioToxiPept.

### 6. `GNRSPDs`

- **Description**: Contains:
  - **7,798 generated sequences**.
  - **A candidate pool of 4,736 peptides**.
  - **Logits** with results predicted by AMPSorter and BioToxiPept.

### 7. `196 tested peptides.xlsx`

- **Description**: A set of 196 selected and experimentally tested peptides, with experimentally measured values.

### 8. `20 pilot tested peptides.xlsx`

- **Description**: 20 selected peptides with prediction results and values measured experimentally in the pilot test.

### 9. `Sequences generated by different models.xlsx`

- **Description**: Sequences generated by different models, with prediction results.
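The protein sequences in dataset 1 are distributed as a FASTA file. As a rough illustration of how such a file could be read without extra dependencies, here is a minimal sketch of a streaming FASTA reader. The function name `read_fasta` and the two example records are hypothetical and are not taken from the repository; the sketch assumes standard FASTA layout, where each record starts with a `>` header line followed by one or more sequence lines.

```python
import io
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Hypothetical helper for illustration; not part of the repository.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; collect and join them later.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up records, used only to show the expected input shape.
example = io.StringIO(
    ">sp|P00001|EXAMPLE_1 hypothetical record\n"
    "MKT\n"
    "LLV\n"
    ">sp|P00002|EXAMPLE_2\n"
    "GIGK\n"
)
records = list(read_fasta(example))
```

Because the reader is a generator, the same function can be given an open file handle for the FASTA file listed under dataset 1 and will process it one record at a time instead of loading all sequences into memory.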

Related Organizations

Shandong Women’s University
China (People's Republic of)
