ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO
ZENODO
Dataset . 2025
License: CC BY
Data sources: Datacite

AMP Project Data

Authors: Wang, Yihui;

Abstract

# AMP Dataset Documentation

This repository contains the datasets used to build and evaluate ProteoGPT, AMPSorter, AMPGenix, and BioToxiPept, along with datasets annotated with their prediction results. Each dataset is described below.

## Datasets

### 1. `uniprot-compressed_true_download_true_format_fasta_includeIsoform_tr-2022.10.13-02.51.31.70.fasta`
- **Description**: 609,216 non-redundant canonical and isoform protein sequences.

### 2. `protein_seqs_1000.json`
- **Description**: Contains the training data for ProteoGPT.

### 3. `amp_unique_16062.json`
- **Description**: Contains the training data for AMPGenix.

### 4. `AMPSorter&BioToxiPept dataset.xlsx`
- **Description**: Contains the fine-tuning data, including:
  - **AMP_data split**: Data used for training, validating, and evaluating AMPSorter.
  - **AMP_test Set**: Data used for testing AMPSorter.
  - **AMP_benchmarking Set**: A set of peptides used for benchmarking the AMP models.
  - **AMP_external Validation Dataset**: A separate dataset for external validation of AMPSorter.
  - **Toxin_data split**: Data used for training, validating, and evaluating BioToxiPept.
  - **Toxin_test set**: A set of peptides used for testing the toxin models.

### 5. `NRSPDs`
- **Description**: A large dataset that includes:
  - **410,192,277 non-redundant short peptides**.
  - **A candidate pool of 82,694,928 peptides**.
  - **Logits** with results predicted by AMPSorter and BioToxiPept.

### 6. `GNRSPDs`
- **Description**: Contains:
  - **7,798 generated sequences**.
  - **A candidate pool of 4,736 peptides**.
  - **Logits** with results predicted by AMPSorter and BioToxiPept.

### 7. `196 tested peptides.xlsx`
- **Description**: A set of 196 selected and experimentally tested peptides, with experimentally measured values.

### 8. `20 pilot tested peptides.xlsx`
- **Description**: 20 selected peptides with prediction results and values measured experimentally in a pilot test.

### 9. `Sequences generated by different models.xlsx`
- **Description**: Sequences generated by different models, with prediction results.
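
As a quick sanity check on the FASTA file listed as dataset 1, the sketch below shows one way to iterate over FASTA records and count them using only the Python standard library. It is a minimal illustration, not the authors' own tooling: the helper names `read_fasta` and `count_records` and the two example records are hypothetical, and it assumes the file is plain, uncompressed FASTA text.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper: a header line starts with '>', and the sequence
    may be wrapped over several following lines.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                # Emit the record collected so far before starting a new one.
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_records(lines: Iterable[str]) -> int:
    """Count FASTA records without holding the sequences in memory."""
    return sum(1 for _ in read_fasta(lines))


# Made-up records for illustration only; they are not taken from the dataset.
example = """>sp|P00001|EXAMPLE_1 hypothetical entry
MKTAYIAK
QRQISFVK
>sp|P00002|EXAMPLE_2 hypothetical entry
GIGKFLHS
"""

records = list(read_fasta(example.splitlines()))
print(len(records), records[0][1])
```

To apply this to the real file, pass an open file handle instead of the example lines, for instance `with open(path) as fh: count_records(fh)`. If the description above is accurate, the count should come out at 609,216, though this has not been verified here.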
