Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Universidade do Minh...arrow_drop_down
image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
Neurocomputing
Article . 2022 . Peer-reviewed
License: Elsevier TDM
Data sources: Crossref
versions View all 2 versions
addClaim

This Research product is the result of merged Research products in OpenAIRE.

You have already added 0 works in your ORCID record related to the merged Research product.

ProPythia: A Python package for protein classification based on machine and deep learning

Authors: Sequeira, Ana; Lousa, Diana; Rocha, Miguel;

ProPythia: A Python package for protein classification based on machine and deep learning

Abstract

The field of protein data mining has been growing rapidly in the last years. To characterize proteins and determine their function from their amino acid sequences are challenging and long-standing problems, where Bioinformatics and Machine Learning have an emergent role. A myriad of machine and deep learning algorithms have been applied in these tasks with exciting results. However, tools and platforms to calculate protein features and perform both Machine Learning (ML) and Deep Learning (DL) pipelines, taking as inputs protein sequences, are still lacking and have their limitations in terms of performance, user-friendliness and restricted domains of application. Here, to address these limitations, we propose ProPythia, a generic and modular Python package that allows to easily deploy ML and DL approaches for a plethora of problems in protein sequence analysis and classification. It facilitates the implementation, comparison and validation of the major tasks in ML or DL pipelines including modules to read and alter sequences, calculate protein features, preprocess datasets, execute feature selection and dimensionality reduction, perform clustering and manifold analysis, as well as to train and optimize ML/DL models and use them to make predictions. ProPythia has an adaptable modular architecture being a versatile and easyto use tool, which will be useful to transform protein data in valuable knowledge even for people not familiarized with ML code. This platform was tested in several applications comparing with results from literature. Here, we illustrate its applicability in two cases studies: the prediction of antimicrobial peptides and the prediction of enzymes Enzyme commission (EC) numbers. Furthermore, we assess the performance of the different descriptors on four different protein classification challenges. Its source code and documentation, including an user guide and case studies are freely available at https://github.com/BioSystemsUM/propythia.

Related Organizations
Keywords

Enzyme, Protein, Machine learning, peptide classification, Deep learning, Python package, Antimicrobial peptide, Protein/ peptide classification

  • BIP!
    Impact byBIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    22
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Top 10%
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Top 10%
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Top 10%
    OpenAIRE UsageCounts
    Usage byUsageCounts
    visibility views 36
    download downloads 15
  • 36
    views
    15
    downloads
    Powered byOpenAIRE UsageCounts
Powered by OpenAIRE graph
Found an issue? Give us feedback
visibility
download
selected citations
These citations are derived from selected sources.
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
views
OpenAIRE UsageCountsViews provided by UsageCounts
downloads
OpenAIRE UsageCountsDownloads provided by UsageCounts
22
Top 10%
Top 10%
Top 10%
36
15
Upload OA version
Are you the author of this publication? Upload your Open Access version to Zenodo!
It’s fast and easy, just two clicks!