ZENODO
Software . 2023
License: CC BY
Data sources: Datacite

Snekmer: A scalable pipeline for protein sequence fingerprinting using amino acid recoding (AAR)

Authors: McDermott, Jason E.; Chang, Christine H.; Jerger, Abby; Nelson, William B.; Jacobson, Jeremy R.

Abstract

Snekmer is a software package that builds reduced representations of protein sequences by combining amino acid recoding (AAR) with the kmer approach. Based on the AAR-kmer representations, Snekmer (1) clusters sequences using various unsupervised clustering algorithms, (2) generates supervised machine learning models, or (3) searches sequences against pre-trained models to determine probabilistic annotations. These tasks correspond to Snekmer's three operation modes: cluster, model, and search.

Cluster mode: The user supplies files containing sequences in an appropriate format (e.g., FASTA). Snekmer applies the relevant workflow steps and outputs the clustering results in tabular form (.CSV), as well as the cluster object itself (.cluster). Figures (e.g., t-SNE, UMAP) are also generated to help the user contextualize the results.

Model mode: The user supplies files containing sequences in an appropriate format (e.g., FASTA). Snekmer applies the relevant workflow steps and outputs the resulting models as objects (.model). Snekmer also reports K-fold cross-validation results as figures (AUC ROC and PR AUC curves) and as a table (.CSV).

Search mode: The user supplies files containing sequences in an appropriate format (e.g., FASTA), together with the models to search those sequences against. For each input file, Snekmer applies the relevant workflow steps and outputs a table of model annotation probabilities for the given sequences.

Federal Acknowledgements

This research was supported in part by the U.S. Department of Energy (DOE), Office of Biological and Environmental Research (BER), as part of the Genomic Science Program (GSP) as a contribution of the Pacific Northwest National Laboratory (PNNL) Secure Biosystems Design Science Focus Area: Persistence Control of Engineered Functions in Complex Soil Microbiomes (PerCon SFA).

Pacific Northwest National Laboratory (PNNL) is a multiprogram national laboratory managed by the Battelle Memorial Institute, operating under the U.S. Department of Energy, Contract DE-AC05-76RL01830.
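
To make the AAR-kmer representation described in the abstract more concrete, the following is a minimal Python sketch of the general idea: each residue is first mapped onto a smaller alphabet, and overlapping kmers are then counted on the recoded sequence. The three-group alphabet, the names REDUCED_ALPHABET, recode, and kmer_counts, and the use of "X" for unrecognized residues are all assumptions made for illustration. They are not Snekmer's actual recoding schemes or API.

```python
from collections import Counter

# Hypothetical reduced alphabet, for illustration only (not a Snekmer scheme):
# H = hydrophobic, P = polar/uncharged, C = charged.
REDUCED_ALPHABET = {
    **dict.fromkeys("AVLIMFWC", "H"),
    **dict.fromkeys("STNQYGP", "P"),
    **dict.fromkeys("DEKRH", "C"),
}


def recode(sequence):
    """Map each residue to its reduced-alphabet letter; unknown residues become 'X'."""
    return "".join(REDUCED_ALPHABET.get(residue, "X") for residue in sequence.upper())


def kmer_counts(sequence, k):
    """Count overlapping kmers of length k on the recoded sequence."""
    recoded = recode(sequence)
    return Counter(recoded[i:i + k] for i in range(len(recoded) - k + 1))


if __name__ == "__main__":
    example = "MKVLA"  # toy sequence, not real data
    print(recode(example))
    print(kmer_counts(example, k=2))
```

With a three-letter alphabet and k = 2 there are at most 9 distinct kmers (ignoring "X"), compared with 400 for the full 20-letter alphabet, which shows how recoding shrinks the feature space that downstream clustering or modeling steps operate on.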

Keywords

PerCon SFA, Amino Acid Recoding (AAR), Protein Sequence Analysis Tools, Short Peptide Sequences (Kmers)
