TemStaPro: protein thermostability prediction using sequence representations from protein language models

Keywords: Machine Learning, Original Paper, Proteins, Amino Acid Sequence, Software, Language

Ieva Pudžiuvelytė; Kliment Olechnovič; Egle Godliauskaite; Kristupas Sermokas; Tomas Urbaitis; Giedrius Gasiunas; Darius Kazlauskas

Vilnius University Institutional Repository

Article . 2024

License: CC BY

Full-Text: https://epublications.vu.lt/object/elaba:200367943/200367943.pdf

Data sources: Vilnius University Institutional Repository

Bioinformatics

Article . 2024 . Peer-reviewed

License: CC BY

Data sources: Crossref

Bioinformatics

Article . 2024

Data sources: Europe PubMed Central

PubMed Central

Other literature type . 2024

License: http://creativecommons.org/licenses/by/4.0/
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Data sources: PubMed Central

https://doi.org/10.1101/2023.0...

Article . 2023 . Peer-reviewed

Data sources: Crossref

Publication: Article, Other literature type
Date: 28 Mar 2023
Country: Lithuania
Publisher: Cold Spring Harbor Laboratory
Journal: Bioinformatics, volume 40 (eissn: 1367-4811)

doi: 10.1101/2023.03.27.534365 , 10.1093/bioinformatics/btae157

pmid: 38507682

pmc: PMC11001493

Abstract

Motivation

Reliable prediction of protein thermostability from its sequence is valuable for both academic and industrial research. This prediction problem can be tackled using machine learning and by taking advantage of the recent blossoming of deep learning methods for sequence analysis. These methods can facilitate training on more data and, possibly, enable development of more versatile thermostability predictors for multiple ranges of temperatures.

Results

We applied the principle of transfer learning to predict protein thermostability using embeddings generated by protein language models (pLMs) from an input protein sequence. We used large pLMs that were pre-trained on hundreds of millions of known sequences. The embeddings from such models allowed us to efficiently train and validate a high-performing prediction method using over one million sequences that we collected from organisms with annotated growth temperatures. Our method, TemStaPro (Temperatures of Stability for Proteins), was used to predict thermostability of CRISPR-Cas Class II effector proteins (C2EPs). Predictions indicated sharp differences among groups of C2EPs in terms of thermostability and were largely in tune with previously published and our newly obtained experimental data.

Availability and Implementation

TemStaPro software and the related data are freely available from https://github.com/ievapudz/TemStaPro and https://doi.org/10.5281/zenodo.7743637.
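
The transfer-learning idea described in the abstract (a frozen pLM supplies embeddings, and a much smaller model is trained on top of them) can be illustrated with a minimal sketch. This is not the TemStaPro implementation: random vectors stand in for real pLM embeddings, and the mean pooling, the logistic-regression classifier, the binary label, and all names (`mean_pool`, `train_logistic`, `fake_embedding`) are assumptions made purely for illustration.

```python
import math
import random


def mean_pool(per_residue):
    """Average per-residue vectors into one fixed-length sequence vector."""
    length = len(per_residue)
    dim = len(per_residue[0])
    return [sum(row[j] for row in per_residue) / length for j in range(dim)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, epochs=200, lr=0.5):
    """Fit weights and bias by batch gradient descent on the log loss."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Probability that a pooled embedding belongs to the positive class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fake_embedding(rng, length, dim, shift):
    # Hypothetical stand-in for a pLM: random per-residue vectors whose
    # mean depends on the class, so the toy problem is learnable.
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(length)]


rng = random.Random(0)
DIM = 8  # toy embedding size; real pLM embeddings are far larger
X, y = [], []
for _ in range(100):
    label = rng.randint(0, 1)          # 1 = "stable above some threshold" (assumed)
    length = rng.randint(20, 60)       # sequences have different lengths
    shift = 0.5 if label else -0.5
    X.append(mean_pool(fake_embedding(rng, length, DIM, shift)))
    y.append(label)

# Train on the first 80 synthetic sequences, evaluate on the remaining 20.
w, b = train_logistic(X[:80], y[:80])
hits = sum((predict(w, b, x) >= 0.5) == bool(t) for x, t in zip(X[80:], y[80:]))
accuracy = hits / 20
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Pooling gives every sequence a vector of the same size regardless of its length, so only the small classifier needs training while the language model stays fixed. Covering several temperature ranges, as the abstract mentions, could be approached by training one such classifier per threshold, though that choice is also an assumption here.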

Country

Lithuania

Related Organizations

Vilnius University
Lithuania
INSTITUTE OF BIOTECHNOLOGY
Polish Academy of Sciences
Poland
Institute of Biotechnology
Czech Republic
Institute of Computer Science
Poland

Related research products (2)

TemStaPro Datasets (2023): IsSupplementedBy
TemStaPro software on GitHub: IsRelatedTo

Impact by BIP!

selected citations: 41
popularity: Top 10%
influence: Top 10%
impulse: Top 1%