
Abstract

The amyloid conformation can be adopted by a variety of sequences, but the precise boundaries of amyloid sequence space are still unclear. The currently charted amyloid sequence space is strongly biased towards hydrophobic, beta-sheet-prone sequences that form the core of globular proteins and towards Q/N/Y-rich yeast prions. Here, we took advantage of the increasing amount of high-resolution structural information on amyloid cores currently available in the Protein Data Bank to implement a machine learning approach, named Cordax (https://cordax.switchlab.org), that explores amyloid sequence space beyond its current boundaries. Clustering by t-Distributed Stochastic Neighbour Embedding (t-SNE) shows that our approach resulted in an expansion away from hydrophobic amyloid sequences towards clusters of lower aliphatic content and higher charge, or regions of helical and disordered propensities. These clusters uncouple amyloid propensity from solubility, representing sequence flavours compatible with surface-exposed patches in globular proteins, functional amyloids or sequences associated with liquid-liquid phase transitions.
Amyloid, PREDICTION, Protein Conformation, Science, Amyloidogenic Proteins, DETERMINANTS, Protein Engineering, Article, AGGREGATION-PRONE REGIONS, Machine Learning, FIBRILS, Humans, SERVER, Science & Technology, Q, PEPTIDES, Amyloidosis, POLYMORPHISM, SEGMENTS, Multidisciplinary Sciences, Models, Chemical, Solubility, Science & Technology - Other Topics, PROTEIN AGGREGATION, INHIBITORS, Peptides, Hydrophobic and Hydrophilic Interactions, Algorithms
| Indicator | Value |
| --- | --- |
| Selected citations | 74 |
| Popularity | Top 1% |
| Influence | Top 10% |
| Impulse | Top 1% |
