
Recent advances in protein structure determination and prediction offer new opportunities to decipher relationships amongst proteins, a task that entails 3D structure comparison and classification. Historically, protein domain classification has been somewhat manual and heuristic. While CATH and related resources represent significant steps towards a more systematic (and automatable) approach, more scalable and objective classification methods, e.g., grounded in machine learning, could be informative. Indeed, comparative analyses of protein structures via Deep Learning (DL), though they may entail large-scale restructuring of classification schemes, could uncover distant relationships. We have developed new DL models for domain structures (including physicochemical properties), focused initially on CATH's homologous superfamily (SF) level. Adopting DL approaches used for image classification and segmentation, we have devised and applied a hybrid convolutional autoencoder architecture that allows SF-specific models to learn features that, in a sense, 'define' the various homologous SFs. We quantitatively evaluate pairwise 'distances' between SFs by building one model per SF and comparing the loss functions of the models. Clustering on these distance matrices provides a new view of protein interrelationships: a view that extends beyond simple structural/geometric similarity, towards the realm of structure/function properties, and that is consistent with a recently proposed 'Urfold' concept.
Protein Structure, Deep Learning, Sequence, Structure, Function
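
The abstract's last two steps (pairwise SF 'distances' derived from per-SF model losses, then clustering on the distance matrix) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the superfamily labels, the toy loss values, the symmetrized excess-loss formula, and the choice of average-linkage clustering. The entry `loss[i][j]` is taken to mean the mean reconstruction loss of the model trained on SF i when evaluated on domains from SF j.

```python
from itertools import combinations


def loss_to_distance(loss):
    """Turn a cross-loss matrix into a symmetric distance matrix.

    Assumed formula (not from the paper): the distance between SF i and
    SF j is the mean of the two 'excess' losses, i.e. how much worse each
    model reconstructs the other SF than its own, clipped at zero.
    """
    n = len(loss)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        excess_ij = loss[i][j] - loss[i][i]
        excess_ji = loss[j][i] - loss[j][j]
        d = max(0.0, 0.5 * (excess_ij + excess_ji))
        dist[i][j] = dist[j][i] = d
    return dist


def average_linkage(dist, labels, n_clusters):
    """Naive agglomerative clustering with average linkage."""
    clusters = [[i] for i in range(len(labels))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average distance over all cross-cluster member pairs.
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            d = sum(dist[i][j] for i, j in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        # a < b always holds, so deleting b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(labels[i] for i in c) for c in clusters]


# Hypothetical superfamily labels and toy loss values, for illustration only.
labels = ["SF_A", "SF_B", "SF_C", "SF_D"]
loss = [
    [0.10, 0.15, 0.60, 0.70],
    [0.14, 0.12, 0.65, 0.62],
    [0.55, 0.58, 0.09, 0.13],
    [0.66, 0.60, 0.12, 0.11],
]

dist = loss_to_distance(loss)
clusters = average_linkage(dist, labels, n_clusters=2)
print(clusters)
```

With these toy values, models for SF_A and SF_B reconstruct each other's domains almost as well as their own, so the two end up in one cluster, and likewise SF_C and SF_D. On real data a library routine for hierarchical clustering would be a more practical choice than this naive loop.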
