
Automated identification of protein conformational states from simulation of an ensemble of structures is a hard problem because it requires teaching a computer to recognize shapes. We adapt the naïve Bayes classifier from the machine learning community for use on atom-to-atom pairwise contacts. The result is an unsupervised learning algorithm that samples a ‘distribution’ over potential classification schemes. We apply the classifier to a series of test structures and one real protein, showing that it identifies the conformational transition with >95% accuracy in most cases. A nontrivial feature of our adaptation is a new connection to information entropy that allows us to vary the level of structural detail without spoiling the categorization. This is confirmed by comparing results as the numbers of atoms and time samples are varied over 1.5 orders of magnitude. Further, because the method is derived from Bayesian analysis on the set of inter-atomic contacts, it is easy to understand and to extend to more complex cases.
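To make the idea of a naïve Bayes (Bernoulli mixture) model on pairwise contacts concrete, the following is a minimal sketch, not the authors' implementation. It makes several assumptions that do not come from the abstract: the 1.5 distance cutoff, the toy four-atom "closed" and "open" geometries, the function names (`contact_vector`, `responsibilities`, `fit_bernoulli_mixture`), and the smoothing parameter `alpha` are all hypothetical. It also swaps in plain expectation-maximization for the sampling over classification schemes described above, and it does not attempt the information-entropy connection.

```python
import math
from itertools import combinations

def contact_vector(coords, cutoff=1.5):
    """Binary features: 1 if a pair of atoms lies within `cutoff` (assumed value), else 0."""
    return [1 if math.dist(a, b) < cutoff else 0
            for a, b in combinations(coords, 2)]

def log_likelihood(x, theta):
    """Naive Bayes log-likelihood: contacts are treated as independent Bernoulli variables."""
    return sum(math.log(t if xi else 1.0 - t) for xi, t in zip(x, theta))

def responsibilities(x, weights, thetas):
    """Posterior probability of each state given one contact vector."""
    logs = [math.log(w) + log_likelihood(x, th) for w, th in zip(weights, thetas)]
    m = max(logs)  # subtract the maximum for numerical stability
    ps = [math.exp(l - m) for l in logs]
    z = sum(ps)
    return [p / z for p in ps]

def fit_bernoulli_mixture(X, seeds, n_iter=20, alpha=1.0):
    """EM for a Bernoulli mixture; `seeds` are frame indices used to initialise each state."""
    k, d = len(seeds), len(X[0])
    # Initialise contact probabilities from the seed frames, smoothed away from 0 and 1.
    thetas = [[(X[s][j] + alpha) / (1 + 2 * alpha) for j in range(d)] for s in seeds]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: soft assignment of every frame to every state.
        R = [responsibilities(x, weights, thetas) for x in X]
        # M-step: smoothed re-estimation of state weights and contact probabilities.
        for c in range(k):
            n_c = sum(r[c] for r in R)
            weights[c] = (n_c + alpha) / (len(X) + k * alpha)
            thetas[c] = [(sum(r[c] * x[j] for r, x in zip(R, X)) + alpha) / (n_c + 2 * alpha)
                         for j in range(d)]
    return weights, thetas

# Toy trajectory: five compact ("closed") frames followed by five extended ("open") frames.
closed = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
opened = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
X = [contact_vector(closed)] * 5 + [contact_vector(opened)] * 5

weights, thetas = fit_bernoulli_mixture(X, seeds=[0, len(X) - 1])
labels = [max(range(2), key=lambda c: responsibilities(x, weights, thetas)[c]) for x in X]
```

In this toy case the two geometries differ in three of the six contacts, which is what lets the mixture separate them; a mixture on a single binary contact would not be identifiable. Seeding the states from the first and last frames is only a convenience for a deterministic example.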
FOS: Computer and information sciences, Computer Science - Machine Learning, unsupervised classification, Science, Physics, QC1-999, Q, Bayesian clustering, FOS: Physical sciences, Biomolecules (q-bio.BM), Computational Physics (physics.comp-ph), Astrophysics, Article, Machine Learning (cs.LG), QB460-466, Quantitative Biology - Biomolecules, FOS: Biological sciences, Bernoulli mixture, Physics - Computational Physics
