Powered by OpenAIRE graph

Phylogenetic Signals in Protein Data

Authors: Qin, Chongli

Abstract

Structural biology has seen major advances over the past decade. In protein structure prediction, accuracy has increased significantly with the discovery of coevolutionary signals in a multiple sequence alignment (MSA). Unlike methods that fold proteins using molecular dynamics (MD) simulations, these coevolutionary methods use correlation information to fold large protein structures orders of magnitude faster. Correlation signals in an MSA are often a strong indicator that a pair of amino acids is close enough to be in contact, and thus interacting. It has been shown that accurate inference of the amino acid pairs in contact in a protein gives rise to accurate prediction of the protein structure itself. Hence, statistical inference of amino acid pairs in contact is an important problem for protein folding. However, one of the major challenges for these statistical inference methods is that noise significantly overwhelms the relevant signal in protein data. In this thesis, we attempt to alleviate one of the most important, and most often ignored, sources of noise: spurious correlations induced by phylogeny. To this end, we introduce a novel method for disentangling phylogenetic noise from the relevant structural signals. This method is grounded in an extension of a well-known theorem in Random Matrix Theory. Through extensive analysis on both synthetic and protein data, we demonstrate that it is possible to disentangle these two sources of information. Crucially, we find that the phylogenetic correlations can be largely removed by finding the principal modes of the empirical correlation matrix whose corresponding eigenvalues satisfy a power law.
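
The idea of removing principal modes of an empirical correlation matrix can be illustrated with a minimal NumPy sketch. It is not the method developed in the thesis: the function names (`correlation_spectrum`, `remove_top_modes`, `power_law_exponent`), the fixed cutoff `k`, and the log-log least-squares fit are all assumptions made for illustration, and the input is assumed to be a numeric matrix of samples by features with no constant columns.

```python
import numpy as np


def correlation_spectrum(X):
    """Eigen-decompose the empirical correlation matrix of X (samples x features).

    Returns the matrix, its eigenvalues in descending order, and the
    matching eigenvectors as columns.
    """
    C = np.corrcoef(X, rowvar=False)
    evals, evecs = np.linalg.eigh(C)      # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]       # reorder to descending
    return C, evals[order], evecs[:, order]


def remove_top_modes(C, evals, evecs, k):
    """Subtract the k leading principal modes, V diag(lambda) V^T, from C."""
    V = evecs[:, :k]
    return C - (V * evals[:k]) @ V.T


def power_law_exponent(evals, k):
    """Slope of log(eigenvalue) against log(rank) over the top k modes.

    Illustrative check only: a roughly straight line in log-log space is
    what a power-law decay of the leading eigenvalues would look like.
    """
    ranks = np.arange(1, k + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(evals[:k]), 1)
    return slope


# Hypothetical synthetic data: independent noise plus one shared latent
# factor, standing in for a global correlation that is not pairwise signal.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 1))
X = rng.standard_normal((200, 30)) + 0.8 * latent

C, evals, evecs = correlation_spectrum(X)
C_clean = remove_top_modes(C, evals, evecs, k=1)
```

In this sketch `k` is chosen by hand; the abstract instead describes selecting modes by whether their eigenvalues satisfy a power law, and the criterion for doing so is not given here.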

Country
United Kingdom
Keywords

Power law, Random Matrix Theory, Proteins, Phylogeny

  • Impact by BIP!
    selected citations: 0
    These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    popularity: Average
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    influence: Average
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    impulse: Average
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
Green