Computational prediction of functional similarity of CRMs

Doctoral thesis, English, open access
Koohy, Hashem
  • Subject: QH426

Transcriptional regulation of genes is fundamental to all living organisms. The spatial, temporal and condition-specific expression levels of genes are determined in part by inherited regulatory codes in non-coding regions of DNA. Many methods have been proposed to detect conserved regions of regulatory DNA by means of sequence alignment. However, it has become clear that some regulatory regions do not show statistically significant alignments even when their function is conserved. Detecting and characterising these elusive regulatory codes therefore remains a challenging problem.

In this thesis we develop and validate a novel alignment-free computational model for detecting functional similarity between regulatory sequences. We show that our model can detect functional links between pairs of sequences that do not align with a significant score. We apply the model (a) to detect enhancers within the same genome that are likely to have similar functions and (b) to detect functionally conserved enhancer regions in orthologous genomes. Our method finds regulatory codes that are common to groups of similar enhancers and consistent with previous biological knowledge.

The inputs to our model are the two sequences whose functional similarity we wish to compare and a set of transcription factor motifs. The mathematical framework of the model is built on two main components. In the first component, each sequence is mapped to a vector of estimated occupancy levels for all motifs. These vectors represent which motifs are present in each sequence, and at what multiplicity and specificity.

In the second component, we first estimate a probability distribution of motif occupancy levels for sequences that function similarly to the template sequence. We then compute a statistical similarity score to evaluate whether the sequences are more similar to each other than to random background sequences.

We present two applications of this model. First, we apply it to a set of experimentally validated non-alignable enhancers from D. melanogaster. We show that:

• Our model can detect statistical links between these enhancers.
• Weak binding sites can make a strong contribution to sequence similarity.
• Our model treats the statistically significant presence and absence of motifs symmetrically, so sequence similarity can be based on a combination of the two. We show examples of motifs that contribute to sequence similarity through their absence.
• Using our model, we can build a network of similarities among the fly enhancers. Groups of enhancers in this network share common regulatory codes, one of which is strongly supported by existing experimental data.

In the second application, we predict functional subregions of a known D. melanogaster enhancer. To do so, we first show that the model can detect the orthology of this enhancer across 10 Drosophila species. We then demonstrate how this statistical link can be used to predict functional subregions within the enhancer.
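The first model component maps each sequence to a vector of estimated motif occupancy levels. The abstract does not give the occupancy model itself, so the following is only a minimal sketch under stated assumptions: every window on either strand is treated as an independent potential binding site that is bound with probability k·w/(1 + k·w), where w = exp(log-odds score) and k is a hypothetical concentration-like scaling factor; the occupancy of a motif is the sum of these probabilities. The helper `consensus_pwm` and its match/mismatch scores are illustrative stand-ins for real position weight matrices, and sequences are assumed to contain only A, C, G and T.

```python
import math

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def consensus_pwm(consensus, match=2.0, mismatch=-3.0):
    """Hypothetical toy PWM: one row of log-odds scores (A, C, G, T) per position."""
    return [[match if b == c else mismatch for b in BASES] for c in consensus]

def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def site_score(pwm, site):
    """Log-odds score of one candidate site: sum of the per-position entries."""
    return sum(row[BASES.index(b)] for row, b in zip(pwm, site))

def occupancy(seq, pwm, k=1.0):
    """Expected number of bound sites for one motif (assumed model, see lead-in).

    Simplifications: overlapping sites do not compete, and a palindromic
    site is counted once on each strand.
    """
    width = len(pwm)
    total = 0.0
    for strand in (seq, reverse_complement(seq)):
        for start in range(len(strand) - width + 1):
            kw = k * math.exp(site_score(pwm, strand[start:start + width]))
            total += kw / (1.0 + kw)  # probability this window is bound
    return total

def occupancy_vector(seq, motifs, k=1.0):
    """Map a sequence to one estimated occupancy level per motif."""
    return [occupancy(seq, pwm, k) for pwm in motifs]

motifs = [consensus_pwm("GATA"), consensus_pwm("CACG")]  # placeholder motifs
print(occupancy_vector("CCCCGATACCCC", motifs))
```

Summing soft binding probabilities, rather than counting hits above a score cut-off, lets many weak sites add up to a sizeable occupancy, which is one way a model of this kind can let weak binding sites contribute to sequence similarity.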
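The second model component scores how much more similar two occupancy vectors are to each other than to random background sequences. The abstract does not spell out the distributions used, so this sketch is a hypothetical stand-in: it assumes that, for sequences functioning like the template, each motif's occupancy is Gaussian and centred on the template's occupancy, with the same spread as in the background; the score is the log-likelihood ratio of the query under that model versus the background Gaussian, and an empirical p-value is obtained by scoring the background sequences themselves. Treating motifs as independent is a further simplifying assumption, and the function names are illustrative.

```python
import statistics

def fit_background(bg_vectors):
    """Per-motif mean and standard deviation of occupancy across background sequences."""
    columns = list(zip(*bg_vectors))
    means = [statistics.fmean(col) for col in columns]
    # Floor the spread so a motif with constant background occupancy cannot divide by zero
    sds = [max(statistics.stdev(col), 1e-6) for col in columns]
    return means, sds

def similarity_score(template, query, means, sds):
    """Sum over motifs of log N(q; t, sd) - log N(q; mu, sd).

    With equal variances the normalising constants cancel, leaving
    ((q - mu)^2 - (q - t)^2) / (2 sd^2) per motif.
    """
    return sum(((q - mu) ** 2 - (q - t) ** 2) / (2 * sd ** 2)
               for t, q, mu, sd in zip(template, query, means, sds))

def empirical_p_value(template, query, bg_vectors):
    """Fraction of background sequences scoring at least as high as the query,
    with an add-one correction so the p-value is never exactly zero."""
    means, sds = fit_background(bg_vectors)
    observed = similarity_score(template, query, means, sds)
    null = [similarity_score(template, b, means, sds) for b in bg_vectors]
    return (1 + sum(s >= observed for s in null)) / (1 + len(null))
```

Each per-motif term rewards the query for lying closer to the template than to the background mean, so a motif that is significantly depleted in both sequences raises the score just as a shared enriched motif does. This mirrors, in a simplified form, the symmetric treatment of motif presence and absence described in the abstract.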
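The network of similarities among the fly enhancers can be illustrated with a small sketch. Assuming pairwise similarity p-values have already been computed for every pair of enhancers, one simple way to build such a network is to connect pairs whose p-value falls below a cut-off and report connected components as candidate groups. The cut-off `alpha`, the use of connected components, the enhancer names and the p-values below are all hypothetical illustrations, not the procedure or data reported in the thesis.

```python
def similarity_network(names, p_values, alpha=0.01):
    """Link enhancers i and j when their pairwise similarity p-value is below alpha.

    Hypothetical construction: p_values is a symmetric matrix of pairwise
    p-values and alpha is an illustrative cut-off, not a value from the thesis.
    """
    graph = {name: set() for name in names}  # isolated enhancers stay as nodes
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if p_values[i][j] < alpha:
                graph[names[i]].add(names[j])
                graph[names[j]].add(names[i])
    return graph

def connected_groups(graph):
    """Split the network into connected components using an iterative depth-first search."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(component))
    return groups

# Toy input with placeholder enhancer names and made-up p-values
names = ["enh1", "enh2", "enh3", "enh4"]
p = [[1.0, 0.001, 0.5, 0.2],
     [0.001, 1.0, 0.004, 0.3],
     [0.5, 0.004, 1.0, 0.6],
     [0.2, 0.3, 0.6, 1.0]]
print(connected_groups(similarity_network(names, p)))
# → [['enh1', 'enh2', 'enh3'], ['enh4']]
```

Connected components are the simplest possible grouping; a real analysis might instead use a community-detection or clustering method on the weighted network.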

The information is available from the following content provider: Warwick Research Archives Portal Repository.