Bayesian clustering of curves and the search of the partition space

Doctoral thesis, English, open access
Liverani, Silvia
  • Subject: QA | QH426

This thesis studies a Bayesian clustering algorithm, proposed by Heard et al. (2006), that has been used successfully for time-course microarray experiments. It focuses not only on the development of new ways of setting hyperparameters, so that inferences both reflect the scientific needs and contribute to the inferential stability of the search, but also on the design of new fast algorithms for the search over the partition space. First, we use the explicit forms of the associated Bayes factors to demonstrate that such methods can be unstable under common settings of the associated hyperparameters. We then prove that the regions of instability can be removed by setting the hyperparameters in an unconventional way. Moreover, we demonstrate that MAP (maximum a posteriori) search is satisfied when a utility function is defined according to the scientific interest of the clusters. We then turn to the search over the partition space. In model-based clustering, a comprehensive search for the highest-scoring partition is usually impossible because the number of partitions of even a moderately sized dataset is huge. We propose two methods for the partition search: one encodes the clustering as a weighted MAX-SAT problem, while the other views clusterings as elements of the lattice of partitions. Finally, the thesis includes the full analysis of two microarray experiments for identifying circadian genes.
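To give a sense of scale for the claim that an exhaustive search is usually impossible: the number of ways to partition n items into non-empty clusters is the Bell number B(n). The following is a minimal illustrative sketch and is not taken from the thesis; the function name `bell_numbers` is hypothetical, and the Bell triangle recurrence is used only as one standard way to compute these counts.

```python
def bell_numbers(n_max):
    """Return [B(0), ..., B(n_max)] computed with the Bell triangle.

    B(n) counts the partitions of a set of n items, i.e. the number of
    candidate clusterings an exhaustive search would have to score.
    """
    bells = [1]   # B(0) = 1: the empty set has exactly one partition
    row = [1]     # first row of the Bell triangle
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        # Each further entry is the entry to its left plus the entry
        # directly above that left neighbour.
        for value in row:
            new_row.append(new_row[-1] + value)
        # The first entry of the new row is the next Bell number.
        bells.append(new_row[0])
        row = new_row
    return bells


if __name__ == "__main__":
    counts = bell_numbers(20)
    for n in (5, 10, 15, 20):
        print(f"n = {n:2d}: {counts[n]} partitions")
```

Already at n = 10 there are 115975 partitions, and the count grows faster than exponentially in n, which is why heuristic or structured searches over the partition space are needed in practice.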

Content provider: Warwick Research Archives Portal Repository - IRUS-UK