Efficient utility-based clustering over high dimensional partition spaces

Article, Other literature type English OPEN
Liverani, Silvia ; Anderson, Paul E. ; Edwards, Kieron D. ; Millar, A. J. ; Smith, J. Q. (2009)
  • Publisher: Int Soc Bayesian Analysis
  • Journal: (issn: 1936-0975)
  • Related identifiers: doi: 10.1214/09-BA420
  • Subject: Circadian Expression Profiles | QA | Bayesian | Genetics | Posterior Probability Distribution
    acm: ComputingMethodologies_PATTERNRECOGNITION

Because the number of partitions of even a moderately sized data set is huge, a comprehensive search for the highest scoring (MAP) partition in model-based clustering is usually impossible, even when Bayes factors have a closed form. However, when each cluster in a partition has a signature, and some signatures are known to be of scientific interest whilst others are not, it is possible, within a Bayesian framework, to develop search algorithms that are guided by these cluster signatures. Such algorithms can be expected to find better partitions more quickly. In this paper we develop a framework within which these ideas can be formalized. We then briefly illustrate the efficacy of the proposed guided search on a microarray time course data set, where the clustering objective is to identify clusters of genes with different types of circadian expression profiles.
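
To give a sense of scale for the size of the partition space mentioned in the abstract, the number of ways to partition n items is the Bell number B(n). The sketch below is an illustration only and is not taken from the paper; the function name `bell_number` is hypothetical. It computes B(n) with the standard Bell triangle recurrence.

```python
def bell_number(n: int) -> int:
    """Return the Bell number B(n): the number of partitions of an n-element set.

    Illustrative helper (not from the paper), using the Bell triangle:
    each new row starts with the last entry of the previous row, and each
    further entry is the sum of its left neighbour and the entry above it.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    row = [1]  # row 0 of the Bell triangle; its first entry is B(0)
    for _ in range(n):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]  # the first entry of row n is B(n)


if __name__ == "__main__":
    # Print how quickly the partition space grows with the number of items.
    for n in (5, 10, 20, 50):
        print(n, bell_number(n))
```

Ten items already admit 115975 partitions, and the count grows faster than exponentially, which is why exhaustive scoring of every partition is ruled out for data sets with thousands of genes.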