Co-Clustering via Information-Theoretic Markov Aggregation

Publication type: Article, Preprint, Other literature type
Date: 01 Apr 2019
Embargo end date: 01 Jan 2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal: IEEE Transactions on Knowledge and Data Engineering, volume 31, pages 720-732 (ISSN: 1041-4347, eISSN: 2326-3865)
Funded by: FWF | Information-Theoretic Mar...

Authors: Clemens Blöchl; Rana Ali Amjad; Bernhard C. Geiger

DOI: 10.1109/tkde.2018.2846252, 10.48550/arxiv.1801.00584

arXiv: 1801.00584

Abstract

We present an information-theoretic cost function for co-clustering, i.e., for the simultaneous clustering of two sets based on similarities between their elements. By constructing a simple random walk on the corresponding bipartite graph, we derive our cost function from a recently proposed generalized framework for information-theoretic Markov chain aggregation. Our cost function aims to minimize the loss of relevant information; hence, it connects to the information bottleneck formalism. Moreover, via the connection to Markov aggregation, our cost function is not ad hoc but inherits its justification from the operational qualities associated with the corresponding Markov aggregation problem. We furthermore show that, for appropriate parameter settings, our cost function is identical to well-known approaches from the literature, such as the Information-Theoretic Co-Clustering of Dhillon et al. Hence, understanding the influence of this parameter yields deeper insight into the relationship between previously proposed information-theoretic cost functions. We highlight some strengths and weaknesses of the cost function for different parameter values. We also illustrate the performance of our cost function, optimized with a simple sequential heuristic, on several synthetic and real-world data sets, including the Newsgroup20 and the MovieLens100k data sets.
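The abstract states that, for suitable parameter settings, the cost function coincides with Dhillon et al.'s Information-Theoretic Co-Clustering (ITCC), whose objective is the mutual-information loss I(X;Y) - I(X̂;Ŷ) incurred by replacing rows and columns with their cluster labels. The Python sketch below illustrates only that special case and rests on stated assumptions: a nonnegative similarity matrix `w` is read as the edge weights of the bipartite graph; one step of a simple random walk, started from its stationary distribution on the row side, gives the joint distribution p(x, y) = w(x, y) / Σw; and all function names (`random_walk_joint`, `mutual_information`, `aggregate`, `itcc_loss`) are hypothetical. It does not reproduce the paper's generalized, parameterized cost or its sequential heuristic.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def random_walk_joint(w: Sequence[Sequence[float]]) -> Matrix:
    """Joint p(x, y) of one step of a simple random walk on a bipartite graph.

    Assumption: w[x][y] >= 0 is the edge weight between row element x and
    column element y. Started in its stationary distribution restricted to the
    row side, pi(x) = d(x) / sum(w), the walk moves x -> y with probability
    w[x][y] / d(x), so p(x, y) = w[x][y] / sum(w).
    """
    degrees = [sum(row) for row in w]
    total = sum(degrees)
    return [
        [(d / total) * (wxy / d) if d > 0 else 0.0 for wxy in row]
        for row, d in zip(w, degrees)
    ]


def mutual_information(joint: Matrix) -> float:
    """I(X;Y) in bits for a joint pmf stored as a matrix."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for x, row in enumerate(joint):
        for y, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (px[x] * py[y]))
    return mi


def aggregate(joint: Matrix, rows: Sequence[int], cols: Sequence[int],
              k: int, l: int) -> Matrix:
    """p(x_hat, y_hat): sum p(x, y) over each (row cluster, column cluster) block."""
    agg = [[0.0] * l for _ in range(k)]
    for x, row in enumerate(joint):
        for y, p in enumerate(row):
            agg[rows[x]][cols[y]] += p
    return agg


def itcc_loss(joint: Matrix, rows: Sequence[int], cols: Sequence[int],
              k: int, l: int) -> float:
    """Dhillon et al.'s ITCC objective: I(X;Y) - I(X_hat;Y_hat) >= 0, in bits."""
    return mutual_information(joint) - mutual_information(aggregate(joint, rows, cols, k, l))


if __name__ == "__main__":
    # Two obvious co-clusters: rows {0,1} x cols {0,1} and rows {2,3} x cols {2,3}.
    w = [[5, 5, 0, 0],
         [5, 5, 0, 0],
         [0, 0, 5, 5],
         [0, 0, 5, 5]]
    joint = random_walk_joint(w)
    print(mutual_information(joint))                            # 1.0
    print(itcc_loss(joint, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2))   # 0.0 (blocks recovered)
    print(itcc_loss(joint, [0, 1, 0, 1], [0, 1, 0, 1], 2, 2))   # 1.0 (every block split)
```

On this block-diagonal toy matrix, the co-clustering that matches the blocks loses no information, while an assignment that splits every block evenly loses all of I(X;Y) = 1 bit. Since I(X;Y) does not depend on the clustering, minimizing this loss is equivalent to maximizing I(X̂;Ŷ).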

Related Organizations

TECHNISCHE UNIVERSITAET MUENCHEN (Germany)
Technical University of Munich (Germany)
Know Center (Austria)
Technische Universität München (Germany)
Graz University of Technology (Austria)

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Information Theory, Information Theory (cs.IT), Machine Learning (cs.LG)

Impact by BIP!

- Selected citations: 10. Citations derived from selected sources; an alternative to the "Influence" indicator.
- Popularity: Top 10%. The "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. The overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. The initial momentum of an article directly after its publication, based on the underlying citation network.