Learning from Multi-View Data: Clustering Algorithm and Text Mining Application (Leren van multi-view gegevens: clustering algoritme en text mining toepassing)

Name: Learning from Multi-View Data: Clustering Algorithm and Text Mining Application (Leren van multi-view gegevens: clustering algoritme en text mining toepassing)
Creator: Liu, Xinhai
Keywords: SISTA, STADIUS-11-163

Publication: Doctoral thesis, 15 Sep 2011. Language: Dutch; Flemish

Abstract

The dissertation is organized into three parts.

In the first part, we analyze multi-view clustering from a multilinear perspective and create several novel multi-view clustering algorithms. First, modeling multi-view data as a tensor, we present a novel tensor-based multi-view partitioning framework for integrating multi-view data in the context of spectral clustering. Within this framework, a joint optimal subspace shared by the multi-view data, as well as the multilinear relationships among the views, are revealed by the relevant tensor methods. Second, taking multi-view data as multiple graphs, we put forward a multi-view clustering strategy based on simultaneous trace maximization (STM), which also analyzes multi-view data from a multilinear perspective. Third, a joint dimension reduction scheme based on tensor decomposition is presented, particularly for multi-view data. The dimension reduction scheme is embedded into the STM-based multi-view clustering strategy, which enables us to handle large-scale multi-view data.

In the second part, we investigate text mining to extract multi-view heterogeneous data from a large-scale publication database of Web of Science (WoS). To facilitate scientific mapping, which is useful for monitoring and detecting new trends in different scientific fields, hybrid clustering, either in vector spaces or in graph spaces, is carried out to integrate these multi-view data. Regarding hybrid clustering in vector spaces, various methodologies are included in a unified framework, which consists of two general approaches: clustering ensemble and kernel fusion. A mutual-information-based weighting scheme is proposed to leverage the effect of multiple data sources in hybrid clustering. Concerning hybrid clustering in graph spaces, various graphs are generated from multi-view data. Utilizing the complementary properties of the text graph and the citation graph, we present a hybrid strategy named graph coupling. Based on modularity optimization, our graph coupling strategy detects the number of clusters automatically and provides a top-down hierarchical analysis, which suits practical applications. In addition, this modularity-based hybrid clustering method is computationally efficient enough to partition large-scale data.

In the third part, we propose a novel strategy to derive knowledge from textual information from a multi-view perspective. The multiple views can be different controlled vocabularies, term weighting schemes, publishing time periods and biomedical subjects. Our strategy has been applied to the MEDLINE corpus and analyzed using a disease-based data set. In particular, we investigate the effect of combining multiple views for clustering and assess whether vertical searches can be more accurate for specific biological questions. Moreover, a Web application of our multi-view text mining strategy is developed for gene retrieval.
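
As a rough illustration of the kernel fusion idea mentioned in the second part, the following minimal sketch combines one kernel per view into a weighted average and then applies a standard normalized spectral clustering step. It is not the thesis's algorithm: the function names (`rbf_kernel`, `fuse_kernels`, `cluster_views`), the RBF kernel, the uniform default weights, and the toy data are all assumptions made for illustration, and the mutual-information-based weighting scheme described in the abstract is not implemented here.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def fuse_kernels(kernels, weights=None):
    """Weighted average of per-view kernels (uniform weights assumed by default)."""
    if weights is None:
        weights = np.ones(len(kernels))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * K for w, K in zip(weights, kernels))

def spectral_embedding(K, k):
    """Row-normalized top-k eigenvectors of D^{-1/2} K D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    S = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(S)          # eigenvalues in ascending order
    U = vecs[:, -k:]
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(U, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        for j in range(k):
            if np.any(labels == j):      # keep the old center if a cluster empties
                centers[j] = U[labels == j].mean(axis=0)
    return labels

def cluster_views(views, k, weights=None, gamma=1.0):
    """Cluster samples described by several feature matrices (one per view)."""
    K = fuse_kernels([rbf_kernel(X, gamma) for X in views], weights)
    return kmeans(spectral_embedding(K, k), k)

# Toy data (hypothetical): 20 samples, two views with different dimensionality.
rng = np.random.default_rng(0)
view_a = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(3.0, 0.3, (10, 2))])
view_b = np.vstack([rng.normal(0.0, 0.3, (10, 3)), rng.normal(3.0, 0.3, (10, 3))])
labels = cluster_views([view_a, view_b], k=2)
```

In this sketch, passing non-uniform `weights` to `cluster_views` is the point at which a data-driven weighting of the views could be plugged in.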

Related Organizations

KU Leuven
Belgium
