Powered by OpenAIRE graph
Doctoral thesis . 2011
Data sources: Lirias

Learning from Multi-View Data: Clustering Algorithm and Text Mining Application (Leren van multi-view gegevens: clustering algoritme en text mining toepassing)

Authors: Liu, Xinhai;

Abstract

The dissertation is organized into three parts.

In the first part, we analyze multi-view clustering from a multilinear perspective and develop several novel multi-view clustering algorithms. First, modeling multi-view data as a tensor, we present a novel tensor-based multi-view partitioning framework for integrating multi-view data in the context of spectral clustering. Within this framework, a joint optimal subspace shared by multi-view data, as well as the multilinear relationships among multi-view data, are revealed by the relevant tensor methods. Second, treating multi-view data as multiple graphs, we put forward a multi-view clustering strategy based on simultaneous trace maximization (STM), which also analyzes multi-view data from a multilinear perspective. Third, a joint dimension reduction scheme based on tensor decomposition is presented, particularly for multi-view data. The dimension reduction scheme is embedded into the STM-based multi-view clustering strategy, which enables us to handle large-scale multi-view data.

In the second part, we investigate text mining to extract multi-view heterogeneous data from a large-scale publication database, Web of Science (WoS). To facilitate scientific mapping, which is useful for monitoring and detecting new trends in different scientific fields, hybrid clustering, either in vector spaces or in graph spaces, is carried out to integrate these multi-view data. Regarding hybrid clustering in vector spaces, various methodologies are included in a unified framework, which consists of two general approaches: clustering ensemble and kernel fusion. A mutual information based weighting scheme is proposed to leverage the effect of multiple data sources in hybrid clustering. Concerning hybrid clustering in graph spaces, various graphs are generated from multi-view data. Utilizing the complementary properties of the text graph and the citation graph, we present a hybrid strategy named graph coupling. Meanwhile, based on modularity optimization, our graph coupling strategy detects the number of clusters automatically and provides a top-down hierarchical analysis, which suits practical applications. In addition, this modularity-based hybrid clustering method is computationally efficient enough to partition large-scale data.

In the third part, we propose a novel strategy to derive knowledge from textual information from a multi-view perspective. The multiple views can be different controlled vocabularies, term weighting schemes, publishing time periods and biomedical subjects. Our strategy has been applied to the MEDLINE corpus and analyzed using a disease-based data set. In particular, we investigate the effect of combining multiple views for clustering and assess whether vertical searches can be more accurate for specific biological questions. Moreover, a Web application of our multi-view text mining strategy is developed for gene retrieval.
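
As a rough illustration of the kernel fusion approach named in the second part, the sketch below averages one RBF kernel per view and runs a standard normalized spectral clustering on the fused kernel. It is not the dissertation's algorithm: the uniform weights stand in for the mutual information based weighting scheme, whose details the abstract does not give, and the function names (`rbf_kernel`, `fuse_kernels`, `spectral_embedding`, `kmeans`), the `gamma` value, and the toy data are all assumptions made for this example.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Gaussian (RBF) affinity matrix for one view; gamma is an assumed value."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def fuse_kernels(kernels, weights=None):
    """Weighted sum of per-view kernels (uniform weights if none are given)."""
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    return sum(w * K for w, K in zip(weights, kernels))

def spectral_embedding(K, k):
    """Row-normalized top-k eigenvectors of D^{-1/2} K D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    S = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(S)          # eigenvalues in ascending order
    U = vecs[:, -k:]                     # keep the k largest
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(U, k, n_iter=50):
    """Plain k-means with deterministic farthest-first initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

# Toy data (hypothetical): 40 objects, two clusters, described by two views.
rng = np.random.default_rng(0)
view1 = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
view2 = np.vstack([rng.normal(0.0, 0.3, (20, 3)), rng.normal(4.0, 0.3, (20, 3))])

K_fused = fuse_kernels([rbf_kernel(view1), rbf_kernel(view2)])
labels = kmeans(spectral_embedding(K_fused, k=2), k=2)
```

Non-uniform weights can be passed through the `weights` argument of `fuse_kernels`; how such weights would be derived from mutual information is left open here.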

Keywords

SISTA, STADIUS-11-163
