Document clustering using locality preserving indexing

Publication: Article, 01 Dec 2005

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Journal: IEEE Transactions on Knowledge and Data Engineering, volume 17, pages 1,624-1,637 (ISSN: 1041-4347)

Authors: Deng Cai; Xiaofei He; Jiawei Han

doi: 10.1109/tkde.2005.198

Abstract

We propose a novel document clustering method that aims to cluster documents into different semantic classes. The document space is generally high-dimensional, and clustering in such a space is often infeasible due to the curse of dimensionality. Using locality preserving indexing (LPI), the documents can be projected into a lower-dimensional semantic space in which documents related to the same semantics are close to each other. Unlike previous document clustering methods based on latent semantic indexing (LSI) or nonnegative matrix factorization (NMF), our method tries to discover both the geometric and the discriminating structures of the document space. Theoretical analysis shows that LPI is an unsupervised approximation of the supervised linear discriminant analysis (LDA) method, which provides the intuitive motivation for our method. Extensive experimental evaluations are performed on the Reuters-21578 and TDT2 data sets.
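The abstract does not spell out the projection step, so the following is only a rough sketch of how such a locality preserving projection might look. It assumes the commonly described formulation: build a nearest-neighbour graph over the documents, then solve a generalized eigenproblem in the SVD subspace of the data. The function name `lpi_embed`, the cosine-weighted graph, the neighbour count, and the toy term-count matrix are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def lpi_embed(X, n_neighbors=2, n_components=2):
    """LPP-style projection of documents (rows of X); an illustrative sketch."""
    X = np.asarray(X, dtype=float)
    # Unit-normalise rows so that dot products are cosine similarities.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    n = X.shape[0]

    # Symmetric k-nearest-neighbour graph weighted by cosine similarity
    # (assumed weighting; other schemes such as 0/1 weights are possible).
    S = X @ X.T
    np.fill_diagonal(S, -np.inf)  # a document is not its own neighbour
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(S[i])[-n_neighbors:]
        W[i, nbrs] = S[i, nbrs]
    W = np.maximum(W, W.T)

    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W                   # graph Laplacian

    # SVD step: work in the span of the data so Z^T D Z is non-singular.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s > 1e-10 * s[0]
    Z = U[:, keep] * s[keep]    # documents expressed in the SVD subspace
    V = Vt[keep].T

    # Generalised problem A p = lam B p, reduced to a standard symmetric
    # eigenproblem through the Cholesky factor of B.
    A = Z.T @ L @ Z
    B = Z.T @ D @ Z
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    M = C_inv @ A @ C_inv.T
    lam, Y = np.linalg.eigh((M + M.T) / 2)  # eigenvalues in ascending order
    P = C_inv.T @ Y[:, :n_components]       # smallest eigenvalues preserve locality

    # Low-dimensional documents, term-space projection, and eigenvalues.
    return Z @ P, V @ P, lam[:n_components]


# Hypothetical toy term-count matrix: six documents over six terms.
docs = np.array([
    [3, 2, 1, 0, 0, 1],
    [2, 3, 0, 1, 0, 0],
    [1, 2, 3, 0, 1, 0],
    [0, 0, 1, 3, 2, 1],
    [1, 0, 0, 2, 3, 2],
    [0, 1, 0, 1, 2, 3],
])
emb, proj, lam = lpi_embed(docs, n_neighbors=2, n_components=2)
```

The rows of `emb` could then be passed to any standard clustering routine such as k-means. Whether the paper uses this exact graph construction or weighting is not stated in the abstract.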

Related Organizations

University of Chicago
United States
University of Illinois System
United States
University of Illinois at Urbana Champaign
United States

Impact by BIP!

- selected citations: 546. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 0.1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Top 0.1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

bronze

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
