Peer‐to‐peer distributed text classifier learning in PADMINI

Publication: Article, 21 Aug 2012, English. Publisher: Wiley. Journal: Statistical Analysis and Data Mining: The ASA Data Science Journal, volume 5, pages 446-462 (ISSN: 1932-1864, eISSN: 1932-1872).

Authors: Xianshu Zhu; Tushar Mahule; Haimonti Dutta; Sugandha Arora; Hillol Kargupta; Kirk D. Borne

doi: 10.1002/sam.11155

Abstract

Popular Internet document repositories, such as online newspapers, digital libraries, and blogs, store large amounts of text and image data that are frequently accessed by large numbers of users. Users' input through collaborative commenting or tagging can be very useful in organizing and classifying documents. Some web sites (e.g. Google Image Labeler) support the collection of tags and labels, but a large fraction of these sites do not currently support such activities. Moreover, relying on centrally controlled web‐service providers for such support is ill-advised if the objective is to make the collaborative inputs publicly available. Often, business entities offering such web‐based tagging environments end up owning and monetizing the result of the collective effort. This paper takes a step toward addressing this problem: it proposes a peer‐to‐peer (P2P) system (PADMINI), powered by distributed data mining algorithms. In particular, it focuses on learning a P2P classifier from tagged text data. This paper describes the PADMINI system and the distributed text classifier learning components; text classification is posed as a linear program, and an asynchronous distributed algorithm is used to solve it. It also presents extensive empirical results on text data obtained from the Hubble Space Telescope (HST) proposal abstract database. Copyright © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012
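
The abstract says that text classification is posed as a linear program but does not give the formulation. As a rough illustration only, and not necessarily the formulation used in the paper, one standard way to write a linear classifier as a linear program is the L1-norm soft-margin form below. All symbols are assumptions introduced for this sketch: $x_i \in \mathbb{R}^d$ is the feature vector of tagged document $i$, $y_i \in \{-1, +1\}$ its label, $(w, b)$ the classifier, $\xi_i$ the slack variables, $u_j$ auxiliary variables bounding $|w_j|$, and $C > 0$ a trade-off constant.

```latex
\begin{aligned}
\min_{w,\, b,\, \xi,\, u} \quad & \sum_{j=1}^{d} u_j + C \sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, && i = 1, \dots, n, \\
& \xi_i \ge 0, && i = 1, \dots, n, \\
& -u_j \le w_j \le u_j, && j = 1, \dots, d.
\end{aligned}
```

The objective and every constraint are linear in $(w, b, \xi, u)$, so the problem is a linear program. In a P2P setting, each peer would hold only the constraints generated by its own tagged documents; how those local constraints are combined asynchronously is specific to the paper and is not reproduced here.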

Related Organizations

Columbia University (United States)
George Mason University (United States)
University of Maryland, Baltimore (United States)
University of Maryland, Baltimore County (United States)
King’s University (United States)

Keywords

distributed data mining, peer-to-peer system, annotation, distributed linear programming, Statistics, collaborative tagging, Computer science

Impact by BIP!

Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.