
doi: 10.1002/sam.11155
Abstract

Popular Internet document repositories, such as online newspapers, digital libraries, and blogs, store large amounts of text and image data that are frequently accessed by large numbers of users. Users' input through collaborative commenting or tagging can be very useful in organizing and classifying documents. Some web sites (e.g. Google Image Labeler) support the collection of tags and labels, but a large fraction of these sites do not currently support such activities. Moreover, relying upon centrally controlled web-service providers for such support is probably not a good idea if the objective is to make the collaborative inputs publicly available. Often, business entities offering such web-based tagging environments end up owning and monetizing the result of the collective effort. This paper takes a step toward addressing this problem: it proposes a peer-to-peer (P2P) system (PADMINI), powered by distributed data mining algorithms. In particular, it focuses on learning a P2P classifier from tagged text data. This paper describes the PADMINI system and the distributed text classifier learning components; text classification is posed as a linear program, and an asynchronous distributed algorithm is used to solve it. It also presents extensive empirical results on text data obtained from the Hubble Space Telescope (HST) proposal abstract database. Copyright © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012
distributed data mining, peer-to-peer system, annotation, distributed linear programming, Statistics, collaborative tagging, Computer science
