
arXiv: 1507.04155
Active learning aims to obtain a classifier of high accuracy by using fewer label requests in comparison to passive learning by selecting effective queries. Many active learning methods have been developed in the past two decades, which sample queries based on informativeness or representativeness of unlabeled data points. In this work, we explore a novel querying criterion based on statistical leverage scores. The statistical leverage scores of a row in a matrix are the squared row-norms of the matrix containing its (top) left singular vectors and is a measure of influence of the row on the matrix. Leverage scores have been used for detecting high influential points in regression diagnostics and have been recently shown to be useful for data analysis and randomized low-rank matrix approximation algorithms. We explore how sampling data instances with high statistical leverage scores perform in active learning. Our empirical comparison on several binary classification datasets indicate that querying high leverage points is an effective strategy.
4 pages, presented as contributed talk in ICML 2015 Active Learning Workshop
FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
