Noisy Bayesian active learning

descriptionPublicationkeyboard_double_arrow_right Article , Preprint , Conference object , Other literature type 01 Oct 2012Embargo end date: 01 Jan 2013Publisher:IEEEJournal:2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton)Funded by:NSF | CIF: Small: Controlling U..., NSF | RI: Medium: Quantifying a..., NSF | Collaborative Research: N...

Authors: Mohammad Naghshvar; Tara Javidi; Kamalika Chaudhuri;

doi: 10.1109/allerton.2012.6483415 , 10.48550/arxiv.1312.2315

arXiv: 1312.2315

Noisy Bayesian active learning

- Summary
- Subjects
- Metrics

Abstract

We consider the problem of noisy Bayesian active learning, where we are given a finite set of functions $\mathcal{H}$, a sample space $\mathcal{X}$, and a label set $\mathcal{L}$. One of the functions in $\mathcal{H}$ assigns labels to samples in $\mathcal{X}$. The goal is to identify the function that generates the labels even though the result of a label query on a sample is corrupted by independent noise. More precisely, the objective is to declare one of the functions in $\mathcal{H}$ as the true label generating function with high confidence using as few label queries as possible, by selecting the queries adaptively and in a strategic manner. Previous work in Bayesian active learning considers Generalized Binary Search, and its variants for the noisy case, and analyzes the number of queries required by these sampling strategies. In this paper, we show that these schemes are, in general, suboptimal. Instead we propose and analyze an alternative strategy for sample collection. Our sampling strategy is motivated by a connection between Bayesian active learning and active hypothesis testing, and is based on querying the label of a sample which maximizes the Extrinsic Jensen-Shannon divergence at each step. We provide upper and lower bounds on the performance of this sampling strategy, and show that these bounds are better than previous bounds.

39 pages (one-column), 5 figures, submitted to IEEE Transactions on Information Theory

Related Organizations

University of California, San Diego
United States
University of California, San Francisco
United States

Keywords

FOS: Computer and information sciences, Optimization and Control (math.OC), Computer Science - Information Theory, Information Theory (cs.IT), FOS: Mathematics, 62L05, 62L10, 62B10, 62F03, 62F05, 93E35, 93E20, Mathematics - Statistics Theory, Statistics Theory (math.ST), Mathematics - Optimization and Control

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	8
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Top 10%
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Top 10%
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

Found an issue? Give us feedback

8

Top 10%

Average

Green

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

Funded by

NSF| CIF: Small: Controlling Uncertainty: On the Sequential Refinement of Belief, NSF| RI: Medium: Quantifying and utilizing confidence in machine learning, NSF| Collaborative Research: New Communication Infrastructures For Networked Coordinated Control