Learning by active nonlinear diffusion

Type: Article, Preprint, Other literature type
Date: 01 Jan 2019
Embargo end date: 01 Jan 2019
Publisher: American Institute of Mathematical Sciences (AIMS)
Journal: Foundations of Data Science, volume 1, pages 271-291 (eISSN: 2639-8001)

Authors: Mauro Maggioni; James M. Murphy

DOI: 10.3934/fods.2019012, 10.48550/arxiv.1905.12989

arXiv: 1905.12989



Abstract

This article proposes an active learning method for high-dimensional data, based on intrinsic data geometries learned through diffusion processes on graphs. Diffusion distances are used to parametrize low-dimensional structures in the dataset, which allow for high-accuracy labelings of the dataset with only a small number of carefully chosen labels. The geometric structure of the data suggests regions that have homogeneous labels, as well as regions with high label complexity that should be queried for labels. The proposed method enjoys theoretical performance guarantees on a general geometric data model, in which clusters corresponding to semantically meaningful classes are permitted to have nonlinear geometries and high ambient dimensionality, and to suffer from significant noise and outlier corruption. The proposed algorithm admits an implementation whose running time is quasilinear in the number of unlabeled data points, and it exhibits competitive empirical performance on synthetic datasets and real hyperspectral remote sensing images.
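To make the abstract's two ingredients concrete (diffusion distances on a graph, and choosing a few points to query for labels), here is a minimal sketch using only NumPy. It is not the authors' implementation: the function names `diffusion_distances` and `query_candidates`, the dense Gaussian kernel, the bandwidth `sigma`, the diffusion time `t`, and the query score are all illustrative assumptions. The score multiplies a kernel density estimate p(x) by the diffusion distance from x to its nearest point of higher density, a density-peak heuristic in the spirit of the method; the exact scoring and graph construction used in the paper are not given in this record. A dense n-by-n kernel is used for readability, so this sketch is cubic in memory rather than quasilinear.

```python
import numpy as np


def diffusion_distances(X, sigma=1.0, t=2):
    """Pairwise diffusion distances D_t on a dense Gaussian-kernel graph.

    Assumption: a dense kernel with bandwidth `sigma` stands in for the
    sparse nearest-neighbor graph a quasilinear implementation would use.
    """
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / sigma**2)           # symmetric edge weights
    deg = W.sum(axis=1)                  # weighted degrees
    P = W / deg[:, None]                 # row-stochastic Markov matrix
    pi = deg / deg.sum()                 # stationary distribution of P
    Pt = np.linalg.matrix_power(P, t)    # t-step transition probabilities
    # D_t(x, y)^2 = sum_u (P^t(x, u) - P^t(y, u))^2 / pi(u)
    diff = Pt[:, None, :] - Pt[None, :, :]
    return np.sqrt(np.sum(diff**2 / pi[None, None, :], axis=-1))


def query_candidates(X, n_queries=2, sigma=1.0, t=2):
    """Hypothetical query rule: rank points by density * distance to a denser point."""
    D = diffusion_distances(X, sigma, t)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    p = np.exp(-sq / sigma**2).sum(axis=1)
    p = p / p.sum()                      # normalized kernel density estimate
    rho = np.empty(len(X))
    for i in range(len(X)):
        higher = p > p[i]
        # The densest point has no denser neighbor; use its largest distance.
        rho[i] = D[i, higher].min() if higher.any() else D[i].max()
    score = p * rho                      # high for dense, well-separated modes
    return np.argsort(-score)[:n_queries]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blob_a = rng.normal(0.0, 0.5, size=(20, 2))
    blob_b = rng.normal(0.0, 0.5, size=(20, 2)) + np.array([10.0, 0.0])
    X = np.vstack([blob_a, blob_b])
    print("query indices:", query_candidates(X, n_queries=2))
```

On two well-separated blobs, points in different blobs are far apart in diffusion distance while points in the same blob are close, so the two highest scores fall on one density mode per blob. Querying those two labels and propagating them along the diffusion geometry is the kind of "few carefully chosen labels" strategy the abstract describes.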

20 pages, 10 figures

Related Organizations

Johns Hopkins University, United States
Tufts University, United States

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, FOS: Mathematics, Mathematics - Statistics Theory, Machine Learning (stat.ML), Statistics Theory (math.ST), Machine Learning (cs.LG)

Impact by BIP!

- Selected citations (derived from selected sources; an alternative to the "Influence" indicator): 16
- Popularity (the "current" impact/attention of the article in the research community, based on the underlying citation network): Top 10%
- Influence (the overall/total impact of the article in the research community, based on the underlying citation network, diachronically): Top 10%
- Impulse (the initial momentum of the article directly after its publication, based on the underlying citation network): Top 10%



Open access: Green, Gold

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
