<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=undefined&type=result"></script>');
-->
</script>

COPY SCRIPT

For further information contact us at helpdesk@openaire.eu

Reproducibility in learning

descriptionPublicationkeyboard_double_arrow_right Article , Preprint 09 Jun 2022Embargo end date: 01 Jan 2022Publisher:ACMJournal:Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing

Authors: Impagliazzo, Russell; Lei, Rex; Pitassi, Toniann; Sorrell, Jessica;

doi: 10.1145/3519935.3519973 , 10.48550/arxiv.2201.08430

arXiv: http://arxiv.org/abs/2201.08430

Reproducibility in learning

- Summary
- Subjects
- Metrics

Abstract

We introduce the notion of a reproducible algorithm in the context of learning. A reproducible learning algorithm is resilient to variations in its samples -- with high probability, it returns the exact same output when run on two samples from the same underlying distribution. We begin by unpacking the definition, clarifying how randomness is instrumental in balancing accuracy and reproducibility. We initiate a theory of reproducible algorithms, showing how reproducibility implies desirable properties such as data reuse and efficient testability. Despite the exceedingly strong demand of reproducibility, there are efficient reproducible algorithms for several fundamental problems in statistics and learning. First, we show that any statistical query algorithm can be made reproducible with a modest increase in sample complexity, and we use this to construct reproducible algorithms for finding approximate heavy-hitters and medians. Using these ideas, we give the first reproducible algorithm for learning halfspaces via a reproducible weak learner and a reproducible boosting algorithm. Finally, we initiate the study of lower bounds and inherent tradeoffs for reproducible algorithms, giving nearly tight sample complexity upper and lower bounds for reproducible versus nonreproducible SQ algorithms.

Related Organizations

University of California, San Diego
United States
King’s University
United States

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)

Impact byBIP!

	citations This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	15
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Top 10%
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Top 10%
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Top 10%

Found an issue? Give us feedback

Top 10%

Green

hybrid

Fields of Science (4) View all

natural sciences

computer and information sciences

Fields of Science

natural sciences

computer and information sciences

View all