Testing convex truncation

Publication type: Article, Part of book or chapter of book, Preprint, Conference object
Date: 01 Jan 2023 (embargo end date: 01 Jan 2023)
Publisher: European Mathematical Society - EMS - Publishing House GmbH
Journal: Mathematical Statistics and Learning, volume 8, pages 1-31 (ISSN: 2520-2316, eISSN: 2520-2324)
Funded by: NSF | AF: Small: Collaborative ..., NSF | AF: Medium: The Trace Rec..., NSF | AF: Medium: Research in A... +5 projects

Authors: Anindya De, Shivam Nadimpalli, Rocco A. Servedio

DOI: 10.4171/msl/50, 10.1137/1.9781611977554.ch155, 10.48550/arxiv.2305.03146

arXiv: 2305.03146

Abstract

We study the basic statistical problem of testing whether normally distributed $n$-dimensional data has been truncated, i.e., altered by only retaining points that lie in some unknown truncation set $S \subseteq \mathbb{R}^{n}$. As our main algorithmic results, (1) we give an $O(n)$-sample algorithm that can distinguish the standard normal distribution $N(0,I_{n})$ from $N(0,I_{n})$ conditioned on an unknown and arbitrary convex set $S$; (2) we give a different $O(n)$-sample algorithm that can distinguish $N(0,I_{n})$ from $N(0,I_{n})$ conditioned on an unknown and arbitrary mixture of symmetric convex sets. Both our algorithms are computationally efficient and run in $O(n^{2})$ time, which is linear in the size of the input. These results stand in sharp contrast with known results for learning or testing convex bodies with respect to the normal distribution or learning convex-truncated normal distributions, where state-of-the-art algorithms require essentially $n^{O(\sqrt{n})}$ samples. An easy argument shows that no finite number of samples suffices to distinguish $N(0,I_{n})$ from an unknown and arbitrary mixture of general (not necessarily symmetric) convex sets, so no common generalization of results (1) and (2) above is possible. We also prove that any algorithm (computationally efficient or otherwise) that can distinguish $N(0,I_{n})$ from $N(0,I_{n})$ conditioned on an unknown symmetric convex set must use $\Omega(n)$ samples. This shows that the sample complexity of each of our algorithms is optimal up to a constant factor.
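The abstract does not spell out the test statistics, so what follows is only a minimal Python/NumPy sketch of the kind of low-order moment statistics a truncation tester might compute. It is not claimed to be the authors' algorithm and carries none of the paper's guarantees. It assumes $m$ samples stored as the rows of an $m \times n$ array. The first statistic averages $\|x_i\|^2 - n$. For an origin-symmetric convex set, or a mixture of such sets, conditioning can only lower $\mathbb{E}\|x\|^2$: each ray from the origin meets the set in an interval starting at $0$, and under $N(0,I_n)$ the radius is independent of the direction. The second statistic estimates $\|\mathbb{E}x\|^2$ from the cross terms $\sum_{i \ne j}\langle x_i, x_j\rangle$, which picks up a shifted mean such as a halfspace truncation. Both take $O(mn)$ time, i.e. $O(n^2)$ when $m = O(n)$, consistent with the running time quoted in the abstract. The function names, the rejection sampler and its membership oracle `accept`, the slab and halfspace examples, and the decision rule `looks_truncated` with its constant `k` are all hypothetical choices made for illustration.

```python
import numpy as np


def mean_sq_norm_excess(samples: np.ndarray) -> float:
    """Average of ||x_i||^2 - n over the rows x_i of an (m, n) array.

    Under N(0, I_n) each ||x_i||^2 is chi-squared with n degrees of freedom,
    so this has mean 0 and variance 2n/m. Cost: O(m * n).
    """
    _, n = samples.shape
    sq_norms = np.einsum("ij,ij->i", samples, samples)  # row-wise ||x_i||^2
    return float(np.mean(sq_norms - n))


def mean_shift_statistic(samples: np.ndarray) -> float:
    """Unbiased estimate of ||E[x]||^2 from the cross terms <x_i, x_j>, i != j.

    Uses sum_{i != j} <x_i, x_j> = ||sum_i x_i||^2 - sum_i ||x_i||^2, so the
    m-by-m Gram matrix is never formed and the cost stays O(m * n).
    """
    m, _ = samples.shape
    total = samples.sum(axis=0)
    cross = float(total @ total) - float(np.einsum("ij,ij->", samples, samples))
    return cross / (m * (m - 1))


def looks_truncated(samples: np.ndarray, k: float = 6.0) -> bool:
    """Hypothetical decision rule: flag the sample if either statistic moves
    more than k null standard deviations in its 'truncated' direction
    (symmetric convex truncation lowers the norm statistic; a shifted mean
    raises the second one). k is illustrative, not taken from the paper."""
    m, n = samples.shape
    norm_sd = np.sqrt(2 * n / m)               # null sd of mean_sq_norm_excess
    shift_sd = np.sqrt(2 * n / (m * (m - 1)))  # null sd of mean_shift_statistic
    return bool(
        mean_sq_norm_excess(samples) < -k * norm_sd
        or mean_shift_statistic(samples) > k * shift_sd
    )


def sample_truncated(rng: np.random.Generator, m: int, n: int, accept) -> np.ndarray:
    """Rejection-sample m points of N(0, I_n) conditioned on a set.

    `accept` is a hypothetical membership oracle mapping a (k, n) array to a
    boolean mask of length k. It only simulates truncated data; the
    statistics above never see it.
    """
    chunks, count = [], 0
    while count < m:
        x = rng.standard_normal((m, n))
        kept = x[accept(x)]
        chunks.append(kept)
        count += len(kept)
    return np.concatenate(chunks)[:m]


# Example: a symmetric convex slab and a (non-symmetric) halfspace truncation.
rng = np.random.default_rng(0)
n, m = 20, 8000
null = rng.standard_normal((m, n))
slab = sample_truncated(rng, m, n, lambda x: np.abs(x[:, 0]) <= 0.5)
half = sample_truncated(rng, m, n, lambda x: x[:, 0] >= 0.5)
for name, data in [("null", null), ("slab", slab), ("halfspace", half)]:
    print(name, round(mean_sq_norm_excess(data), 3),
          round(mean_shift_statistic(data), 3), looks_truncated(data))
```

The slab $\{x : |x_1| \le 0.5\}$ is origin-symmetric, so it leaves the mean at $0$ and instead lowers $\mathbb{E}\|x\|^2$ by about $0.92$. The halfspace $\{x : x_1 \ge 0.5\}$ raises $\mathbb{E}\|x\|^2$ but moves the mean, giving $\|\mathbb{E}x\|^2 \approx 1.30$. This is why the sketch combines two statistics. The sample sizes and threshold here are arbitrary and say nothing about the $O(n)$ bound proved in the paper.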

Related Organizations

Columbia University
University of Pennsylvania
United States
Massachusetts Institute of Technology

Keywords

FOS: Computer and information sciences, Computer Science - Computational Complexity, Computer Science - Data Structures and Algorithms, Probability (math.PR), FOS: Mathematics, Mathematics - Statistics Theory, Data Structures and Algorithms (cs.DS), Statistics Theory (math.ST), Computational Complexity (cs.CC), Mathematics - Probability

Impact by BIP!

- Selected citations: 3. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Open access routes: Green, gold

Funded by

- NSF | AF: Small: Collaborative Research: Boolean Function Analysis Meets Stochastic Design
- NSF | AF: Medium: The Trace Reconstruction Problem
- NSF | AF: Medium: Research in Algorithms and Complexity: Total Functions, Games, and the Brain
- NSF | CAREER: Learning and property testing -- a complexity theoretic perspective