On the construction of knockoffs in case–control studies

Publication type: Article, Preprint, Other literature type

Published: 01 Jan 2019 (embargo end date: 01 Jan 2018)

Language: English

Publisher: Wiley

Journal: Stat, volume 8 (issn: 2049-1573, eissn: 2049-1573)

Funded by: NSF | Discovering What Matters:...

Authors: Rina Foygel Barber; Emmanuel Candès

doi: 10.1002/sta4.225 , 10.48550/arxiv.1812.11433

arXiv: 1812.11433


Abstract

Consider a case–control study in which we have a random sample, constructed in such a way that the proportion of cases in our sample is different from that in the general population—for instance, the sample is constructed to achieve a fixed ratio of cases to controls. Imagine that we wish to determine which of the potentially many covariates under study truly influence the response by applying the new model‐X knockoffs approach. This paper demonstrates that it suffices to design knockoff variables using data that may have a different ratio of cases to controls. For example, the knockoff variables can be constructed using the distribution of the original variables under any of the following scenarios: (a) a population of controls only; (b) a population of cases only; and (c) a population of cases and controls mixed in an arbitrary proportion (irrespective of the fraction of cases in the sample at hand). The consequence is that knockoff variables may be constructed using unlabelled data, which are often available more easily than labelled data, while maintaining Type‐I error guarantees.
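As a rough illustration of scenario (c), the sketch below fits a covariate distribution on an unlabelled pool and then uses it to generate knockoffs for a separate case–control sample. It is not the authors' code. It assumes the covariates are approximately multivariate Gaussian and uses the standard Gaussian model‐X construction with an equicorrelated choice of the diagonal vector `s`. The function name `gaussian_knockoffs`, the simulated data, the logistic response model, and the 0.99 shrink factor are all hypothetical choices made for this example.

```python
import numpy as np

def gaussian_knockoffs(X, mu, Sigma, rng):
    """Sample Gaussian model-X knockoffs for the rows of X, assuming X ~ N(mu, Sigma)."""
    p = Sigma.shape[0]
    sd = np.sqrt(np.diag(Sigma))
    corr = Sigma / np.outer(sd, sd)
    lam_min = np.linalg.eigvalsh(corr)[0]          # smallest eigenvalue
    # Equicorrelated s, shrunk slightly (assumed factor) so V stays positive definite.
    s = 0.99 * min(2.0 * lam_min, 1.0) * sd**2
    D = np.diag(s)
    Sigma_inv_D = np.linalg.solve(Sigma, D)        # Sigma^{-1} D
    cond_mean = X - (X - mu) @ Sigma_inv_D         # E[knockoff | X]
    V = 2.0 * D - D @ Sigma_inv_D                  # Cov[knockoff | X]
    V = (V + V.T) / 2.0                            # symmetrize against round-off
    L = np.linalg.cholesky(V + 1e-10 * np.eye(p))
    return cond_mean + rng.standard_normal(X.shape) @ L.T

rng = np.random.default_rng(0)
p = 5
A = rng.standard_normal((p, p))
true_cov = A @ A.T + p * np.eye(p)                 # hypothetical population covariance

# Simulated population; only the first covariate affects the response here.
X_pop = rng.multivariate_normal(np.zeros(p), true_cov, size=20000)
logit = X_pop[:, 0] - 2.0
y_pop = rng.random(20000) < 1.0 / (1.0 + np.exp(-logit))

# Unlabelled pool: covariates only, labels never used.
X_unlabelled = X_pop[:5000]
mu_hat = X_unlabelled.mean(axis=0)
Sigma_hat = np.cov(X_unlabelled, rowvar=False)

# Case-control sample with a fixed 1:1 ratio, drawn from the rest of the population.
rest = np.arange(5000, 20000)
cases = rng.choice(rest[y_pop[rest]], size=100, replace=False)
controls = rng.choice(rest[~y_pop[rest]], size=100, replace=False)
X_sample = X_pop[np.concatenate([cases, controls])]

# Knockoffs for the case-control sample, built from the unlabelled fit.
X_tilde = gaussian_knockoffs(X_sample, mu_hat, Sigma_hat, rng)
```

The labels enter only when the case–control sample is selected, never when `mu_hat` and `Sigma_hat` are estimated, which is the point of using unlabelled data. In practice the Gaussian assumption and the estimation error in `Sigma_hat` would both need to be checked.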


Keywords

FOS: Computer and information sciences, case-control studies, knockoffs, Statistics, Mathematics - Statistics Theory, Statistics Theory (math.ST), Methodology (stat.ME), FOS: Mathematics, false discovery rate, Statistics - Methodology

Impact by BIP!

	selected citations: 3
	popularity: Average
	influence: Average
	impulse: Average