Reconsidering statistical methods for assessing replication.

Publication: Article, 01 Feb 2021, English

Publisher: American Psychological Association (APA)

Journal: Psychological Methods, volume 26, pages 127-139 (issn: 1082-989X, eissn: 1939-1463)

Authors: J. M. Schauer; L. V. Hedges

doi: 10.1037/met0000302

pmid: 33617275

Abstract

Recent empirical evaluations of replication in psychology have reported startlingly few successful replication attempts. At the same time, these programs have noted that the proper way to analyze replication studies is far from a settled matter and have analyzed their data in several different ways. This presents 2 challenges to interpreting the results of these programs. First, different analysis methods assess different operational definitions of replication. Second, the properties of these methods are not necessarily common knowledge; it is possible for a successful replication to be deemed a failure by nearly all of the metrics used, and it is not always immediately clear how likely such errors are to occur. In this article, we describe the methods commonly used in replication research and how they imply specific operational definitions of replication. We then compute the probability of false failure (i.e., a successful replication is concluded to have failed) and false success determinations. These are shown to be high (often over 50%) and in many cases uncontrolled. We then demonstrate that errors are probable in the data to which these methods have been applied in the literature. We show that the probability that some reported conclusions about replication are incorrect can be as high as 75-80%. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
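
To give a feel for how a false failure probability of this kind can exceed 50%, here is a minimal sketch in Python. It is not the authors' computation. It assumes one simple operational definition (a replication "succeeds" if the replicate study alone is statistically significant in the same direction as the original), a standardized mean difference whose standard error is approximated by sqrt(2 / n) for n participants per group, and a normal test statistic. The function name `false_failure_probability` and the example values (true effect 0.3, the listed sample sizes) are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def false_failure_probability(theta, n_per_group, alpha=0.05):
    """Chance that a replicate study of a real effect is declared a failure.

    Assumed criterion (an illustration, not taken from the article): the
    replication counts as successful only if the replicate estimate is
    significant at level alpha (two-sided) and positive.

    theta        true standardized mean difference, assumed > 0
    n_per_group  participants per group in the replicate study
    """
    if theta <= 0:
        raise ValueError("this sketch assumes a positive true effect")

    # Large-sample approximation to the standard error of a standardized
    # mean difference with equal group sizes (ignores the theta**2 term).
    se = math.sqrt(2.0 / n_per_group)

    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)

    # Probability that the z statistic exceeds the critical value,
    # i.e. the power of the replicate study in the correct direction.
    power = 1 - std_normal.cdf(critical - theta / se)

    # The effect is real and identical in both studies, so every
    # non-significant replicate is a false failure.
    return 1 - power


if __name__ == "__main__":
    for n in (25, 50, 100, 200):
        p = false_failure_probability(theta=0.3, n_per_group=n)
        print(f"n per group = {n:3d}: P(false failure) = {p:.2f}")
```

Under these assumptions the error rate is simply one minus the power of the replicate study, so it is governed by the replicate's sample size and the size of the true effect rather than by any chosen error level, which is one sense in which such a rate is uncontrolled.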

Related Organizations

Northwestern University
United States
Northwestern University
Philippines

Keywords

Data Interpretation, Statistical, Humans, Psychology, Reproducibility of Results

Impact by BIP!

selected citations: 17. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.