Preliminary Thoughts about Guidelines and Recommendations for Exact Replications

Thanks to Chris Fraley and Fred Oswald for earlier comments on these ideas.

After the embarrassing methodological travesties of the last two years (e.g., Bem’s publication of the ESP study in JPSP; the Big Three Fraudsters – Stapel, Smeesters, and Sanna; Bargh’s Psychology Today rants), there is increased interest in replication studies.  This is a great development, but there are some nuts-and-bolts issues that matter for conducting informative replications.  If the true population effect size is small and your replication study has a very small sample size, the replication attempt will not be very informative.

Thus, I started to think about a set of guidelines for designing exact (or near-exact) replication studies that might produce meaningful data.  I let this material sit on my desktop for months but I decided to post it here.

Three big issues have occurred to me:

A. What counts as a replication?  A directional hit such that the new result is in the same direction as the original and statistically significant at p < .05 (or should it be .01 or .001)?  Or an effect size estimate that is in the ballpark of the original?  Some friends/colleagues of mine think the first outcome counts as a replication, but I am not convinced.  Why? A trivial effect size will reach significance at p < .05 with a large enough sample size.  Let’s consider a real-life example. Bargh’s original walking study (experiment 2a) generated a d estimate of around 1.08 (N = 30) in the published paper (computed from the reported t of 2.86 with df = 28; the mean difference between the two conditions was .98 seconds).   What is remarkable about Bargh et al. (1996) is probably the size of the effect.  (How many ds > 1.00 do you see in your work?) If I redo his study with 10,000 participants per condition and get a d-metric effect size estimate of .10 (p < .05), did I reproduce his results?  I don’t have the best answer to this question, but I would prefer to count as a replication any study that obtains an effect size in the ballpark of the original (to be arbitrary – say the 95% CIs overlap? a rough sketch of that comparison follows below).  This perspective leads to the next issue…
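To make the CI-overlap idea concrete, here is a rough sketch in Python.  The equal split of 15 per condition, the large-sample variance approximation for d, and the hypothetical mega-replication numbers are my assumptions for illustration, not anything reported by Bargh et al.

```python
import math

def d_from_t(t, df):
    """Approximate Cohen's d from an independent-groups t and its df (equal ns assumed)."""
    return 2 * t / math.sqrt(df)

def ci_for_d(d, n1, n2, z=1.96):
    """Rough 95% CI for d using the usual large-sample variance approximation."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return (d - z * se, d + z * se)

# Original study: t(28) = 2.86, with an assumed 15 participants per condition.
d_orig = d_from_t(2.86, 28)              # ~1.08, matching the estimate above
ci_orig = ci_for_d(d_orig, 15, 15)       # wide interval because N = 30

# Hypothetical mega-replication: d = .10 with 10,000 per condition.
ci_rep = ci_for_d(0.10, 10_000, 10_000)  # very narrow interval around .10

print(ci_orig, ci_rep)
# If the two intervals do not overlap, I would hesitate to call the new study
# a successful replication of the original effect size.
```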

B. What kind of effect size estimate should researchers expect when planning the replication study?  I think Ioannidis is a tremendously smart person (e.g., Ioannidis, 2008, Epidemiology), so I trust him when he argues that most discovered effect sizes are inflated.  Thus, I think researchers should expect some “shrinkage” in effect size estimates upon replication.  This unpleasant reality has consequences for study design.  Ultimately, I think a replication study should have a sample size that is at least equal to the original and preferably much larger.  A sample size much smaller than the original is not a good attribute of a replication study.

C. Do you address obvious flaws in the original?  Nearly all studies have flaws, and sometimes researchers make inexplicable choices.  Do you try to fix these when conducting the replication study?  Say a group of researchers investigated the correlation between loneliness and taking warm showers/baths (don’t ask) and decided to use only 10 out of 20 items on a well-established loneliness measure.  What do you do?  Use only their 10 items (if you could figure those out from the published report) or use the whole scale? My view is that you should use the full measure, but that means the new study is only a near-exact replication.  Fortunately, the 10 items can be extracted from the full 20 afterwards, so things are fine in this case.  Other IVs/DVs might not be so easy to handle.

In light of those issues, I came up with these quick and dirty recommendations for simple experiments or correlational studies (that is, replication studies where it is easy to identify a single population correlation or mean difference of interest).

1. Read the original study thoroughly and calculate effect size estimates if none are presented (a sketch of this kind of calculation follows below).   Get a little worried if the original effect size seems large relative to other similar effect size estimates in the literature.  If you are clueless about expected effect sizes, get educated. (Cluelessness about expected effect sizes strikes me as a major indicator of a poor psychological researcher.)  Richard et al. (2003; Review of General Psychology) offer a catalogue of effect sizes in social psychology (the typical value is around a d of .40 or a correlation of .20, if I recall correctly). Other sources are Meyer et al. (2001; American Psychologist) and Wetzels et al. (2011; Perspectives on Psychological Science – thanks to Tim Pleskac for the recommendation); Wetzels et al. summarize more experimental research in cognitive psychology.
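For instance, here is a minimal sketch in Python of turning reported descriptives into a d and a roughly comparable r.  The means, SDs, and ns are made up for illustration, and the d-to-r conversion assumes equal group sizes.

```python
import math

def cohens_d(m1, m2, sd1, sd2, n1, n2):
    """Cohen's d from reported means, SDs, and ns (pooled SD)."""
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sp

def r_from_d(d):
    """Convert d to a point-biserial r (equal group sizes assumed)."""
    return d / math.sqrt(d**2 + 4)

# Hypothetical reported values (not from any particular paper):
d = cohens_d(8.2, 7.3, 2.1, 2.0, 25, 25)
print(round(d, 2), round(r_from_d(d), 2))
# Compare these against the Richard et al. benchmarks (d of about .40, r of about .20).
```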

2. In line with the above discussion and the apparent prevalence of questionable research practices/researcher degrees of freedom, expect that the published effect size estimate is positively biased relative to the true population value. Thus, you should attempt to collect a larger sample than the original.  Do a series of simple power calculations assuming the population effect size is 90%, 75%, 50%, 25%, and 10% of the published value (see the sketch below), and use those results to decide on the new sample size.  When in doubt, go large.  There is a point at which an effect is too small to care about, but that point is hard to know and depends on a number of factors.  Think about the confidence interval around the parameter estimate of interest: smaller is better, and a larger N is the royal road to a smaller confidence interval.
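A minimal sketch of these power calculations in Python, assuming a hypothetical published d of .60, a two-sided alpha of .05, and a target power of .80 (the specific numbers are mine, not part of the recommendation), using statsmodels:

```python
import math
from statsmodels.stats.power import TTestIndPower

published_d = 0.60                     # hypothetical published effect size
analysis = TTestIndPower()

for fraction in (0.90, 0.75, 0.50, 0.25, 0.10):
    d = fraction * published_d
    # Solve for the sample size per group needed to detect d with 80% power.
    n_per_group = analysis.solve_power(effect_size=d, alpha=0.05,
                                       power=0.80, alternative='two-sided')
    print(f"{int(round(fraction * 100)):>3}% of published d ({d:.2f}): "
          f"about {math.ceil(n_per_group)} participants per group")
```

The required N grows quickly as the assumed effect shrinks, which is exactly why "when in doubt, go large" is the safer planning rule.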

3. Consider contacting the original authors for their materials and procedures. Hopefully they are agreeable and send you everything.  If not, get nervous, but do the best you can to use their exact measures from the published write-up. ***Controversial:  Note in the write-up if they ignored your good-faith attempts to obtain their materials. If they gave a stated reason for not helping, inform readers of that reason.  I think the community needs to know who is willing to facilitate replications and who is not. ***

4. Conduct the study with care.

5. Analyze the data thoroughly. Compute effect size estimates and compare them with the original.  Plan to share your dataset with the original authors, so keep good documentation and careful notes.  (Actually, you should plan to share your dataset with the entire scientific community; see Wicherts & Bakker, 2012, Intelligence.)

6. Write up the results.  Try to strike an even-handed tone if you fail to replicate the published effect size estimate.  Chance is lumpy (Abelson) and no one knows the true population value.  Write as if you will send the paper to the original authors for comments.

7. Try to publish the replication or send it to the Psych File Drawer website (http://www.psychfiledrawer.org/).  The field has got to keep track of these things.

8. Take pride in doing something scientifically important even if other people don’t give a damn.  Replication is a critical scientific activity (Kline, 2004, p. 247) and it is time that replication studies are valued.