Ulrich Schimmack has a paper in press at Psychological Methods that should be required reading for anyone producing or consuming research in soft psychology (Title: “The Ironic Effect of Significant Results on the Credibility of Multiple-Study Articles”). Sadly, I doubt this paper will get much attention in the popular press. Uli argues that issues of statistical power are critical for evaluating a package of studies, and his approach also fits very nicely with recent papers by Gregory Francis. I am excited because it seems as if applied researchers are beginning to have access to a set of relatively easy-to-use tools for evaluating published papers.
(I would add that Uli’s discussion of power fits perfectly well with broader concerns about the importance of study informativeness as emphasized by Geoff Cumming in his recent monograph.)
Uli makes a number of recommendations that have the potential to change the ratio of fiction to non-fiction in our journals. His first recommendation is to use power to explicitly evaluate manuscripts. I think this is a compelling recommendation. He suggests that authors need to justify the sample sizes in their manuscripts. Too many times I read papers and have no clue why the authors used such small sample sizes. Such concerns do not lend themselves to positive impressions of the work.
Playing around with power calculations or power programs leads to sobering conclusions. If you expect a d-metric effect size of .60 for a simple two independent-groups study, you need 45 participants in each group (N=90) to have 80% power. The sample requirements only go up if the d is smaller (e.g., 200 total if d = .40 and 788 total if d = .20) or if you want better than 80% power. Given the expected value of most effect sizes in soft psychology, it seems to me that sample sizes are going to have to increase if the literature is going to get more believable. Somewhere, Jacob Cohen is smiling. If you hate NHST and want to think in terms of informativeness, that is fine as well. Bigger samples yield tighter confidence intervals. Who can argue with calls for more precision?
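If you want to play with these numbers yourself, here is a minimal sketch of the standard sample-size calculation, using only the Python standard library. It uses the normal approximation, n per group = 2((z₁₋α/₂ + z₁₋β)/d)², which comes out slightly below the exact t-test figures quoted above (44 vs. 45 per group at d = .60); dedicated power software applies the t correction.

```python
from statistics import NormalDist
from math import ceil

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided test
    comparing two independent means at effect size d.

    Normal approximation: slightly underestimates the exact
    t-test requirement, especially at small n."""
    z = NormalDist().inv_cdf
    n = 2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2
    return ceil(n)

# The effect sizes discussed in the text:
for d in (0.60, 0.40, 0.20):
    n = n_per_group(d)
    print(f"d = {d:.2f}: {n} per group, N = {2 * n}")
# d = 0.60 -> 44 per group (N = 88)
# d = 0.40 -> 99 per group (N = 198)
# d = 0.20 -> 393 per group (N = 786)
```

The sobering pattern is easy to see: halving the expected effect size roughly quadruples the required sample.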
Uli discusses other strategies for improving research practices such as the value of publishing null results and the importance of rewarding the total effort that goes into a paper rather than the number of statistically significant p-values. It is also worth rewarding individuals and teams who are developing techniques to evaluate the credibility of the literature, actively replicating results, and making sure published findings are legitimate. Some want to dismiss them as witch hunters. I prefer to call them scientists.