One for the File Drawer?

I once read about an experiment in which college kids held either a cold pack or a warm pack and then reported their levels of so-called trait loneliness. We just tried a close replication of this experiment involving the same short-form loneliness scale used by the original authors. I won’t out my collaborators, but I want to acknowledge their help.

The original effect size estimate was pretty substantial (d = .61, t = 2.12, df = 49), but we used 261 students so we could have more than adequate power. Our attempt yielded a much smaller effect size than the original (d = -.01, t = 0.111, df = 259, p = .912). The mean of the cold group (2.10) was darn near the same as the warm group (2.11; pooled SD = .61). (We also got null results if we restricted the analyses to just those who reported that they believed the entire cover story: d = -.17. The direction is counter to predictions, however.)
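For readers who want to check the arithmetic: Cohen's d for an independent-samples comparison can be recovered from the t statistic and the cell sizes as t * sqrt(1/n1 + 1/n2). A minimal sketch (the per-cell ns below are guesses based on the reported dfs, since exact cell sizes are not given here):

```python
import math

def d_from_t(t, n1, n2):
    """Cohen's d recovered from an independent-samples t statistic:
    d = t * sqrt(1/n1 + 1/n2)."""
    return t * math.sqrt(1 / n1 + 1 / n2)

# Original study: t = 2.12 with df = 49; assuming roughly equal cells
# (25 and 26 -- an assumption, not reported), the d = .61 is about right:
print(round(d_from_t(2.12, 25, 26), 2))   # ~0.59

# Our replication: t = 0.111 with df = 259 (~130/131 per cell, assumed):
print(round(d_from_t(0.111, 130, 131), 2))  # ~0.01
```

The tiny replication d follows directly from the near-zero t and the large sample.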

Failures to replicate are a natural part of science, so I am not going to make any bold claims in this post. I do want to point out that the reporting in the original is flawed. (The original authors used a no-pack control condition and found no evidence of a difference between the warm pack and the no-pack condition, so we focused on just the warm versus cold comparison for our replication study.) The sample size was reported as 75 participants. The F value for the one-way ANOVA was reported as 3.80 and the degrees of freedom were reported as 2, 74. The numerator degrees of freedom for the reference F distribution should be k - 1 (where k is the number of conditions), so the 2 was correct. However, the denominator was reported as 74 when it should be N - k, or 72 (75 - 3). Things get even weirder when you try to figure out the sample sizes for the 3 groups based on the degrees of freedom reported for each of the three follow-up t-tests.
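The degrees-of-freedom check above is simple enough to write down in two lines:

```python
def anova_dfs(n_total, k_groups):
    """One-way ANOVA degrees of freedom:
    numerator = k - 1, denominator = N - k."""
    return k_groups - 1, n_total - k_groups

# With the reported N = 75 and k = 3 conditions:
print(anova_dfs(75, 3))  # (2, 72) -- not the (2, 74) that was reported
```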

We found indications that holding a cold pack did do something to participants.  Both the original study and our replication involved a cover story about product evaluation. Participants answered three yes/no questions and these responses varied by condition.

Percentage answering “Yes” to the Pleasant Question:

Warm: 96%     Cold: 80%

Percentage answering “Yes” to the Effective Question:

Warm: 98%     Cold: 88%

Percentage answering “Yes” to the Recommending to a Friend Question:

Warm: 95%   Cold: 85%
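To get a rough sense of whether gaps like these are bigger than chance, a standard two-proportion z test will do. A sketch, assuming roughly 130 participants per condition (a guess on my part; only percentages are reported above):

```python
import math

def two_prop_z(p1, p2, n1, n2):
    """Two-proportion z statistic using the pooled proportion
    for the standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Pleasantness question: 96% warm vs 80% cold, ~130 per cell (assumed)
print(round(two_prop_z(0.96, 0.80, 130, 130), 2))  # ~3.97
```

Under that cell-size assumption, the pleasantness gap is well past conventional significance thresholds.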

Apparently, the cold packs were not evaluated as positively as the warm packs. I can foresee all sorts of criticism coming our way. I bet one thread is that we are not “skilled” enough to get the effect to work, and a second thread is that we are biased against the original authors (either explicitly or implicitly). I’ll just note these as potential limitations and call it good. Fair enough?

Update 7 February 2014: We decided to write this up for a journal article. In the process of preparing the manuscript and files for posting, Jessica noticed that I had not dropped a participant with an ID we use for testing the survey system. Thus, the actual sample size should be 260, NOT 261. Fortunately, this did not change any of the conclusions. The t statistic was -0.006 (df = 258), p = .995, and the effect size was d = -.01. We also conducted a number of supplementary analyses to see if removing participants who expressed suspicion or had questionable values on the manipulation check variable (rating the temperature of the cold pack) impacted results. Nothing we could do influenced the bottom-line null result.

I caught my own mistake so I donated $20 to a charity I support – the American Cancer Society.