Things that make me skeptical…

Simine Vazire crafted a thought provoking blog post about how some in the field respond to counter-intuitive findings.  One common reaction among critics of this kind of research is to claim that the results are unbelievable.   This reaction seems to fit with the maxim that extraordinary claims should require extraordinary evidence (AKA the Sagan doctrine).  For example, the standard of evidence needed to support the claim that a high-calorie/low nutrient diet coupled with a sedentary life style is negatively associated with morbidity might be different than the standard of proof needed to support the claim that attending class is positively associated with exam performance.  One claim seems far more extraordinary than the other.  Put another way: Prior subjective beliefs about the truthiness of these claims might differ and thus the research evidence needed to modify these pre-existing beliefs should be different.

I like the Sagan doctrine but I think we can all appreciate the difficulties that arise when trying to determine standards of evidence needed to justify a particular research claim.  There are no easy answers except for the tried and true response that all scientific claims should be thoroughly evaluated by multiple teams using strong methods and multiple operational definitions of the underlying constructs.  But this is a “long term” perspective and provides little guidance when trying to interpret any single study or package of studies.  Except that it does, sort of.  A long term perspective means that most findings should be viewed with a big grain of salt, at least initially.  Skepticism is a virtue (and I think this is one of the overarching themes of Simine’s blog posts thus far).   However, skepticism does not preclude publication and even some initial excitement about an idea.  It simply precludes making bold and definitive statements based on initial results with unknown generality.  More research is needed because of the inherent uncertainty of scientific claims. To quote a lesser known U2 lyric – “Uncertainty can be a guiding light”.

Anyways, I will admit to having the “unbelievable” reaction to a number of research studies.  However, my reaction usually springs from a different set of concerns rather than just a suspicion that a particular claim is counter to my own intuitions.  I am fairly skeptical of my own intuitions. I am also fairly skeptical of the intuitions of others.  And I still find lots of studies literally unbelievable.

Here is a partial list of the reasons for my skepticism. (Note: These points cover well worn ground so feel free to ignore if it sounds like I am beating a dead horse!)

1.  Large effect sizes coupled with small sample sizes.  Believe it or not, there is guidance in the literature to help generate an expected value for research findings in “soft” psychology.  A reasonable number of effects are between .20 and .30 in the r metric and relatively few are above .50 (see Hemphill, 2003; Richard et al., 2003).   Accordingly, when I read studies that generate “largish” effect size estimates (i.e., r ≥ |.40|), I tend to be skeptical.  I think an effect size estimate of .50 is in fact an extraordinary claim.

My skepticism gets compounded when the sample sizes are small and thus the confidence intervals are wide.  This means that the published findings are consistent with a wide range of plausible effect sizes so that any inference about the underlying effect size is not terribly constrained.  The point estimates are not precise. Authors might be excited about the .50 correlation but the 95% CI suggests that the data are actually consistent with anything from a tiny effect to a massive effect.  Frankly, I also hate it when the lower bound of the CI falls just slightly above 0 and thus the p value is just slightly below .05.  It makes me suspect p-hacking was involved.   (Sorry, I said it!)

2. Conceptual replications but no direct replications.  The multi-study package common to such prestigious outlets like PS or JPSP has drawn critical attention in the last 3 or so years.  Although these packages seem persuasive on the surface, they often show hints of publication bias on closer inspection.   The worry is that the original researchers actually conducted a number of related studies and only those that worked were published.   Thus, the published package reflects a biased sampling of the entire body of studies.  The ones that failed to support the general idea were left to languish in the proverbial file drawer.  This generates inflated effect size estimates and makes the case for an effect seem far more compelling than it should be in light of all of the evidence.  Given these issues, I tend to want to see a package of studies that reports both direct and conceptual replications.  If I see only conceptual replications, I get skeptical.  This is compounded when each study itself has a modest sample size with a relatively large effect size estimate that produces a 95% CI that gets quite close to 0 (see Point #1).

3. Breathless press releases.  Members of some of my least favorite crews in psychology seem to create press releases for every paper they publish.  (Of course, my perceptions could be biased!).  At any rate, press releases are designed by the university PR office to get media attention.  The PR office is filled with smart people trained to draw positive attention to the university using the popular media.  I do not have a problem with this objective per se.  However, I do not think this should be the primary mission of the social scientist.  Sometimes good science is only interesting to the scientific community.  I get skeptical when the press release makes the paper seem like it was the most groundbreaking research in all of psychology.  I also get skeptical when the press release draws strong real world implications from fairly constrained lab studies.  It makes me think the researchers overlooked the thorny issues with generalized causal inference.

I worry about saying this but I will put it out there – I suspect that some press releases were envisioned before the research was even conducted.  This is probably an unfair reaction to many press releases but at least I am being honest.  So I get skeptical when there is a big disconnect between the press release and the underlying research like when sweeping claims are made on a study of say 37 kids.  Or big claims about money and happiness are drawn from priming studies involving pictures of money.

I would be interested to hear what makes others skeptical of published claims.


A little background tangential to the main points of this post:

One way to generate press excitement is to quote the researcher(s) as being shocked by the results.  Unfortunately, I often think some of shock and awe expressed in these press releases is disingenuous.  Why?  Researchers designed the studies to test specific predictions in the first place.  So they had some expectations as to what they would find.  Alternatively, if someone did obtain a shocking initial result, they should conduct multiple direct replications to make sure the original result was not simply a false positive.  This kind of narrative is not usually part of the press release.

I also hate to read press releases that generalize the underlying results well beyond the initial design and purpose of the research.  Sometimes the real world implications of experiments are just not clear.  In fact, not all research is designed to have real world implications.  If we take the classic Mook reading at face value, lots of experimental research in psychology has no clear real world implications.   This is perfectly OK but it might make the findings less interesting to the general public.  Or at least it probably requires more background knowledge to make the implications interesting.  Such background is beyond the scope of the press release.



Author: mbdonnellan

Professor Social and Personality Psychology Texas A &M University

5 thoughts on “Things that make me skeptical…”

  1. Re: things that make me skeptical. This is related to both the conceptual replication issue and the multiple study papers that Brent describes, but since the late 90’s, my skepticism has increased in conjunction the the increase in papers where the intro to the studies seems to imply fairly straightforward designs (and sometimes, primarily main effects), but the umpteen studies reported are all fairly complicated, with one or more interactions, and there is little specific justification for the specific dependent variables chosen. Often, what would seem to be the most obvious tests of the effects that are actually predicted are strikingly absent.

    1. I completely agree. All of the time I see people predicting 3-way (or more) interactions as if the underlying theory is precise enough to make such a prediction and then detecting those interactions with severely underpowered studies. In the context of the multi-package study, those interactions are only demonstrated once, yet they are subsequently treated as if they are established. I would be willing to bet that pre-specified analyses are much (dare I say “significantly”) simpler than most of the published research. In the case of analyzing your data, I believe that simpler, but not too simple, is better.

      I think it is important that we recalibrate our expectations for what publications demonstrate. We should reward original research that looks more like the recent replication studies. Choose a well-defined effect that makes a modest contribution to the literature and study the hell out of it. Authors and readers should have confidence that if they followed the published methods that they would reproduce that effect.

  2. I feel skeptical when I see ambiguity in how constructs are operationalized as independent or dependent variables. For example, if a researcher purports to measure the effect of X on broad construct like self-affirmation, and then uses a DV that does not clearly seem to capture self-affirmation, or perhaps captures a blend of self-affirmation and another construct. The same could apply if an IV is operationalized such that it may capture a blend of constructs in addition to the construct purported to be the focus of the study. The most problematic implication of these types of examples is that many IVs or DVs were used in the study, and that only the one that worked was reported in the paper, with a neat story then constructed about why the IV or DV that worked represents the construct of interest. Even if ambiguity resulted from mere sloppiness, and not selective reporting, it can still create confusion when we interpret findings or apply them to the real world. If self-affirmation responds to a certain manipulation when it is operationalized with variable X, but not with variable Y, an apparent failure to replicate an effect involving “self-affirmation” is merely due to a difference in operationalization of that construct.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s