My View on the Connection between Theory and Direct Replication

I loved Simine’s blog post on flukiness and I don’t want to hijack the comments section of her blog with my own diatribe. So here it goes…

I want to comment on the suggestion that researchers should propose an alternative theory to conduct a useful or meaningful close/exact/direct replication. In practice, I think most replicators draw on the same theory that original authors used for the original study.  Moreover, I worry that people making this argument (or even more extreme variants) sometimes get pretty darn close to equating a theory with a sort of religion.  As in, you have to truly believe (deep in your heart) the theory or else the attempt is not valid.  The point of a direct replication is to make sure the results of a particular method are robust and obtainable by independent researchers.

My take:

Original authors used Theory P to derive Prediction Q (If P then Q). This is the deep structure of the Introduction of their paper.  They then report evidence consistent with Q using a particular Method (M) in the Results section.

A replicator might find the theoretical reasoning more or less plausible but mostly just think it is a good idea to evaluate whether repeating M yields the same result (especially if the original study was underpowered).* The point of the replication is to redo M (and ideally improve on it using a larger N to generate more precise parameter estimates) to test Prediction Q.  Some people think this is a waste of time.  I do not.
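The precision argument for larger-N replications can be sketched in a few lines of Python (my illustration, not from the post; the sample sizes are hypothetical). The standard error of a sample mean shrinks with the square root of N, so a tenfold-larger replication pins down the parameter about 3.2 times more tightly:

```python
# Illustrative sketch: why a larger-N direct replication yields more
# precise parameter estimates. Assumes a simple one-sample design
# with a known population standard deviation of 1.
import math

def standard_error(n, sd=1.0):
    """Standard error of the sample mean for n participants."""
    return sd / math.sqrt(n)

se_original = standard_error(20)      # a typical underpowered original study
se_replication = standard_error(200)  # a 10x larger direct replication

# Precision improves by a factor of sqrt(10), roughly 3.16.
print(round(se_original / se_replication, 2))  # → 3.16
```

Nothing deep here, but it is why "redo M with a larger N" is informative even when the replicators take no stance on Theory P.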

I don’t see the replicators’ stance toward Theory P (or some other Theory X) as relevant to this activity. However, I am totally into scenarios that approximate the notion of a critical test, whereby we have two (or more) theories that make competing predictions about what should be observed.  I wish there were more cases like that to talk about.

* Yes, I know about the hair-splitting diatribes people go through to argue that you literally cannot duplicate the exact same M to test the same prediction Q in a replication study (i.e., the “replication is literally impossible” argument). I find that argument simply unsatisfying. I worry that it slides into a postmodernist view of the world in which there is no point in doing empirical research (as I understand it).


Author: mbdonnellan

Professor, Social and Personality Psychology, Texas A&M University

11 thoughts on “My View on the Connection between Theory and Direct Replication”

  1. In general I totally agree, although I find some variants of this kind of thinking more acceptable than others. For example in RCTs comparing psychotherapies, it is ideal to have PIs who differ in which treatment they are betting on and therapists who believe the approach they are using is best.

  2. I agree CH. In that case, I think manipulating expectancies is highly illuminating and practically useful. I would also love conditions whereby some of the therapists are sort of like well-intentioned robots. Willing to follow the treatment manuals in good faith but without theoretical commitments of their own.

  3. Very brief reply only as I’m in the pub 😛

    I think the intention of “replicators” matters mainly for two reasons:

    1. It is very easy to fail to replicate because you are just doing the experiment poorly. Unlike p-hacking, which should affect original and replication studies more or less equally, doing an experiment poorly is unlikely to produce a convincing false positive, but it can easily produce a false negative. If your intention is to show that the original finding is a fluke, you are in danger of doing the study poorly. This doesn’t mean you will, but I certainly think it is the intention of many replicants to show a null result.

    2. The more important reason is that a direct replication is often just not very informative. For a single positive result, especially a weak one like in Simine’s example, your prior that it is a fluke should be strong anyway. So we are not learning anything new about the universe from this direct replication. We are none the wiser, except that after spending (wasting?) a lot of time on replicating this one finding we may think it is a fluke. But we already expected that anyway, and we still cannot be very sure of it.

    Now I actually think direct replications have value when it comes to revealing the hidden moderators people often talk about. A failed direct replication should be used to generate hypotheses about hidden factors and then you can go and test them. That would again be great experimental design (even if you find lots of nulls except for your sanity checks). Unfortunately almost no muggle is doing that!

    1. Hi Sam (Nice to meet you BTW):

      1. The point of a direct replication is to follow the M as closely as possible and thus to do the study no more poorly than the original authors (and ideally better, by collecting data from more participants). I would hope that most researchers strive to design experiments so that the resulting data bear no signature of their intentions (a line I think I am stealing from Kruschke). This is one reason why RAs are often blind to hypotheses and why they are the ones who usually have contact with participants. I personally never set out to generate biased data because I think doing so is highly unethical. It wastes the time of participants, RAs, and any stakeholders in the research process (see e.g., Rosenthal, 1994; Psychological Science).

      2. I disagree. In most cases, the original studies have not been replicated, so information is gained from a larger-N direct replication. And I do not understand how the field would ever separate signal from noise if people did not try to replicate findings. I do confess to having dust bowl empiricist tendencies, however. Thus, I think reasonable people can disagree here.

      In the end I am sympathetic to Simine’s main point. Sampling error is real and it can be spectacular. I do think there would be more progress and perhaps even “reconciliation” in the ongoing discussions about replication if people focused less on intentions and more on methods.

      1. There is no question whether we should replicate. Replication is central to science. Of course we should replicate. My main point is that we should replicate all the time, not just as a specific replication attempt.

        *Every* experiment you do should contain a replication unless it’s a brand new idea, which is really rare. This is how science works. It builds on previous research. Whether it is as a sanity check or as an attempt to replicate or as a baseline measure it is simply good experimental design. Basically it is the rat example Feynman uses in his famous lecture. To me current replication efforts are too heavy on Cargo Cult and too light on science.

  4. Re: “Yes, I know about the hair splitting diatribes people go through to argue that you literally cannot duplicate the exact same M to test the same prediction Q in a replication study” – We need to stop talking about “replication” or even “redo[ing] M” and start talking about “additional tests of the theory”. There is no reason why the first experiment is the only possible test of the theory, and certainly the theories are never stated as “in this lab, at this time, this will happen”. As far as anyone can tell, a “replication” is just as good a test of the theory as the original experiment — maybe better — even if it isn’t exactly a replication. The “cannot duplicate” argument is a red herring because it misses the point of replication completely. The point is not to “replicate” for the sake of replicating; it is to test the theory, which is precisely the same purpose as the first experiment. The fact that someone has tested it previously is irrelevant (*but see below).

    This puts the burden of proof squarely back on the claimants of the theory to explain why the second test is not just as good a test of the theory as the first, rather than on the “replicators” to explain why the two tests disagreed (if they do). Putting the burden of proof on the “replicators” implies that, for some reason, the *order* in which the experiments were done is relevant.

    (*If anything, the first experiment in a series should be doubted, since the fact that it attracted a replication attempt means its effect size was large enough to get attention, and it is therefore probably an over-estimate.)

    1. It is comments like this that prove to me that I agree with you more often than I don’t. My main problem with this whole discussion is that it is focused too much on effects, and not enough on theories.

  5. I’m not convinced that it’s appropriate to go from “Original authors used Theory P to derive Prediction Q” to full-on, modus ponens, “(If P then Q).” In psychology, especially, there is often so much uncertainty about whether we are actually at (or anywhere remotely close to) P, or indeed whether Q has actually occurred, that all we can talk about is a balance of probabilities, often with a great deal of crud surrounding the measurement of everything.

    However, there seems to be a human tendency to attribute an unwarranted amount of credibility to the first people to occupy a new space. I’ll give an example from outside science, but which I think illustrates what I mean. Some friends of ours got new neighbours in their apartment block. Three days after the new neighbours moved in, our friends got a visit from the police: “There’s been a complaint that you’re making too much noise all night”. Uh-huh… (these friends of ours go to bed at 10pm.) Two days later, another visit. Next day, another visit. WTF? Then the noise started, from the new neighbours, until 5am. Our friends called the police… “Oh yeah, trying to deflect attention from your own noisy behaviour, huh?”. It took months, and calls from three other neighbours, to finally convince the authorities where the problem really lay (and even then, I’m guessing that notes remained in my friends’ file). It turned out that the new neighbours had been kicked out of their own apartment for making too much noise so they had worked out this new strategy.

    In summary: The first published claim will get way more attention than it deserves, even if it’s true; all following authors with contrary views will be regarded with suspicion simply because they are not first. (In fact, I would argue that the findings of the earlier authors should be given *less* weight, since they had way more degrees of freedom.)
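The commenters’ claim that a first, attention-getting study probably overestimates the true effect (sometimes called the “winner’s curse”) can be illustrated with a small simulation. This is my sketch, not from the post; the true effect, sample size, and test are all hypothetical choices:

```python
# Hedged illustration of the winner's curse: when an underpowered
# study only gets attention if it reaches significance, the published
# estimate tends to exceed the true effect, because only lucky draws
# clear the significance threshold.
import math
import random

random.seed(1)

TRUE_EFFECT = 0.3   # assumed true standardized effect
N = 20              # per-study sample size (underpowered)
SIMS = 10_000       # number of simulated original studies

se = 1.0 / math.sqrt(N)   # standard error of the mean (sd = 1)
threshold = 1.96 * se     # two-sided z-test cutoff at alpha = .05

significant = []
for _ in range(SIMS):
    observed = random.gauss(TRUE_EFFECT, se)  # one study's estimate
    if observed > threshold:
        significant.append(observed)

# Average estimate among the "significant" studies: it is guaranteed
# to exceed the threshold (≈ 0.44), and hence the true effect of 0.3.
avg_published = sum(significant) / len(significant)
print(round(avg_published, 2))
```

Conditioning on significance filters out the unlucky draws, so the surviving estimates are inflated — which is exactly why a larger-N replication of the first study tends to come back smaller.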
