A Partial Defense of the Pete Rose Rule

I tweeted this yesterday: Let’s adopt a Pete Rose Rule for fakers = banned for life.  Nothing questionable about fraud.  Jobs and funds are too scarce for 2nd chances.

My initial thought was that people who have been shown by a preponderance of the evidence to have passed off faked datasets as legitimate should be banned from receiving grants and publishing papers for life.  [Pete Rose was a player and manager in professional baseball who bet on games while he was a manager. This made him permanently ineligible to participate in the activities of professional baseball.]

Nick Brown didn’t like this suggestion and provided a thoughtful response on his blog.  Rather than hijack his comments with a lengthy rejoinder, this post is my attempt to defend the initial proposal. You can get banned for life from the Olympics for doping, so I don’t think it is beyond the pale to make the same suggestion for science.  As always, I reserve the right to change my mind in the future!

At the outset, I agree with his suggestion that it is not 100% feasible given that there is no overall international governing body for scientific research like there is for professional sports or the Olympics. However, the research world is often surprisingly small and I think it would be possible to impose an informal ban that would stick. And I think it could be warranted because faking data is exceptionally damaging to science. I also think it is rare so perhaps it is not worth thinking about too much.

Fakers impose huge costs on the system.  First, they make journals and scientists look bad in the eyes of the public. This is unfortunate because the “public” ultimately pays for a considerable amount of scientific research.  Faked data undermine public confidence in scientists and this often bleeds over into discussions about unrelated issues such as climate change or whether vaccines cause autism.  Likewise, as Dr. Barr pointed out in a comment on Nick’s blog, there is a case to be made for taking legal action for fraud in some cases.

Second, it takes resources to investigate the fakers. At the risk of speaking in broad generalities, I suspect that huge amounts of time are invested when it comes to the investigation of faked data. It takes effort to evaluate the initial charge and then determine what was and was not faked for people with long CVs. Efforts also need to be expended to determine whether co-authors were innocent or co-conspirators.  This is time and money NOT spent on new research, teaching students, reviewing papers, etc.

Third, fakers impose costs on their peers.  Academics is a competitive enterprise.  We are judged by the quantity and quality of our work.  I suspect it is much easier to pump out papers based on fake data than real data.  This matters because there are limited numbers of positions and grant dollars.  A grad student faker who gets a paper in, say, Science will have a huge advantage on the job market.  There are far more qualified people than university positions.  Universities that have merit-based systems end up paying superstars more than mere mortals.  A superstar who faked her/his/their way to an impressive CV could easily have a higher salary than an honest peer who can’t compete with faked data.  Likewise, fakers cause their peers to waste limited resources when researchers attempt to extend (or just replicate) interesting results.

To my mind, faking data is the worst crime in science because it undermines the integrity of the system.  Thus, I believe that it warrants a serious punishment once it is established after a thorough judicial process or a confession.  You might think a lifetime ban is too severe but I am not so sure.

Moreover, let’s say the field decides to let a faker back in the “game” after some kind of rehabilitation.  Is this wise? I worry that it would impose additional and never-ending costs on the system.  The rehabilitated faker is going to continue to drain the system until retirement. For example, it would cost resources to double-check everything she or he does in the future.  How am I supposed to treat a journal submission from a known faker? It would require extra effort, additional reviews, and a lot of teeth gnashing. I would think a paper from a faker would need to be independently replicated before it was taken seriously (I think this is true of all papers, but that is a topic for another day).  Why should a known faker get grants when so many good proposals are not funded because of a lack of resources? Would you trust a rehabilitated faker to train grad students in your program?

So my solution is to kick the “convicted” faker out of the game forever.  There are lots of talented and bright people who can’t get into the game as it stands.  There are not enough resources to go around for deserving scientists who don’t cheat.  I know that I would personally never vote to hire a faker in my department.

But I am open-minded and I know it sounds harsh. I want to thank Nick for forcing me to think more about this. Comments are welcome!

Replication Project in Personality Psychology – Call for Submissions

Richard Lucas and I are editing a special issue of the Journal of Research in Personality dedicated to replication (Click here for complete details). This blog post describes the general process and a few of my random thoughts on the special issue. These are my thoughts and Rich may or may not share my views.  I also want to acknowledge that there are multiple ways of doing replication special issues and we have no illusions that our approach is ideal or uncontroversial.  These kinds of efforts are part of an evolving “conversation” in the field about replication efforts and experimentation should be tolerated.  I also want to make it clear that JRP has been open to replication studies for several years.  The point of the special issue is to actively encourage replication studies and try something new with a variant of pre-registration.

What is the General Process?

We modeled the call for papers on procedures others have used with replication special issues and registered reports (e.g., the special issue of Social Psychology, the Registered Replication Reports at PoPS).  Here is the gist:

  • Authors will submit proposals for replication studies by 1 July 2015. These extended abstracts will be screened for methodological rigor and the importance of the topic.
  • Authors of selected proposals will then be notified by 15 August 2015.
  • There is a deadline of 15 March 2016 to submit the finished manuscript.

We are looking to identify a set of well-designed replication studies that provide valuable information about findings in personality psychology (broadly construed). We hope to include a healthy mix of pre-registered direct replications involving new data collections (either by independent groups or adversarial collaborations) and replications using existing datasets for projects that are not amenable to new data collection (e.g., long-term longitudinal studies).  The specific outcome of the replication attempt will not be a factor in selection.  Indeed, we do not want proposals to describe the actual results!

Completed manuscripts will be subjected to peer review, but the relevant issues will be adherence to the proposed research plan, the quality of the data analysis, and the reasonableness of the interpretations.  For example, proposing to use a sample size of 800 but submitting a final manuscript with 80 participants will be solid grounds for outright rejection.  Finding a null result after a good faith attempt that was clearly outlined before data collection will not be grounds for rejection.  Likewise, learning that a previously used measure had subpar psychometric properties in a new and larger sample is valuable information even if it might explain a failure to find predicted effects.  At the very least, such information about how measures perform in new samples provides important technical insights.

Why Do This?

Umm, replication is an important part of science?!?! But beyond that truism, I am excited to learn what happens when we try to organize a modest effort to replicate specific findings in personality psychology. Personality psychologists use a diverse set of methods beyond experiments such as diary and panel studies.  This creates special challenges and opportunities when it comes to replication efforts.  Thus, I see this special issue as a potential chance to learn how replication efforts can be adapted to the diverse kinds of studies conducted by personality researchers.

For example, multiple research groups might have broadly similar datasets that target similar constructs but with specific differences when it comes to the measures, timing of assessments, underlying populations, sample sizes, etc. This requires careful attention to methodological similarities and differences when it comes to interpreting whether particular findings converge across the different datasets.  It would be ideal if researchers paid some attention to these issues before the results of the investigations were known.  Otherwise, there might be a tendency to accentuate differences when results fail to converge. This is one of the reasons why we will entertain proposals that describe replication attempts using existing datasets.

I also think it is important to address a perception that Michael Inzlicht described in a recent blog post.  He suggested that some social psychologists believe that some personality psychologists are using current controversies in the field as a way to get payback for the person-situation debate.  In light of this perception, I think it is important for more personality researchers to engage in formal replication efforts of the sort that have been prominent in social psychology.  This can help counter perceptions that personality researchers are primarily interested in schadenfreude and criticizing our sibling discipline. Hopefully, the cold war is over.

[As an aside, I think the current handwringing about replication and scientific integrity transcends social and personality psychology.  Moreover, the fates of personality and social psychology are intertwined given the way many departments and journals are structured.  Social and personality psychology (to the extent that there is a difference) each benefit when the other field is vibrant, replicable, and methodologically rigorous.  Few outside of our world make big distinctions between social and personality researchers so we all stand to lose if decision makers like funders and university administrators decide to discount the field over concerns about scientific rigor.]

What Kinds of Replication Studies Are Ideal?

In a nutshell: High quality replications of interesting and important studies in personality psychology.  To offer a potentially self-serving example, the recent replication of the association between I-words and narcissism is a good example.  The original study was relatively well-cited but it was not particularly strong in terms of sample size.  There were few convincing replications in the literature and it was often accepted as an article of faith that the finding was robust.  Thus, there was value in gaining more knowledge about the underlying effect size(s) and testing to see whether the basic finding was actually robust.  Studies like that one as well as more modest contributions are welcome.  Personally, I would like more information about how well interactions between personality attributes and experimental manipulations tend to replicate, especially when the original studies are seemingly underpowered.

What Don’t You Want to See?

I don’t want to single out too many specific topics or limit submissions but I can think of a few topics that are probably not going to be well received.  For instance, I am not sure we need to publish tons of replications showing there are 3 to 6 basic trait domains using data from college students.  Likewise, I am not sure we need more evidence that skilled factor analysts can find indications of a GFP (or general component) in a personality inventory.  Replications of well-worn and intensely studied topics are not good candidates for this special issue. The point is to get more data on interesting and understudied topics in personality psychology.

Final Thought

I hope we get a number of good submissions and the field learns something new in terms of specific findings. I also hope we gain insights about the advantages and disadvantages of different approaches to replication in personality psychology.

My View on the Connection between Theory and Direct Replication

I loved Simine’s blog post on flukiness and I don’t want to hijack the comments section of her blog with my own diatribe. So here it goes…

I want to comment on the suggestion that researchers should propose an alternative theory to conduct a useful or meaningful close/exact/direct replication. In practice, I think most replicators draw on the same theory that original authors used for the original study.  Moreover, I worry that people making this argument (or even more extreme variants) sometimes get pretty darn close to equating a theory with a sort of religion.  As in, you have to truly believe (deep in your heart) the theory or else the attempt is not valid.  The point of a direct replication is to make sure the results of a particular method are robust and obtainable by independent researchers.

My take:

Original authors used Theory P to derive Prediction Q (If P then Q). This is the deep structure of the Introduction of their paper.  They then report evidence consistent with Q using a particular Method (M) in the Results section.

A replicator might find the theoretical reasoning more or less plausible but mostly just think it is a good idea to evaluate whether repeating M yields the same result (especially if the original study was underpowered).* The point of the replication is to redo M (and ideally improve on it using a larger N to generate more precise parameter estimates) to test Prediction Q.  Some people think this is a waste of time.  I do not.

I don’t see how the replicators’ private stance on Theory P or some other Theory X is relevant to this activity. However, I am totally into scenarios that approximate the notion of a critical test whereby we have two (or more) theories that make competing predictions about what should be observed.  I wish there were more cases like that to talk about.

* Yes, I know about the hair-splitting diatribes people go through to argue that you literally cannot duplicate the exact same M to test the same prediction Q in a replication study (i.e., the “replication is literally impossible” argument). I find that argument simply unsatisfying. I worry that this kind of argument slides into some postmodernist view of the world in which there is no point in doing empirical research (as I understand it).

How Do You Feel When Something Fails To Replicate?

Short Answer: I don’t know, I don’t care.

There is an ongoing discussion about the health of psychological science and the relative merits of different research practices that could improve research. This productive discussion occasionally spawns a parallel conversation about the “psychology of the replicators” or an extended meditation about their motives, emotions, and intentions. Unfortunately, I think that parallel conversation is largely counter-productive. Why? We have limited insight into what goes on inside the minds of others. More importantly, feelings have no bearing on the validity of any result. I am a big fan of this line from Kimble (1994, p. 257): “How you feel about a finding has no bearing on its truth.”

A few people seem to think that replicators are predisposed to feeling ebullient (gleeful?) when they encounter failures to replicate. This is not my reaction. My initial response is fairly geeky.  My impulse is to calculate the effect size estimate and precision of the new study to compare to the old study. I do not get too invested when a small N replication fails to duplicate a large N original study. I am more interested when a large N replication fails to duplicate a small N original study.

I then look to see if the design was difficult to implement or fairly straightforward to provide a context for interpreting the new evidence. This helps to anticipate the reactions of people who will argue that replicators lacked the skill and expertise to conduct the study or that their motivations influenced the outcome.  The often vague “lack of expertise” and “ill-intentioned” arguments are more persuasive when critics offer a plausible account of how these factors might have biased a particular replication effort.  This would be akin to offering an alternative theory of the crime in legal proceedings. In many cases, it seems unlikely that these factors are especially relevant. For example, a few people claimed that we lacked the expertise to conduct survey studies of showering and loneliness but these critics failed to offer a well-defined explanation for our particular results besides just some low-level mud-slinging. A failure to detect an effect is not prima facie evidence of a lack of expertise.

After this largely intellectual exercise is concluded, I might experience a change in mood or some sort of emotional reaction. More often this amounts to feelings of disappointment about the quality of the initial study and some anxiety about the state of affairs in the field (especially if the original study was of the small N, big effect size variety). A larger N study holds more weight than a smaller N study.  Thus, my degree of worry scales with the sample size of the replication.  Of course, single studies are just data points that should end up as grist for the meta-analytic mill.  So there might be some anticipation over the outcome of future studies to learn what happens in yet another replication attempt.

Other people might have different modal emotional reactions. But does it matter?  And does it have anything at all to do with the underlying science or the interpretation of the replication?  My answers are No, No, and No. I think the important issues are the relevant facts – the respective sample sizes, effect size estimates, and procedures.

(Hopefully) The Last Thing We Write About Warm Water and Loneliness

Our rejoinder to the Bargh and Shalev response to our replication studies has been accepted for publication after peer-review. The Bargh and Shalev response is available here. A pdf of our rejoinder is available here.  Here are the highlights of our piece:

  1. An inspection of the size of the correlations from their three new studies suggests their new effect size estimates are closer to our estimates than to those reported in their 2012 paper. The new studies all used larger sample sizes than the original studies.
  2. We have some concerns about the validity of the Physical Warmth Extraction Index and we believe the temperature item is the most direct test of their hypotheses. If you combine all available data and apply a random-effects meta-analytic model, the overall correlation is .017 (95% CI = -.02 to .06 based on 18 studies involving 5,285 participants).
  3. We still have no idea why 90% of the participants in their Study 1a responded that they took less than 1 shower/bath per week. No other study using a sample from the United States even comes close to this distribution. Given this anomaly, we think results from Study 1a should be viewed with extreme caution.
  4. Acquiring additional data from outside labs is probably the most constructive step forward. Additional cross-cultural data would also be valuable.
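For readers curious about the mechanics behind point 2, a random-effects pooling of correlations is typically done on Fisher z-transformed values with a DerSimonian-Laird estimate of between-study variance. Here is a minimal sketch of that standard procedure; the correlations and sample sizes are hypothetical illustrations, not the actual 18 studies from our rejoinder:

```python
import numpy as np

def random_effects_meta(rs, ns):
    """DerSimonian-Laird random-effects pooling of correlations via Fisher's z."""
    rs, ns = np.asarray(rs, float), np.asarray(ns, float)
    z = np.arctanh(rs)                 # Fisher's z transform of each correlation
    v = 1.0 / (ns - 3.0)               # approximate within-study variance of z
    w = 1.0 / v                        # fixed-effect weights
    z_fixed = np.sum(w * z) / np.sum(w)
    q = np.sum(w * (z - z_fixed) ** 2)               # Q heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(rs) - 1)) / c)         # between-study variance
    w_star = 1.0 / (v + tau2)                        # random-effects weights
    z_pooled = np.sum(w_star * z) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    ci = (np.tanh(z_pooled - 1.96 * se), np.tanh(z_pooled + 1.96 * se))
    return np.tanh(z_pooled), ci       # back-transform to the r metric

# Hypothetical per-study correlations and sample sizes:
r, ci = random_effects_meta([0.05, -0.02, 0.03, 0.00], [300, 250, 400, 350])
```

With small correlations like these, the z transform barely moves the values, and the pooled estimate lands near the weighted average of the inputs, bracketed by the 95% CI.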

This has been an interesting adventure and we have learned a lot about self-reported bathing/showering habits. What more could you ask for?


Is Obama a Narcissist?

Warning: For educational purposes only. I am a personality researcher not a political scientist!

Short Answer: Probably Not.

Longer Answer: There has been a fair bit of discussion about narcissism and the current president (see here for example). Some of this stemmed from recent claims about his use of first person pronouns (i.e., a purported use of greater “I-talk”). A big problem with that line of reasoning is that the empirical evidence linking narcissism with I-talk is surprisingly shaky.  Thus, Obama’s use of pronouns is probably not very useful when it comes to making inferences about his levels of narcissism.

Perhaps a better way to gauge Obama’s level of narcissism is to see how well his personality profile matches a profile typical of someone with Narcissistic Personality Disorder (NPD).  The good news is that we have such a personality profile for NPD thanks to Lynam and Widiger (2001).  Those researchers asked 12 experts to describe the prototype case of NPD in terms of the facets of the Five-Factor Model (FFM). In general, they found that someone with NPD could be characterized as having the following characteristics…

High Levels: Assertiveness, Excitement Seeking, Hostility, and Openness to Actions (i.e., a willingness to try new things)

Low Levels: Agreeableness (all aspects), Self-Consciousness, Warmth, Openness to Feelings (i.e., a lack of awareness of one’s emotional state and some elements of empathy)

The trickier issue is finding good data on Obama’s actual personality. My former students Edward Witt and Robert Ackerman did some research on this topic that can be used as a starting point.  They had 86 college students (51 liberals and 35 conservatives) rate Obama’s personality using the same dimensions Lynam and Widiger used to generate the NPD profile.  We can use the ratings of Obama averaged across the 86 different students as an informant report of his personality.

Note: I know this approach is far from perfect and it would be ideal to have non-partisan expert raters of Obama’s personality (specifically the 30 facets of the FFM). If you have such a dataset, send it my way (self-reported data from the POTUS would be welcome too)! Moreover, Witt and Ackerman found that liberals and conservatives had some differences when it came to rating Obama’s personality.  For example, conservatives saw him higher in hostility and lower in warmth than liberals.  Thus, the profile I am using might tend to have a rosier view of Obama’s personality than a profile generated from another sample with more conservatives (send me such a dataset if you have it!). An extremely liberal sample might generate an even more positive profile than what they obtained.

With those caveats out of the way, the next step is simple: Calculate the Intraclass Correlation Coefficient (ICC) between his informant-rated profile and the profile of the prototypic person with NPD. The answer is basically zero (ICC = -.08; Pearson’s r = .06).  In short, I don’t think Obama fits the bill of the prototypical narcissist. More data are always welcome but I would be somewhat surprised if Obama’s profile matched well with the profile of a quintessential narcissist in another dataset.
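For anyone who wants to try this at home (or in class, per Postscript 1 below), the profile-matching step is just a correlation between two vectors of facet scores. One common choice for profile similarity is the double-entry ICC, which penalizes differences in both shape and elevation, alongside the ordinary Pearson r. Here is a hedged sketch of that approach; the exact ICC variant and the facet values below are illustrative assumptions, not the study's actual data:

```python
import numpy as np

def profile_similarity(target, prototype):
    """Pearson r plus double-entry ICC between two trait profiles.

    The double-entry ICC correlates the concatenation [x, y] with [y, x],
    making it sensitive to mean-level differences as well as shape.
    """
    x, y = np.asarray(target, float), np.asarray(prototype, float)
    r = np.corrcoef(x, y)[0, 1]
    icc_de = np.corrcoef(np.concatenate([x, y]),
                         np.concatenate([y, x]))[0, 1]
    return r, icc_de

# Hypothetical facet ratings on a 1-5 scale (not the actual study profiles):
target_profile = [3.2, 2.8, 4.1, 3.0, 2.5, 3.8]
npd_prototype = [4.5, 4.0, 1.5, 2.0, 4.2, 1.8]
r, icc = profile_similarity(target_profile, npd_prototype)
```

In practice you would feed in all 30 FFM facet scores; a similarity near zero, as with Obama's profile above, means the rated profile bears essentially no resemblance to the NPD prototype.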

As an aside, Ashley Watts and colleagues evaluated levels of narcissism in the first 43 presidents and they used historical experts to rate presidential personalities. Their paper is extremely interesting and well worth reading. They found these five presidents had personalities with the highest relative approximation to the prototype of NPD: LBJ, Nixon, Jackson, Johnson, and Arthur.  The five lowest presidents were Lincoln, Fillmore, Grant, McKinley, and Monroe. (See Table 4 in their report).

Using data from the Watts et al. paper, I computed standardized scores for the estimates of Obama’s grandiose and vulnerable narcissism levels from the Witt and Ackerman profile. These scores indicated Obama was below average by over .50 SDs for both dimensions (Grandiose: -.70; Vulnerable: -.63).   The big caveat here is that the personality ratings for Obama were provided by undergrads and the Watts et al. data were from experts.  Again, however, there were no indications that Obama is especially narcissistic compared to the other presidents.

Thanks to Robert Ackerman, Matthias Mehl, Rich Slatcher, Ashley Watts, and Edward Witt for insights that helped with this post.

Postscript 1:  This is a light-hearted post.  However, the procedures I used could make for a fun classroom project for Personality Psychology 101.  Have the students rate a focal individual such as Obama or a character from TV, movies, etc. and then compare the consensus profile to the PD profiles. I have all of the materials to do this if you want them.  The variance in the ratings across students is also potentially interesting.

Postscript 2: Using this same general procedure, Edward Witt, Christopher Hopwood, and I concluded that Anakin Skywalker did not strongly match the profile of someone with BPD and neither did Darth Vader (counter to these speculations).  They were more like successful psychopaths.  But that is a blog post for another day!

Silly Questions to Ask Children

I have been working on a project designed to measure a certain individual difference in children as early as 5 years of age. There are a number of concerns about the use of self-reports with young children, so this has been an overarching issue in this project. To partially address it, we came up with a handful of items that would be useful for detecting unusual responses in children. These items might be used to identify children who did not understand how to use the response scale or flag children who were giving responses that would be considered invalid.  There is a cottage industry of these kinds of scales for adult personality inventories but fewer options for kids.  (And yes, I know about those controversies in the literature over these kinds of scales.)

Truth be told, I like writing items and I think this is true for many researchers. I am curious about how people respond to all sorts of questions especially silly ones.  It is even better if the silly ones tap something interesting about personality or ask participants about dinosaurs.

Here are a few sample items:

1. How do you feel about getting shots from the doctor?

2. How do you feel about getting presents for your birthday?

And my favorite item ever….

3. How would you feel about being eaten by a T-Rex?

The fact that we have asked over 800 kids this last question is sort of ridiculous but it makes me happy. I predicted that kids should report negative responses for this one. This was true for the most part but 11.3% of the sample registered a positive response. In fact, the T-Rex item sparked a heated conversation in my household this morning. My spouse (AD) is a former school teacher and AD thought some kids might think it was cool to see a T-Rex. She thought it was a bad item. My youngest child (SD) thought it would be bad to be eaten by said T-Rex even if it was cool to see one in person. I think SD was on my side.

I have had enough controversy over the past few weeks so I wanted to move on from this breakfast conversation. Thus, I did what any sensible academic would do – I equivocated. I acknowledged that items usually reflect multiple sources of variance and all have some degree of error. I also conceded that this item might pick up on sensation seeking tendencies. There could be some kids who might find it thrilling to be eaten by a T-Rex. Then I took SD to school and cried over a large cup of coffee.

But I still like this item and I think most people would think it would suck to be eaten by a T-Rex. It might also be fun to crowd source the writing of additional items. Feel free to make suggestions.

PS: I want to acknowledge my two collaborators on this project – Michelle Harris and Kali Trzesniewski. They did all of the hard work collecting these data.