Is Obama a Narcissist?

Warning: For educational purposes only. I am a personality researcher, not a political scientist!

Short Answer: Probably Not.

Longer Answer: There has been a fair bit of discussion about narcissism and the current president (see here for example). Some of this stemmed from recent claims about his use of first-person pronouns (i.e., a purported use of greater “I-talk”). A big problem with that line of reasoning is that the empirical evidence linking narcissism with I-talk is surprisingly shaky. Thus, Obama’s use of pronouns is probably not very useful when it comes to making inferences about his level of narcissism.

Perhaps a better way to gauge Obama’s level of narcissism is to see how well his personality profile matches a profile typical of someone with Narcissistic Personality Disorder (NPD).  The good news is that we have such a personality profile for NPD thanks to Lynam and Widiger (2001).  Those researchers asked 12 experts to describe the prototype case of NPD in terms of the facets of the Five-Factor Model (FFM). In general, they found that someone with NPD could be characterized as having the following characteristics…

High Levels: Assertiveness, Excitement Seeking, Hostility, and Openness to Actions (i.e., a willingness to try new things)

Low Levels: Agreeableness (all aspects), Self-Consciousness, Warmth, Openness to Feelings (i.e., a lack of awareness of one’s emotional state and some elements of empathy)

The trickier issue is finding good data on Obama’s actual personality. My former students Edward Witt and Robert Ackerman did some research on this topic that can be used as a starting point.  They had 86 college students (51 liberals and 35 conservatives) rate Obama’s personality using the same dimensions Lynam and Widiger used to generate the NPD profile.  We can use the ratings of Obama averaged across the 86 different students as an informant report of his personality.

Note: I know this approach is far from perfect and it would be ideal to have non-partisan expert raters of Obama’s personality (specifically the 30 facets of the FFM). If you have such a dataset, send it my way (self-reported data from the POTUS would be welcome too)! Moreover, Witt and Ackerman found that liberals and conservatives had some differences when it came to rating Obama’s personality.  For example, conservatives saw him higher in hostility and lower in warmth than liberals.  Thus, the profile I am using might tend to have a rosier view of Obama’s personality than a profile generated from another sample with more conservatives (send me such a dataset if you have it!). An extremely liberal sample might generate an even more positive profile than what they obtained.

With those caveats out of the way, the next step is simple: Calculate the Intraclass Correlation Coefficient (ICC) between his informant-rated profile and the profile of the prototypic person with NPD. The answer is basically zero (ICC = -.08; Pearson’s r = .06).  In short, I don’t think Obama fits the bill of the prototypical narcissist. More data are always welcome but I would be somewhat surprised if Obama’s profile matched well with the profile of a quintessential narcissist in another dataset.
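For readers who want to try this profile-matching step themselves, here is a minimal sketch in Python. The Pearson r indexes shape similarity, while a double-entry ICC (one common way to compute profile agreement) also penalizes differences in elevation. The facet values below are made up purely for illustration; they are not the actual NPD or Obama ratings.

```python
import numpy as np

def profile_similarity(target, prototype):
    """Compare two trait profiles (e.g., FFM facet scores).

    Returns the Pearson correlation (shape similarity) and the
    double-entry ICC, which also reflects elevation differences.
    """
    x = np.asarray(target, dtype=float)
    y = np.asarray(prototype, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    # Double-entry ICC: correlate the stacked vector [x, y] with [y, x].
    icc = np.corrcoef(np.concatenate([x, y]),
                      np.concatenate([y, x]))[0, 1]
    return r, icc

# Hypothetical ratings on six facets (1-5 scale), for illustration only:
npd_prototype = [4.5, 4.0, 4.2, 1.5, 2.0, 1.8]
informant_profile = [3.2, 3.5, 2.8, 3.6, 3.9, 3.4]
r, icc = profile_similarity(informant_profile, npd_prototype)
```

Identical profiles yield r = ICC = 1; a profile that matches in shape but sits at a different elevation keeps r high while the double-entry ICC drops.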

As an aside, Ashley Watts and colleagues evaluated levels of narcissism in the first 43 presidents using historical experts to rate presidential personalities. Their paper is extremely interesting and well worth reading. They found that these five presidents had personalities closest to the prototype of NPD: LBJ, Nixon, Jackson, Andrew Johnson, and Arthur. The five furthest from the prototype were Lincoln, Fillmore, Grant, McKinley, and Monroe. (See Table 4 in their report).

Using data from the Watts et al. paper, I computed standardized scores for the estimates of Obama’s grandiose and vulnerable narcissism levels from the Witt and Ackerman profile. These scores indicated Obama was below average by over .50 SDs for both dimensions (Grandiose: -.70; Vulnerable: -.63).   The big caveat here is that the personality ratings for Obama were provided by undergrads and the Watts et al. data were from experts.  Again, however, there were no indications that Obama is especially narcissistic compared to the other presidents.

Thanks to Robert Ackerman, Matthias Mehl, Rich Slatcher, Ashley Watts, and Edward Witt for insights that helped with this post.

Postscript 1: This is a light-hearted post. However, the procedures I used could make for a fun classroom project for Personality Psychology 101. Have the students rate a focal individual such as Obama or a character from TV, movies, etc., and then compare the consensus profile to the PD profiles. I have all of the materials to do this if you want them. The variance in the ratings across students is also potentially interesting.

Postscript 2: Using this same general procedure, Edward Witt, Christopher Hopwood, and I concluded that Anakin Skywalker did not strongly match the profile of someone with BPD and neither did Darth Vader (counter to these speculations).  They were more like successful psychopaths.  But that is a blog post for another day!


More Null Results in Psychological Science — Comments on McDonald et al. (2014) and Crisp and Birtel (2014)

Full Disclosure:  I am second author on the McDonald et al. (2014) commentary.

Some of you may have seen that Psychological Science published our commentary on the Birtel and Crisp (2012) paper.  Essentially we tried to replicate two of their studies with larger sample sizes (29 versus 240 and 32 versus 175, respectively) and obtained much lower effect size estimates. It is exciting that Psychological Science published our work and I think this is a hint of positive changes for the field.  Hopefully nothing I write in this post undercuts that overarching message.

I read the Crisp and Birtel response and, shocking as it may be, I had some reactions of my own. I think it is fair that they get the last word in print, but I wanted to share a few of those reactions in this blog post. Before diving into issues, I want to reiterate the basic take-home message of McDonald et al. (2014):

“Failures to replicate add important information to the literature and should be a normal part of the scientific enterprise. The current study suggests that more work is needed before Birtel and Crisp’s procedures are widely implemented. Interventions for treating prejudice may require more precise manipulations along with rigorous evaluation using large sample sizes.” (p. xx)

1.  Can we get a mulligan on our title? We might want to revise the title of our commentary to make it clear that our efforts applied to only two specific findings in the original Birtel and Crisp (2012) paper. I think we were fairly circumscribed in the text itself but the title might have opened the door for how Crisp and Birtel (2014) responded.  They basically thanked us for our efforts and pointed out that our two difficulties say nothing about the entire imagined contact hypothesis.  They even argued that we “overgeneralized” our findings to the entire imagined contact literature.  To be frank, I do not think they were being charitable to our piece with this criticism because we did not make this claim in the text.  But titles are important and our title might have suggested some sort of overgeneralization.  I will let readers make their own judgments.  Regardless, I wish we had made the title more focused.

2.  If you really believe the d is somewhere around .35, why were the sample sizes so small in the first place?  A major substantive point in the Crisp and Birtel (2014) response is that the overall d for the imagined contact literature is somewhere around .35 based on a recent Miles and Crisp (2014) meta-analysis.  That is a reasonable point but I think it actually undercuts the Birtel and Crisp (2012) paper and makes our take home point for us (i.e., the importance of using larger sample sizes in this literature).  None of the original Birtel and Crisp (2012) studies had anywhere near the power to detect a population d of .35.  If we take the simple two-group independent t-test design, the power requirements for .80 suggest the need for about 260 participants (130 in each group).   The largest sample size in Birtel and Crisp (2012) was 32.
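As a sanity check on that arithmetic, here is a back-of-the-envelope power calculation using the standard normal approximation for a two-sided, two-group t-test. (The exact t-based answer is a participant or two higher per group, which is where the "about 260 total" figure comes from.)

```python
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-group t-test.

    Normal approximation: n = 2 * (z_{alpha/2} + z_{beta})^2 / d^2.
    """
    z = NormalDist().inv_cdf
    return 2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2

n = n_per_group(0.35)  # roughly 128 per group, so about 260 participants total
```

Against that requirement, the largest Birtel and Crisp (2012) sample of 32 gives a sense of how far short the original studies fell.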

3. What about the ManyLabs paper?  The now famous ManyLabs paper of Klein et al. (in press) reports a replication attempt of an imagined contact study (Study 1 in Husnu & Crisp, 2010).  The ManyLabs effort yielded a much lower effect size estimate (d = .13, N = 6,336) than the original report (d = .86 or .84 as reported in Miles & Crisp, 2014; N = 33).  This is quite similar to the pattern we found in our work.  Thus, I think there is something of a decline effect in operation.  There is a big difference in interpretation between a d of .80 and a d around .15.  This should be worrisome to the field especially when researchers begin to think of the applied implications of this kind of work.

4. What about the Miles and Crisp Meta-Analysis (2014)? I took a serious look at the Miles and Crisp meta-analysis and I basically came away with the sinking feeling that much more research needs to be done to establish the magnitude of the imagined contact effects.  Many of the studies used in the meta-analysis were grossly underpowered.  There were 71 studies and only 2 had sample sizes above 260 (the threshold for having a good chance to detect a d = .35 effect using the standard between-participants design).  Those two large studies yielded basically null effects for the imagined contact hypothesis (d = .02 and .05, ns = 508 and 488, respectively). The average sample size of the studies in the meta-analysis was 81 (81.27 to be precise) and the median was 61 (Min. = 23 and Max. = 508).  A sample size of 123 was in the 90th percentile (i.e., 90% of the samples were below 123) and nearly 80% of the studies had sample sizes below 100.

Miles and Crisp (2014) were worried about sample size but perhaps not in the ways that I might have liked.   Here is what they wrote: “However, we observed that two studies had a sample size over 6 times the average (Chen & Mackie, 2013; Lai et al., 2013). To ensure that these studies did not contribute disproportionately to the summary effect size, we capped their sample size at 180 (the size of the next largest study) when computing the standard error variable used to weight each effect size.” (p. 13).  Others can weigh in about this strategy but I tend to want to let the sample sizes “speak for themselves” in the analyses, especially when using a random-effects meta-analysis model.

What’s it all mean?

Not to bring out the cliché but I think much more work needs to be done here. As it stands, I think the d = .35 imagined contact effect size estimate is probably upwardly biased. Indeed, Miles and Crisp (2014) found evidence of publication bias such that unpublished studies yielded a smaller overall effect size estimate than published studies (but the unpublished studies still produced an estimate that is reliably larger than zero). However this shakes out, researchers are well advised to use much larger sample sizes than those that currently characterize this literature, based on my summary of the sample sizes in Miles and Crisp (2014). I also think more work needs to be done to evaluate the specific Birtel and Crisp (2012) effects. We have now collected two more unpublished studies with even bigger sample sizes and we have yet to get effect sizes that approximate the original report.

I want to close by trying to clarify my position.  I am not saying that the effect sizes in question are zero or that this is an unimportant research area.  On the contrary, I think this is an incredibly important topic and thus it requires even greater attention to statistical power and precision.


Updated 26 Feb 2014: I corrected the sample size from study 1 from 204 to 240.

Warm Water and Loneliness

Our paper on bathing/showering habits and loneliness has been accepted (Donnellan, Lucas, & Cesario, in press).  The current package has 9 studies evaluating the correlation between trait loneliness and a preference for warm showers and baths as inspired by Studies 1a and 1b in Bargh and Shalev (2012; hereafter B & S).  In the end, we collected data from over 3,000 people and got effect size estimates that were considerably smaller than the original report.  Below are some random reflections on the results and the process. As I understand the next steps, B & S will have an opportunity to respond to our package (if they want) and then we have the option of writing a brief rejoinder.

1. I blogged about our inability to talk about original B & S data in the Fall of 2012.  I think this has been one of my most viewed blog entries (pathetic, I know).  My crew can apparently talk about these issues now so I will briefly outline a big concern.

Essentially, I thought the data from their Study 1a were strange. We learned that 46 of the 51 participants (90%) reported taking less than one shower or bath per week.  I can see that college students might report taking less than 1 bath per week, but showers?  The modal response in each of our 9 studies drawn from college students, internet panelists, and mTurk workers was always “once a day” and we never observed more than 1% of any sample telling us that they take less than one shower/bath per week.  So I think this distribution in the original Study 1a has to be considered unusual on both intuitive and empirical grounds.

The water temperature variable was also odd given that 24 out of 51 participants selected “cold” (47%) and 18 selected “lukewarm” (35%).   My own intuition is that people like warm to hot water when bathing/showering.  The modal response in each of our 9 samples was “very warm” and it was extremely rare to ever observe a “cold” response.

My view is that the data from Study 1a should be discarded from the literature. The distributions from 1a are just too weird.  This would then leave the field with Study 1b from the original B & S package based on 41 community members versus our 9 samples with over 3,000 people.

2.  My best meta-analytic estimate is that the correlation between trait loneliness and the water temperature variable is .026 (95% CI: -.018 to .069, p = .245).  This is based on a random effects model using the 11 studies in the local literature (i.e., our 9 studies plus Studies 1a and 1b – I included 1a to avoid controversy).  Researchers can debate about the magnitude of correlations but this one seems trivial to me especially because we are talking about two self-reported variables. We are not talking about aspirin and a life or death outcome or the impact of a subtle intervention designed to boost GPA.  Small effects can be important but sometimes very small correlations are practically and theoretically meaningless.

3. None of the original B & S studies had adequate power to detect something like the average .21 correlational effect size found across many social psychological studies (see Richard et al., 2003). Researchers need around 175 participants with power set to .80 for the r = .21 expectation. If one takes sample size as an implicit statement about researcher expectations about the underlying effect sizes, it would seem like the original researchers thought the effects they were evaluating were fairly substantial. Our work suggests that the effects in question are probably not.
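Where does that 175 figure come from? A quick sketch using the Fisher-z approximation for the power of a two-sided correlation test, which is one standard way to derive the required n:

```python
import math
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate total n to detect a correlation r (two-sided test).

    Fisher-z approximation: n = ((z_{alpha/2} + z_{beta}) / atanh(r))^2 + 3.
    """
    z = NormalDist().inv_cdf
    return ((z(1 - alpha / 2) + z(power)) / math.atanh(r)) ** 2 + 3

n = n_for_correlation(0.21)  # roughly 175 participants
```

Against that benchmark, the 41-person community sample in the original Study 1b was far too small to reliably detect a typical social-psychology effect.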

In the end, I am glad this paper is going to see the light of day.  I am not sure all the effort was worth it but I hope our paper makes people think twice about the size of the connection between loneliness and warm showers/baths.

25 Jan 2014:  Corrected some typos.

One for the File Drawer?

I once read about an experiment in which college kids held either a cold pack or a warm pack and then reported about their levels of so-called trait loneliness. We just tried a close replication of this experiment involving the same short form loneliness scale used by the original authors. I won’t out my collaborators but I want to acknowledge their help.

The original effect size estimate was pretty substantial (d = .61, t = 2.12, df = 49) but we used 261 students so we could have more than adequate power. Our attempt yielded a much smaller effect size than the original (d = -.01, t = 0.111, df = 259, p = .912). The mean of the cold group (2.10) was darn near the same as the warm group (2.11; pooled SD = .61). (We also got null results when we restricted the analyses to just those who reported that they believed the entire cover story: d = -.17. The direction was counter to predictions, however.)

Failures to replicate are a natural part of science so I am not going to make any bold claims in this post. I do want to point out that the reporting in the original is flawed. (The original authors used a no-pack control condition and found no evidence of a difference between the warm pack and the no-pack condition so we just focused on the warm versus cold comparison for our replication study).  The sample size was reported as 75 participants. The F value for the one-way ANOVA was reported as 3.80 and the degrees of freedom were reported as 2, 74.  The numerator for the reference F distribution should be k -1 (where k is the number of conditions) so the 2 was correct.  However, the denominator was reported as 74 when it should be N – k or 72 (75 – 3).   Things get even weirder when you try to figure out the sample sizes for the 3 groups based on the degrees of freedom reported for each of the three follow-up t-tests.

We found indications that holding a cold pack did do something to participants.  Both the original study and our replication involved a cover story about product evaluation. Participants answered three yes/no questions and these responses varied by condition.

Percentage answering “Yes” to the Pleasant Question:

Warm: 96%     Cold: 80%

Percentage answering “Yes” to the Effective Question:

Warm: 98%     Cold: 88%

Percentage answering “Yes” to the Recommending to a Friend Question:

Warm: 95%   Cold: 85%

Apparently, the cold packs were not evaluated as positively as the warm packs. I can foresee all sorts of criticism coming our way. I bet one thread is that we are not “skilled” enough to get the effect to work and a second thread is that we are biased against the original authors (either explicitly or implicitly). I’ll just note these as potential limitations and call it good. Fair enough?

Update 7 February 2014:  We decided to write this up for a journal article. In the process of preparing the manuscript and files for posting, Jessica noticed that I did not drop a participant with an ID we use for testing the survey system.  Thus, the actual sample size should be 260 NOT 261.  Fortunately, this did not change any of the conclusions.  The t statistic was -0.006 (df = 258), p = .995 and the effect size was d = -.01.  We also conducted a number of supplementary analyses to see if removing participants who expressed suspicion or had questionable values on the manipulation check variable (rating the temperature of the cold pack) impacted results.  Nothing we could do influenced the bottom line null result.

I caught my own mistake so I donated $20 to a charity I support – the American Cancer Society.

Politics and Marital Quality: Or How I Wasted My Morning

I had been wondering if political orientation or discrepancies in political orientation might be related to relationship quality. I think this is an interesting question in light of a close presidential election. Fortunately, I had access to some data on these variables from around 330 heterosexual married couples. I conducted some preliminary analyses this morning and the short story is a bunch of null findings.

Measures: Political orientation was measured on the “traditional seven-point scale” where 1 = extremely liberal to 7 = extremely conservative (see Knight, 1999). Marital quality was measured using five items from the Quality of Marriage Index (Norton, 1983). The internal consistencies were typical of this measure (alphas ≥ .90 for wives and husbands).

Descriptive Results: Husbands were slightly more conservative than wives (Husband Mean = 4.63, Wife Mean = 4.33, Pooled SD = 1.36; d = .22). Husbands and wives did not differ in terms of marital quality (Husband Mean = 4.26, Wife Mean = 4.25, Pooled SD = .83, d = .01). There was evidence of spousal similarity for political orientation (ICC = .54) and marital quality (ICC = .62). None of the zero-order correlations involving political orientation and marital quality were impressive or statistically significant (largest r = -.05).

Actor Partner Interdependence Model (APIM) Results: I squared the difference between political orientation scores from husbands and wives and used that score in a very basic dyadic model.  I specified the APIM for interchangeable dyads with the exception of allowing for a mean-level difference in political orientation between wives and husbands.  None of the relevant effects were statistically different from zero:  Actor effect: .008 (SE = .023); Partner Effect: -.015 (SE = .023); Discrepancy Effect: -.013 (SE = .016). Thus, political orientation did not seem to matter for the individual’s report of marital quality or for her/his partner’s report of marital quality.  The discrepancy did not seem to matter either.

A weakness is the single-item measure of political orientation and the fact that these couples had been together for some time (average age of husbands was around 37 years versus 35 years for wives). Nonetheless, these initial results were not compelling to me. Darn! It would have made an interesting story. If anyone else has better data on this issue or more convincing results, let me know.