Alpha and Correlated Item Residuals

Subtitle: Is alpha lame?

My last post was kind of stupid, as Sanjay tweeted. [Sanjay was actually much more diplomatic when he pointed out the circularity in my approach.] I selected the non-careless responders in a way that guarantees a more unidimensional result. A potentially better approach is to use a different set of scales to identify the non-careless responders and then repeat the analyses. This flaw aside, I think my broader points still stand. It is useful to look for ways to screen existing datasets given the literature suggesting that: a) careless responders are present in many datasets; and b) careless responders often distort substantive results (see the references and additional recommendations to the original post).

Another interesting criticism came from my offhand reporting of alpha coefficients. Matthew Hankins (via Twitter) rightly pointed out that it is a mistake to compute alpha in light of the structural analyses I conducted. I favored a particular model for the structure of the RSE that specifies a large number of correlated item residuals between the negatively-keyed and positively-keyed items. In the presence of correlated residuals, alpha can either underestimate or overestimate reliability/internal consistency (see Raykov, 2001, building on Zimmerman, 1972).
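To make the under/overestimate point concrete, here is a minimal population-level sketch in Python. It is not the analysis from the post and uses made-up numbers: six hypothetical items with equal loadings of .7 and residual variances of .51, with residual covariances of 0, +.2, or -.2 among the first three items. It assumes a single-factor model with the factor variance fixed at 1 and treats the correlated residuals as error, so "reliability" here is the share of sum-score variance due to the common factor. The function names are my own.

```python
import numpy as np


def implied_covariance(loadings, theta):
    """Model-implied item covariance matrix: lambda * lambda' + Theta (factor variance = 1)."""
    lam = np.asarray(loadings, dtype=float).reshape(-1, 1)
    return lam @ lam.T + np.asarray(theta, dtype=float)


def cronbach_alpha(cov):
    """Coefficient alpha computed directly from an item covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    return k / (k - 1) * (1.0 - np.trace(cov) / cov.sum())


def composite_reliability(loadings, theta):
    """Common-factor variance of the sum score divided by its total variance.

    Residual covariances in Theta count toward total (error) variance only.
    """
    true_var = np.sum(loadings) ** 2
    total_var = implied_covariance(loadings, theta).sum()
    return true_var / total_var


def make_theta(k, resid_var, resid_cov, n_correlated):
    """Residual matrix with one shared covariance among the first n_correlated items."""
    theta = np.eye(k) * resid_var
    for i in range(n_correlated):
        for j in range(n_correlated):
            if i != j:
                theta[i, j] = resid_cov
    return theta


# Hypothetical values, chosen only for illustration.
loadings = np.full(6, 0.7)
results = {}
for label, resid_cov in [("none", 0.0), ("positive", 0.2), ("negative", -0.2)]:
    theta = make_theta(6, 0.51, resid_cov, 3)
    sigma = implied_covariance(loadings, theta)
    results[label] = (cronbach_alpha(sigma), composite_reliability(loadings, theta))
    print(f"{label:>8}: alpha = {results[label][0]:.3f}, "
          f"model-based = {results[label][1]:.3f}")
```

With no residual covariances the two coefficients agree, because the items are tau-equivalent. Positive residual covariances push alpha above the model-based value, and negative ones push it below, which is the pattern Raykov (2001) describes.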

[Note: I knew reporting alpha was a technical mistake but I thought it was one of those minor methodological sins akin to dropping an f-bomb every now and then in real life. Moreover, I am aware of the alpha criticism literature (and the alternatives like omega). When blogging, I relied on the heuristic that alpha is a lower bound on reliability, but this does not hold in the presence of correlated residuals (see again Raykov, 2001).]

Hankins illustrated issues with alpha and the GHQ-12 in a paper he recommended (Hankins, 2008). The upshot of his paper is that alpha often makes the GHQ-12 appear to be a more reliable instrument than it does under other methods of computing reliability that are based on more appropriate factor structures (say, something like .90 versus .75). Depending on how reliability estimates are used, this could be a big deal.

Accordingly, I modified some Mplus syntax using Brown (2015) and Raykov (2001) as a template to compute a more appropriate reliability estimate for the RSE under my preferred model. Output that includes the syntax is here. [I did this quickly so I might have made a mistake!] Using this approach, I estimated reliability for my sample of 1,000 to be .699. This is compared to the .887 estimate I got with alpha. If you want a way to contextualize this drop, you can think about how this difference would impact the Standard Error of Measurement when considering the precision of estimates for individual scores. The SD for the mean scores was .724.
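Here is a quick sketch of that Standard Error of Measurement comparison in Python. It assumes the usual classical test theory formula, SEM = SD × sqrt(1 − reliability), and plugs in the three numbers reported above. The function and variable names are my own.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


sd_mean_scores = 0.724  # SD of the RSE mean scores reported in the post

sem_alpha = standard_error_of_measurement(sd_mean_scores, 0.887)  # alpha
sem_model = standard_error_of_measurement(sd_mean_scores, 0.699)  # model-based estimate

print(f"SEM using alpha:                {sem_alpha:.3f}")
print(f"SEM using model-based estimate: {sem_model:.3f}")
print(f"Ratio (model-based / alpha):    {sem_model / sem_alpha:.2f}")
```

Under these assumptions the SEM goes from roughly .24 to roughly .40, so an approximate 95% band around an individual's mean score widens from about ±.48 to about ±.78.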

I go back and forth about whether I think alpha is lame or whether all of the criticism of alpha is much ado about nothing. Today I am leaning towards the "alpha is lame" pole of my thinking. Correlated residuals are a reality for the scales that I typically use in research. Yikes!

Thanks to people who tweeted and criticized my last post.

Brown, T. A. (2015). Confirmatory factor analysis for applied research (2nd edition).

Hankins (2008). The reliability of the twelve-item general health questionnaire (GHQ-12) under realistic assumptions.

Raykov (2001). Bias of coefficient α for fixed congeneric measures with correlated errors.

Zimmerman (1972). Test reliability and the Kuder-Richardson formulas: Derivation from probability theory.