The Life Goals of Kids These Days Part II

This is a follow-up to my January 16 blog post with some new data!  Some of my former students and now colleagues have launched a longitudinal study of college students. In the Fall of 2013 we gave a large sample of first-year students the Monitoring the Future goal items.  I thought it would be fun to see what these data looked like and how these goals were correlated with certain measures of personality.  These data come from a school in the Southwest and are drawn from all incoming first-year students.

Students were asked about 14 goals and responded on a 4-point scale (1 = "Not Important" to 4 = "Extremely Important").  Descriptive data for the 14 goals, ordered by average level of endorsement, are reported below.  I also included the rankings for Millennials as reported in Arnett, Trzesniewski, and Donnellan (2013) and described in my older post.

Table 1: Goals for First Year Students (Unnamed School in the Southwest) using the Monitoring the Future Goal Items

| Goal | Rank in MTF for Millennials | M | SD | % Reporting Extremely Important |
|---|---|---|---|---|
| Having a good marriage and family life | 1 | 3.54 | .80 | 69.7 |
| Being successful in my line of work | 5 | 3.54 | .64 | 61.3 |
| Having strong friendships | 3 | 3.52 | .68 | 61.6 |
| Being able to find steady work | 2 | 3.51 | .65 | 58.3 |
| Finding a purpose and meaning in my life | 6 | 3.35 | .84 | 55.0 |
| Being able to give my children better opportunities than I’ve had | 4 | 3.32 | .87 | 53.8 |
| Having plenty of time for recreation and hobbies | 7 | 3.11 | .81 | 36.7 |
| Making a contribution to society | 9 | 3.11 | .87 | 39.4 |
| Discovering new ways to experience things | 10 | 2.89 | .91 | 28.3 |
| Having lots of money | 8 | 2.67 | .91 | 21.3 |
| Living close to parents and relatives | 11 | 2.50 | 1.03 | 21.2 |
| Working to correct social and economic inequalities | 13 | 2.41 | .99 | 17.3 |
| Being a leader in my community | 12 | 2.35 | 1.01 | 17.0 |
| Getting away from this area of the country | 14 | 1.83 | 1.01 | 10.1 |

Note: N = 1,245 to 1,254.

As before, marriage and friendships were seemingly highly valued, as were being successful and finding steady work. So these first year college students want it all – success in love and work.  Damn these kids — who do they think they are?

I was then able to correlate the goal responses with measures of self-esteem, narcissism, and the Big Five. Below is a table showing the relevant correlations.

Table 2: Correlations between Goal Items and Measures of Self-Esteem, Narcissism, Extraversion, and Agreeableness

| Goal | Self-Esteem | NPI Total | NPI-EE | PDQ-NPD | Extraversion | Agreeableness |
|---|---|---|---|---|---|---|
| Having a good marriage and family life | .17 | .05 | -.09 | -.07 | .17 | **.29** |
| Being successful in my line of work | .18 | .18 | -.01 | .04 | .19 | .19 |
| Having strong friendships | .16 | .08 | -.08 | -.05 | **.26** | **.25** |
| Being able to find steady work | .15 | .09 | -.03 | -.02 | .14 | **.20** |
| Finding a purpose and meaning in my life | .04 | .10 | -.03 | .00 | .17 | .15 |
| Being able to give my children better opportunities than I’ve had | .11 | .11 | -.06 | .03 | **.20** | **.25** |
| Having plenty of time for recreation and hobbies | .07 | .18 | .08 | .09 | .15 | .07 |
| Making a contribution to society | .14 | .18 | -.03 | .02 | **.25** | **.20** |
| Discovering new ways to experience things | .15 | **.26** | .05 | .11 | **.27** | .12 |
| Having lots of money | .08 | **.34** | **.26** | **.21** | .18 | .03 |
| Living close to parents and relatives | .12 | .11 | .01 | .04 | .16 | **.24** |
| Working to correct social and economic inequalities | .08 | .19 | .03 | .05 | .19 | .14 |
| Being a leader in my community | .13 | **.36** | .12 | .16 | **.35** | .18 |
| Getting away from this area of the country | -.09 | .19 | .18 | .18 | .04 | -.13 |

Note: Correlations ≥ |.06| are statistically significant at p < .05. Correlations ≥ |.20| are bolded. Self-Esteem was measured with the Rosenberg (1989) scale. The NPI (Raskin & Terry, 1988) was used so that we could compute the NPI-EE (Entitlement/Exploitativeness) subscale (see Ackerman et al., 2011) and even the total score (yuck!). The PDQ-NPD column is the Narcissistic Personality Disorder subscale of the Personality Diagnostic Questionnaire-4 (Hyler, 1994). Extraversion and Agreeableness were measured using the Big Five Inventory (John et al., 1991).
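As a quick sanity check on that significance cutoff, here is a minimal Python sketch (my own illustration; the post itself involves no code) recovering the |.06| threshold directly from the sample size reported under Table 1:

```python
# Critical |r| for two-tailed significance at alpha = .05, given N ~ 1,250.
# Derivation: t = r * sqrt(N - 2) / sqrt(1 - r^2), solved for r at the critical t.
from scipy import stats

n = 1250                                   # approximate N from the Table 1 note
t_crit = stats.t.ppf(0.975, df=n - 2)      # two-tailed critical t (about 1.96)
r_crit = t_crit / (t_crit**2 + n - 2) ** 0.5
print(f"critical |r| at p < .05 with N = {n}: {r_crit:.3f}")  # prints ~0.055
```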

What do I make of these results?  On the face of it, I do not see a major cause for alarm or worry.  These college students seem to want it all, and it will be fascinating to track the development of these goals over the course of their college careers.  I also think Table 2 provides some reason for caution about using goal-change studies as evidence of increases in narcissism, but I am probably biased.  At any rate, I do not think there is compelling evidence that the most strongly endorsed goals are strongly positively related to measures of narcissism.  This is especially true when considering the NPI-EE and PDQ-NPD correlations.

Thanks to Drs. Robert Ackerman, Katherine Corker, and Edward Witt.

I don’t care about effect sizes — I only care about the direction of the results when I conduct my experiments

This claim (or some variant) has been invoked by a few researchers when they take a position on issues of replication and the general purpose of research.  For example, I have heard this platitude offered to explain why someone is unconcerned when an original finding with a d of 1.2 shrinks to a d of .12 upon exact replication.  Someone recently asked me for advice on how to respond to a person making the above claim, and I struggled a bit.  My first response was to dig up these two quotes and call it a day.

Cohen (1990): “Next, I have learned and taught that the primary product of research inquiry is one or more measures of effect size, not p values.” (p. 1310).

Abelson (1995): “However, as social scientists move gradually away from reliance on single studies and obsession with null hypothesis testing, effect size measures will become more and more popular” (p. 47).

But I decided to try a bit harder, so here are my assorted thoughts on how to respond to the above claim.

1.  Assume this person is making a claim about the utility of NHST. 

One retort is to ask how the researcher judges the outcome of their experiments.  They need a method to distinguish the “chance” directional hit from the “real” directional hit.  Often the preferred tool is NHST: the researcher judges that the experiment produced evidence consistent with the theory (or at least failed to refute it) if the direction of the difference/association matched the prediction and the p value was statistically significant at some level (say, an alpha of .05).  Unfortunately, the beloved p value is determined, in part, by the effect size.

To quote from Rosenthal and Rosnow (2008, p. 55):

Because a complete account of “the results of a study” requires that the researcher report not just the p value but also the effect size, it is important to understand the relationship between these two quantities.  The general relationship…is…Significance test = Size of effect × Size of study.

So if you care about the p value, you should care (at least somewhat) about the effect size.  Why? The researcher gets to pick the size of the study so the critical unknown variable is the effect size.  It is well known that given a large enough N, any trivial difference or non-zero correlation will attain significance (see Cohen, 1994, p. 1000 under the heading “The Nil Hypothesis”). Cohen notes that this point was understood as far back as 1938.  Social psychologists can look to Abelson (1995) for a discussion of this point as well (see p. 40).
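To put numbers on that point, here is a minimal sketch (Python with scipy; the specific r and N values are purely illustrative) of a trivial correlation marching toward statistical significance as N grows:

```python
# A trivial correlation (r = .02) goes from clearly non-significant to
# clearly significant purely as a function of sample size.
from scipy import stats

def p_for_r(r, n):
    """Two-tailed p value for a Pearson r with sample size n."""
    t = r * ((n - 2) / (1 - r**2)) ** 0.5
    return 2 * stats.t.sf(abs(t), df=n - 2)

for n in (500, 10_000, 100_000):
    print(f"r = .02, N = {n:>7,}: p = {p_for_r(0.02, n):.4f}")
```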

To further understand the inherent limitations of this NHST-bound approach, we can (and should) quote from the book of Paul Meehl (Chapter 1978).

Putting it crudely, if you have enough cases and your measures are not totally unreliable, the null hypothesis will always be falsified, regardless of the truth of the substantive theory. Of course, it could be falsified in the wrong direction, which means that as the power improves, the probability of a corroborative result approaches one-half. However, if the theory has no verisimilitude – such that we can imagine, so to speak, picking our empirical results randomly out of a directional hat apart from any theory – the probability of refuting by getting a significant difference in the wrong direction also approaches one-half.  Obviously, this is quite unlike the situation desired from either a Bayesian, a Popperian, or a commonsense scientific standpoint. (Meehl, 1978, p. 822)
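A small simulation makes Meehl’s scenario concrete. In this sketch (my own construction with made-up parameters, not anything from Meehl), every null hypothesis is false by a small “crud” amount, the theory’s directional prediction is a coin flip, and the rate of “corroborating” significant results climbs toward one-half as samples grow:

```python
# Meehl's directional-hat scenario: the truth always points one way by a
# small amount (crud_d), the theory guesses a direction at random, and a
# result "corroborates" the theory if it is significant in the guessed
# direction. As power improves, the corroboration rate approaches 1/2.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
crud_d = 0.1                       # small but real effect: the null is never true
reps = 2000

for n in (50, 500, 5000):          # per-group sample sizes
    hits = 0
    for _ in range(reps):
        guess = rng.choice([-1, 1])             # the theory's coin-flip prediction
        control = rng.normal(0.0, 1.0, n)
        treatment = rng.normal(crud_d, 1.0, n)  # truth always points the same way
        t, p = stats.ttest_ind(treatment, control)
        if p < 0.05 and np.sign(t) == guess:    # significant, in the guessed direction
            hits += 1
    print(f"n per group = {n:>4}: corroboration rate = {hits / reps:.2f}")
```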

Meehl gets even more pointed (p. 823):

I am not a statistician, and I am not making a statistical complaint. I am making a philosophical complaint or, if you prefer, a complaint in the domain of scientific method. I suggest that when a reviewer tries to “make theoretical sense” out of such a table of favorable and adverse significance test results, what the reviewer is actually engaged in, willy-nilly or unwittingly, is meaningless substantive constructions on the properties of the statistical power function, and almost nothing else.

Thus, I am not sure that this appeal to directionality with the binary outcome from NHST (i.e., a statistically significant versus not statistically significant result according to some arbitrary alpha criterion) helps make the above argument persuasive.  Ultimately, I believe researchers should think about how strongly the results of a study corroborate a particular theoretical idea.  I think effect sizes are more useful for this purpose than the p-value.  You have to use something – why not use the most direct indicator of magnitude?

A somewhat more informed researcher might tell us to go read Wainer (1999) as a way to defend the virtues of NHST.  This paper is called “One Cheer for Null Hypothesis Significance Testing” and appeared in Psychological Methods in 1999.  Wainer suggests 6 cases in which a binary decision would be valuable.  His example from psychology is testing the hypothesis that the mean human intelligence score at time t is different from the mean score at time t+1.

However, Wainer also seems to find merit in effect sizes.  He writes: “Once again, it would be more valuable to estimate the direction and rate of change, but just being able to state that intelligence is changing would be an important contribution” (p. 213). He also concludes that “Scientific investigations only rarely must end with a simple reject-not reject decision, although they often include such decisions as part of their beginnings” (p. 213).  So in the end, I am not sure that any appeal to NHST over effect size estimation and interpretation works very well.  Relying exclusively on NHST seems way worse than relying on effect sizes.

2.  Assume this person is making a claim about the limited value of generalizing results from a controlled lab study to the real world.

One advantage of the lab is the ability to generate a strong experimental manipulation.  The downside is that any effect size estimate from such a study may not represent typical real-world dynamics and thus risks misleading uninformed (or unthinking) readers.  For example, if we wanted to test the idea that drinking regular soda makes rats fat, we could give half of our rats the equivalent of 20 cans of Coke a day whereas the other half could get 20 cans of Diet Coke per day.  Let’s say we did this experiment, the difference was statistically significant (p < .0001), and we got a d = 2.0.  The Coke-exposed rats were heavier than the Diet Coke-exposed rats.

What would the effect size mean?  Drawing attention to what seems like a huge effect might be misleading because most rats do not drink the equivalent of 20 cans of Coke a day.  The effect size would presumably fluctuate with a weaker or stronger manipulation.  We might get ridiculed by the soda lobby if we did not exercise caution in portraying the finding to the media.
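To illustrate, here is a toy simulation (entirely made-up numbers, including the hypothetical weight gain per daily can) in which the same underlying soda-weight process yields very different values of d depending on how extreme the manipulation is; the 20-can condition roughly reproduces the d = 2.0 from the example above:

```python
# The same "soda makes rats fatter" process, measured at three doses.
# gain_per_can is a fabricated parameter chosen so that 20 cans/day
# yields roughly the d = 2.0 from the hypothetical experiment.
import numpy as np

rng = np.random.default_rng(1)
n = 200                     # rats per condition
gain_per_can = 0.10         # hypothetical weight gain per daily can (in SD units)

def cohens_d(x, y):
    """Standardized mean difference using a pooled SD."""
    pooled_sd = np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2)
    return (x.mean() - y.mean()) / pooled_sd

for cans in (1, 5, 20):
    coke = rng.normal(cans * gain_per_can, 1.0, n)   # regular-soda group
    diet = rng.normal(0.0, 1.0, n)                   # diet-soda group
    print(f"{cans:>2} cans/day: d = {cohens_d(coke, diet):.2f}")
```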

This scenario raises an important point about the interpretation of effect sizes, but I am not sure it negates the need to calculate and consider them.  The effect size from any study should be viewed as an estimate of a population value, and thus one should think carefully about defining that population.  Furthermore, the rat obesity expert presumably knows about other effect sizes in the literature and can therefore place this new result in context for readers.  What effect sizes do we see when we compare sedentary rats to those who run 2 miles per day?  What effect sizes do we see when we compare genetically modified “fat” rats to “skinny” rats?  That kind of information helps the researcher interpret both the theoretical and practical importance of the Coke findings.

What Else?

There are probably other ways of being more charitable to the focal argument. Unfortunately, I need to work on some other things and think harder about this issue. I am interested to see if this post generates comments.  However, I should say that I am skeptical that there is much to admire about this perspective on research.  I have yet to read a study where I wished the authors omitted the effect size estimate.

Effect sizes matter for at least two other reasons beyond interpreting results.  First, we need to think about effect sizes when we plan our studies.  Otherwise, we are just being stupid and wasteful.  Indeed, it is potentially even unethical to expend resources conducting underpowered studies (see Rosenthal, 1994).  Second, we need to evaluate effect sizes when reviewing the literature and conducting meta-analyses.  We synthesize effect sizes, not p values.  Thus, effect sizes matter for planning studies, interpreting studies, and making sense of an overall literature.
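On the planning point, a minimal sketch using statsmodels’ power solver (my choice of tool, with Cohen’s conventional benchmarks for d) shows how directly the required sample size depends on the effect size you expect:

```python
# Per-group N needed for 80% power in a two-sample t test at alpha = .05,
# for small, medium, and large expected effects.
from statsmodels.stats.power import TTestIndPower

solver = TTestIndPower()
for d in (0.2, 0.5, 0.8):
    n = solver.solve_power(effect_size=d, power=0.80, alpha=0.05)
    print(f"expected d = {d}: ~{n:.0f} participants per group")
```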

[Snarky aside, skip if you are sensitive]

I will close with a snarky observation that I hope does not detract from my post. Some of the people making the above argument about effect sizes get testy about the low power of failed replication studies of their own findings.   I could fail to replicate hundreds (or more) of important effects in the literature by running a bunch of 20-person studies. This should surprise no one. However, a concern about power only makes sense in the context of an underlying population effect size.  I just don’t see how you can complain about the power of failed replications and dismiss effect sizes.
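To quantify the snark: reading “20-person studies” as 10 participants per group (my assumption), a quick power calculation (again via statsmodels) shows how little chance such a study has against typical true effects:

```python
# Power of a 20-person (10 per group) two-sample t test at alpha = .05.
from statsmodels.stats.power import TTestIndPower

solver = TTestIndPower()
for d in (0.2, 0.5, 0.8):
    power = solver.power(effect_size=d, nobs1=10, alpha=0.05)
    print(f"true d = {d}: power = {power:.2f}")
```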

Post Script (6 August 2013):

Daniel Simons has written several good pieces on this topic.  These influenced my thinking and I should have linked to them.  Here they are:

http://blog.dansimons.com/2013/03/what-effect-size-would-you-expect.html

http://blog.dansimons.com/2013/03/a-further-thought-experiment-on.html

Likewise, David Funder talked about similar issues (see also the comments):

http://funderstorms.wordpress.com/2013/02/01/does-effect-size-matter/

http://funderstorms.wordpress.com/2013/02/09/how-high-is-the-sky-well-higher-than-the-ground/

And of course, Lee Jussim (via Brent Roberts)…

http://pigee.wordpress.com/2013/02/23/when-effect-sizes-matter-the-internal-incoherence-of-much-of-social-psychology/