Determining Number of Degrees of Freedom

(Updated March 4, 2016)

Degrees of freedom in SEM reflect the complexity vs. parsimony of a model. The greater the number of paths you estimate, the lower the df. Determination of df also depends on the number of manifest (or measured) indicator variables included in your model.

The number of known elements in your input variance/covariance matrix (not including means) can be determined from the following equation, where "I" is the number of manifest indicators: knowns = I(I + 1)/2.

Also, in your AMOS printouts, the number of freely estimated parameters can be observed by how many parameters have significance tests (i.e., estimates, critical ratios, and p levels).
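Here is a minimal sketch of the counting involved, assuming the standard rule that I indicators yield I(I + 1)/2 unique variances and covariances, and that df equals the knowns minus the freely estimated parameters. The function names and the example counts (6 indicators, 13 free parameters) are hypothetical, chosen only for illustration.

```python
def known_elements(num_indicators: int) -> int:
    """Unique variances and covariances among I indicators: I(I + 1) / 2."""
    return num_indicators * (num_indicators + 1) // 2


def degrees_of_freedom(num_indicators: int, num_free_parameters: int) -> int:
    """Model df = known elements minus freely estimated parameters."""
    return known_elements(num_indicators) - num_free_parameters


# Hypothetical model: 6 manifest indicators, 13 freely estimated parameters.
print(known_elements(6))          # 21
print(degrees_of_freedom(6, 13))  # 8
```

Each additional path you free raises the parameter count by one and therefore lowers the df by one.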


The distinctions between construct residual variances and construct variances, and between indicator residual variances and indicator variances, can be confusing. Here's a little more explanation.

Recall that anything (construct or indicator) that has an incoming unidirectional arrow from something else in the model gets a residual variance ("bubble"). In the first example in the following photo, a latent construct (large oval) has an incoming direct arrow (shown at left) from some hypothetical predictor variable. Let's say the predictor accounts for 45% of the variance in the shown construct (like an R-squared in regression). The residual (unaccounted for) variance in the bubble would thus be 55%. Because the variance accounted for (R-squared) and unaccounted for (residual) in a dependent measure must sum to 100%, the R-squared and residual variances are redundant. If you know one is 45%, the other must be 55%, and vice-versa. There is thus no need to include both variances in the model. By SEM convention, the variance in such a situation is "housed" in the residual bubble (indicated by an asterisk * in the photo), which is called a "construct residual variance."

Similar reasoning holds in the third pictured scenario. Each manifest indicator (rectangle) has variance accounted for by the construct, as well as residual variance. Again, each indicator's variance in this scenario is housed in the residual bubble (see asterisks), and is known as an "indicator residual variance."

Either a construct (second example) or stand-alone indicator variable (fourth example) may have no incoming unidirectional arrows, and only outgoing unidirectional arrows. In these situations, lacking a residual bubble, the variance is housed in either the construct or indicator itself.

SEM The Musical 1

The debut of "SEM The Musical" was held on April 27 and we ended up with 19 songs (lyrics below). Some video clips are also available below, shown by their respective songs and lyrics (thanks to Sothy Eng and Xiaozhi "Gigi" Zhou for their videography work). Derek Ross, an obviously talented video editor and husband of one of the SEM students, condensed our musical into a five-minute documentary, which is more like a spoof infomercial. Click here for the documentary/spoof infomercial. It's truly "must-see TV." Just to be clear: We are not selling videos!

SEM The Musical
By Dr. Alan Reifman and his Spring 2007 Quantitative Methods IV class

(Back-up vocals in parentheses)

Welcome to SEM The Musical
Lyrics by Alan Reifman
(May be sung to the tune of "Matchmaker," Bock/Harnick, from Fiddler on the Roof)

SEM, SEM, it can be sung,
You’ll be amazed, at what we’ve sprung,
We hope you’ll learn more ’bout this stats technique,
Through songs of which you’re among,

SEM, SEM, we like to run,
It takes awhile, but we get it done,
We hope you’ll learn of the steps that we take,
And take home from this, some fun…

I Am an Indicator
Lyrics by Alan Reifman
(May be sung to the tune of "The Entertainer," Billy Joel)

I am an indicator, a latent construct I represent,
I'm measurable, sometimes pleasurable,
A manifestation of what is meant,

I am an indicator, I usually come in a multiple set,
With other signs of the same construct, you may instruct,
I'm correlated with my co-indicators, you can bet,

I am an indicator, from my presence the construct is inferred,
I'm tap-able, the construct is not palpable,
The distinction should not be blurred

At Least Three
Lyrics by Alan Reifman
(May be sung to the tune of "Think of Me," Lloyd Webber/Hart/Stilgoe, from Phantom of the Opera)

(Cat Pause, lead vocals)

At least three, indicators are urged,
For each latent construct shown,
At least three, indicators should help,
Avoid output where you groan,

With less than three, your construct sure will be, locally unidentified,
Though the model might still run, you could have a rough ride

Gotta Fix It to 1
Lyrics by Alan Reifman
(May be sung to the tune of "Fortunate Son," John Fogerty)

You make a construct, with its loadings,
Can’t let them, all be free,
So that the model’s identified,
Fixing one is the key,

It ain’t free,
It ain’t free,
Gotta fix it to 1,

It ain’t free,
It ain’t free,
In AMOS, automatically done

The number of knowns in your model,
The unknowns can’t exceed,
Fixing a loading for each construct,
Will accomplish this need,

It ain’t free,
It ain’t free,
Gotta fix it to 1,

It ain’t free,
It ain’t free,
In AMOS, automatically done

Residual Variance
Lyrics by Alan Reifman
(May be sung to the tune of "I Say a Little Prayer," Bacharach/David)

Residual variance,
What variables do not share, hence,
I draw a little shape for you,

Residual variance,
What’s left after the R-square, hence,
I draw a little shape for you,

Small circles, to show the, unexplained variance,
...we will always use,
We’ll see what is left in the indicators,
And endogenous,
Constructs that we predict to...

Constrain, ’strain, ’strain...
Lyrics by Alan Reifman
(May be sung to the tune of "Chain of Fools," Don Covay, popularized by Aretha Franklin)

(Cat Pause, lead vocals)

Constrain, ’strain, ’strain (Constrain, ’strain, ’strain),
Constraints are tools (Constraints are tools),
Constrain, ’strain, ’strain (Constrain, ’strain, ’strain),
Constraints are tools (Constraints are tools),

You want to test, if two paths are equal,
You run the model once, then you run a sequel,

First, let the paths run free, they take on their own values,
A chi-square you will see, but what does it tell you?

You must...

Constrain, ’strain, ’strain (Constrain, ’strain, ’strain),
Constraints are tools (Constraints are tools),
Constrain, ’strain, ’strain (Constrain, ’strain, ’strain),
Constraints are tools (Constraints are tools),

You re-run your model, with paths fixed to be the same,
You get a new chi-square, higher than what before came,

You compare the two models, via the delta chi-square test,
If it’s significant, then the free version is best,

When you’ve...

Constrained, ’strained, ’strained (Constrained, ’strained, ’strained),
Constraints are tools (Constraints are tools),
Constrained, ’strained, ’strained (Constrained, ’strained, ’strained),
Constraints are tools (Constraints are tools)...

If it's Nested
Lyrics by Alan Reifman and Adam Munk
(May be sung to the tune of "Mandy," English/Kerr, popularized by Barry Manilow)

If you want to check and see,
If a path is necessary,
What you should do,
Is run a nested model,
Here's the steps to take,
You don't want to dawdle...

If it's nested,
You must only add paths without taking,
Or only take away paths without adding,

If it's nested,
You can compare chi-squares of the models,
And you'll see if the new path is worth adding...

Parsi-Mony
Lyrics by Alan Reifman (expanded for 2010, video)
(May be sung to the tune of "Mony Mony," Bloom/Gentry/James/Cordell)

NOTE: Parsi-Mony has come to be performed as our closing number every year. Below, Dr. Reifman chats with Tommy James, whose classic hit Mony Mony inspired Parsi-Mony. James performed at the 2013 South Plains Fair and was kind enough to stick around and visit with fans and sign autographs. Dr. Reifman tells Tommy about how he (Dr. Reifman) has written statistical lyrics to Tommy's songs for teaching purposes.  

Structural models need parsimony,
Don’t want to add paths that are phony,
Put the paths you need, now that’s all right, yeah,
You got to keep your model lean and tight, now,
...lean and tight now,
I said, yeah (audience joins), yeah, yeah, yeah, yeah,…

If you can account (PARSIMONY),
The data (PARSIMONY),
Minimum of paths (PARSIMONY),
You’ve got (PARSIMONY)
Baby don't stop, seeking (PARSIMONY),
Hey, yeah, yeah, yeah, yeah, yeah, yeah,

Get up!
(brief break)

Few paths, sparse graphs, parsimony,
Above all, keep it small, parsimony,
You want to keep your model looking slim, yeah,
Don't stop now, seek out parsimony, seek parsimony!

Yeah, yeah, yeah…

If you can account (PARSIMONY),
The data (PARSIMONY),
Minimum of paths (PARSIMONY),
You’ve got (PARSIMONY)
Baby don't stop, seeking (PARSIMONY),
Hey, yeah, yeah, yeah, yeah, yeah, yeah,

[interlude -- introduce our "band," thank-you's, etc.]

You want parsimony ...mony (audience repeats)
Parsimony ...mony (audience repeats)
Parsimony ...mony...(audience repeats)

Lyrics by Alan Reifman
(May be sung to the tune of "Aquarius," Rado/Ragni/MacDermot, from Hair, also popularized by the Fifth Dimension)

You draw paths to show relationships,
You hope align with the known r’s,
Your model will guide the tracings,
From constructs near to constructs far,

You will compare this with the data’s covariance,
The data’s covariance...

Similar to correlation,
With the variables unstandardized,
Does each known covariance match up with,
The one the model tracings will derive?


You’ve Got to Check Your R-M-S-E-A
Lyrics by Alan Reifman
(May be sung to the tune of "YMCA," Belolo/Morali/Willis, popularized by the Village People)

How well, does your model match up,
To the matrix of covariances? Yup,

I said, How well, can you reproduce the,
Structure... of the... variables... you see?

You’ve got to check your R-M-S-E-A,
You’ve got to check your R-M-S-E-A,
You want your value, to be very small,
Preferably below, .05 will it fall,

You’ve got to check your R-M-S-E-A,
You’ve got to check your R-M-S-E-A,
It’s one of the, best fit indices,
You can check it, with any others you please...

Check Your NFI
Lyrics by Alan Reifman
(May be sung to the tune of "Judy’s Turn to Cry," Lewis/Ross, popularized by Lesley Gore)

You’ve got to check your NFI,
...check your TLI,
...check your CFI,
’Cause none of them alone’s a hit...

You’ve just finished running your model,
And you want to know its goodness of fit,
But there’s no one single index,
That scholars consider a perfect hit,

You’ve got to check your NFI,
...check your TLI,
...check your CFI,
’Cause none of them alone’s a hit...

The standard advice is to look at,
A variety of measures of fit,
So you pick out a set of several,
Of indices, you form your own kit,

You’ve got to check your NFI,
...check your TLI,
...check your CFI,
’Cause none of them alone’s a hit...


Stand by Your Model
Lyrics by Alan Reifman
(May be sung to the tune of "Stand by Your Man," Wynette/Sherrill)

When there’s a path,
That comes out non-significant,
What should you do?
Should you eliminate this path?

Stand by your model,
It represents your best thinking,
Stand by your model,
Don’t want one that’s shrinking,

Just because you,
Didn’t find a certain result,
Keep the model intact,
A future study may support it,

Stand by your model,
It represents your best thinking,
Stand by your model,
Don’t want one that’s shrinking,

Ready to Run
Lyrics by Alan Reifman
(May be sung to the tune of "Ready to Run," Seidel/Hummon, popularized by the Dixie Chicks)

I’ve drawn my shapes and my arrows,
I’m gonna be ready this time (ready this time),
I’ve requested a standardized solution,
I’m gonna be ready this time (ready this time),

Ready, ready, ready, ready, ready, ready to run,
Error messages, I hope to see none...,
Will my assignment get done?

Gonna view my text output,
I hope that it’s correct this time (correct this time),
Got a chi-square and acceptable values,
It looks like it’s correct this time (correct this time),

Ready, ready, ready, ready, ready, ready to run,
Looking for error messages, I see none,
Running a model is fun...

If I Had Multiple Groups
Lyrics by Alan Reifman
(May be sung to the tune of “If I Had a Hammer,” Hays/Seeger)

If I had multiple groups,
I’d run them in the morning,
I’d run them in the evening,
All over this land,

I’d use cross-group constraints,
On the loadings and the structural paths,
I’d want to see if, the chi-square was so different,
From an unconstrained hand

You Had a Bad Fit
Lyrics by Lukas Dean
(May be sung to the tune of "Bad Day," Daniel Powter)

Where is the department statistician when needed the most?
Your research and theory kick up a model that's great,
You draw it in AMOS in a way that relates,
The little wand helps make the structural paths straight,
You link it to data, now you're out of the gate,

You stand up in the lab to see how the results go,
You fake up a smile when you forgot to estimate means, oh no!

The output says your model is way off line,
Your theory's falling to pieces this time,
And I don't have no Heywood Case,

You had a bad fit,
You take some items down,
You correlate errors, just to turn it around,

You say you don't know, the numbers don't lie,
Your RMSEA was way too high,
You had a bad fit,
The numbers don't lie,
Kenny says it shouldn't be higher than .05,
You had a bad fit, you had a bad fit

An SEM Miracle
Lyrics by Alan Reifman (expanded for 2010)
(May be sung to the tune of "It’s a Miracle," Manilow/Panzer)

I ran this model hours on end,
And kept having problems with it,
There was negative variance,
Known as a Heywood Case,

Paths were unidentified,
Despite everything that I tried,
I did what, I thought would work,
The problem, I couldn't trace,

It’s a miracle (miracle),
All errors have gone away,
The model finally runs,

It was looking hazy, I was going crazy,
Till the output page came through,
It looks clear, and my fit will astound you,
So maybe, I no longer face defeat,

For the miracle (miracle),
I can start writing now,
My homework’s almost done,

I’m finally starting, now, to see,
Where I may have, really, gone astray,
I may have been missing,
A "1" for a fixed pathway,

You've got to use, the AMOS tool,
Or you're gonna look like a fool,
But now that I've done it right,
The errors just go away,

It’s a miracle (miracle),
All errors have gone away,
The model finally runs,

It was looking hazy, I was going crazy,
Till the output page came through,
It looks clear, and my fit will astound you,
And baby, I'll be dancing in the street!

Your Model’s Only One
Lyrics by Alan Reifman
(May be sung to the tune of "The Old Man Down the Road," John Fogerty)

You need a good conceptual model,
You need a nice, large sample size,
You need multiple indicators for,
Each latent construct you surmise,

Plus, you must realize,
That your model’s only one,
Of the many equal-fitting,
Models... that could be run,

Your model represents a best guess,
Causality you cannot show,
You may get some good ideas,
For an experimental way to go,

Thus, you must realize,
That your model’s only one,
You should probably look at,
The writings... of MacCallum

I Guess It Never Hurts to Winsorize
Lyrics by Kristina Keyton
(May be sung to the tune of "I Guess It Never Hurts to Hurt Sometimes," Randy VanWarmer, popularized by the Oak Ridge Boys)

Sometimes I feel the weight,
Of an outlier in my model,
It caused a Heywood case,
And it makes me want to cry,
Is there nothing we can do,
To fix this data problem,
But a memory,
Of Reifman's class saved me,

Outliers always hurt the mean,
And that's ruining my model,
But I won't give up on it,
Just because of one number,
Sometimes it makes me sad,
That we can't just say goodbye,
But I guess it never hurts to Winsorize,

We try and hold on to our moments,
But outliers can't stay,
But we can't just delete,
We lose information that way,
We can't look forward to our output,
And still hold onto bad data,
Oh I hope that you will hear me,
When I say...

Outliers always hurt the mean,
And that's ruining my model,
But I won't give up on it,
Just because of one number,
Sometimes it makes me sad,
That we can't just say goodbye,
But I guess it never hurts to Winsorize

Lyrics by Alan Reifman, dedicated to Peter Westfall (article of his)
(May be sung to the tune of "Galveston," Jimmy Webb, popularized by Glen Campbell)

Ultimately, SEM,
Your LV’s cannot be measured,
Which gives the critics some displeasure,
There’s nothing physical to grab on,
When you run SEM,

You make many an assumption,
Is it recklessness or gumption?
Assume the e’s uncorrelated...
When you run SEM,

I can see the critics’ point of view, now,
They’re saying the models aren’t unique,

That, we must willingly acknowledge,
In response to the critique, if we want to keep on using...

SEM, Oh, SEM...

Longitudinal/Panel SEM

(Updated April 8, 2015)

Our next topic is longitudinal SEM, actually a particular type of longitudinal design called a panel study, where the same respondents are followed up over time (longitudinal panel studies should not be confused with online/consumer panels). An example of a longitudinal panel study is the University of Michigan's Panel Study of Income Dynamics. Within the longitudinal panel design, we will learn about autoregressive and cross-lagged paths. Equality constraints will play a major role here.
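To make the two path types concrete, here is a rough sketch that lists them for a small panel design. It assumes the usual definitions: an autoregressive path runs from a variable to itself at the next wave, and a cross-lagged path runs from one variable to a different variable at the next wave. The variable names X and Y, the function name, and the restriction to adjacent waves are illustrative assumptions.

```python
from itertools import product


def panel_paths(variables, n_waves):
    """Return (autoregressive, cross_lagged) path lists for adjacent waves."""
    autoregressive, cross_lagged = [], []
    for wave in range(1, n_waves):
        for source, target in product(variables, repeat=2):
            path = (f"{source}{wave}", f"{target}{wave + 1}")
            if source == target:
                autoregressive.append(path)  # same variable, next wave
            else:
                cross_lagged.append(path)    # different variable, next wave
    return autoregressive, cross_lagged


ar, cl = panel_paths(["X", "Y"], n_waves=2)
print(ar)  # [('X1', 'X2'), ('Y1', 'Y2')]
print(cl)  # [('X1', 'Y2'), ('Y1', 'X2')]
```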

One of the major purposes of longitudinal panel studies is to get a good approximation of causality. Short of actual experimentation, a longitudinal panel study is probably as good a design as there is for inferring causation. A couple of lecture modules from my methods course (here and here) may be helpful, along with a 2009 article from Child Development.

The following article by Albert Farrell should also be helpful. We will go over sections of it in class.

Farrell, A.D. (1994). Structural equation modeling with longitudinal data: Strategies for examining group differences and reciprocal relationships. Journal of Consulting and Clinical Psychology, 62, 477-487.

The article actually covers both longitudinal-panel models and multiple-group models. The two are separate topics; a study can have one of these aspects and not the other. We'll also use the Farrell article to discuss multiple-group modeling, but later on.

This PowerPoint slideshow by Patrick Sturgis is also helpful.

UPDATE (April 13, 2011): I've made some new graphics to illustrate two modeling conventions associated with panel SEM.

The correlated residuals are sometimes known as the "fountain effect" for their visual appearance. The fountain at Las Vegas's Bellagio Hotel nicely illustrates this, as seen below.

UPDATE (March 18, 2010): Cameron McIntosh sent a list of references on longitudinal/panel analysis to the SEMNET listserv discussion group. The list, which I've lightly edited, may be helpful for students seeking to pursue the topic in greater detail.

Little, T.D., Preacher, K.J., Selig, J.P., & Card, N.A. (2007). New developments in latent variable panel analyses of longitudinal data. International Journal of Behavioral Development, 31, 357-365. [Copy available on Dr. Preacher's publications page; see heading "Longitudinal factorial invariance."]

Collins, L.M. (2006). Analysis of longitudinal data: The integration of theoretical model, temporal design, and statistical model. Annual Review of Psychology, 57, 505-528.

Phillips, J.A., & Greenberg, D.F. (2007). A comparison of methods for analyzing criminological panel data. Journal of Quantitative Criminology, 24, 51-72.

Preacher, K.J., Wichman, A.L., MacCallum, R.C., & Briggs, N.E. (2008). Latent growth curve modeling (part of the series Quantitative Applications in the Social Sciences, vol. 157). Thousand Oaks, CA: Sage.

Bollen, K.A., & Brand, J.E. (2008). Fixed and random effects in panel data using structural equation models. Los Angeles, CA: California Center for Population Research, UCLA (online).

Wu, A.D., Liu, Y., Gadermann, A.M., & Zumbo, B.D. (2009). Multiple-indicator multilevel growth model: A solution to multiple methodological challenges in longitudinal studies. Social Indicators Research (published online).

More advanced:

Curran, P.J., & Bollen, K.A. (2001). The best of both worlds: Combining autoregressive and latent curve models. In L.M. Collins & A.G. Sayer (Eds.), New methods for the analysis of change (pp. 105-136). Washington, DC: American Psychological Association.

Bollen, K.A., & Curran, P.J. (2004). Autoregressive latent trajectory (ALT) models: A synthesis of two traditions. Sociological Methods and Research, 32, 336-383.

Delsing, M.J.M.H., & Oud, J.H.L. (2008). Analyzing reciprocal relationships by means of the continuous-time autoregressive latent trajectory model. Statistica Neerlandica, 62, 58-82.

Oud, J.H.L. (2002). Continuous time modeling of the cross-lagged panel design. Kwantitatieve Methoden, 69, 1-26.

Hamaker, E.L. (2005). Conditions for the equivalence of the autoregressive latent trajectory model and a latent growth curve model with autoregressive disturbances. Sociological Methods and Research, 33, 404-416.

Voelkle, M.C. (2008). Reconsidering the use of autoregressive latent trajectory (ALT) models. Multivariate Behavioral Research, 43, 564-591.

Comparative Model Testing and Nested Models

As we've discussed, part of Assignment 2 requires you to engage in comparative model testing. Specifically, you will run your model both with and without directed paths from three university properties (public/private status, years of existence, and endowment [square-root transformed]) to their Undergraduate Quality (UQ).

The more parsimonious model is, of course, the one without the additional paths. To override the preference for parsimony, therefore, you will have to show that the additional paths, as a set, significantly reduce the overall model chi-square, thus improving model fit. As you move along in your careers, you may wish to adopt additional criteria, such as whether the reduction in chi-square appears substantively large in addition to being statistically significant, but for now, we'll use statistically significant change as our criterion.

You can display your results in a table, as follows:




Model                          Chi-square     df
Model w/ fewer parameters      ----           ---
Model w/ added parameters      ----           ---
Delta (change)                 ----           ---

The chi-square change score (the top chi-square minus the bottom one) can be treated like any other chi-square value and be referred to a chi-square table, with degrees of freedom equal to delta df (top df minus bottom df).

UPDATE, March 7, 2017: For the version of the universities model without the three paths listed above, the chi-square is 210.98 (31 df), whereas for the model that adds the three paths, chi-square = 169.04 (28 df).
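As a rough sketch of the arithmetic, here is the delta chi-square test applied to the two sets of values reported in the March 7, 2017 update. The function name is illustrative, and the small lookup of .05 critical values stands in for a printed chi-square table; it is not AMOS output.

```python
# Critical chi-square values at alpha = .05 for small df (standard table values).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def delta_chi_square(chi_fewer, df_fewer, chi_added, df_added):
    """Return (delta chi-square, delta df): top row minus bottom row."""
    return round(chi_fewer - chi_added, 2), df_fewer - df_added


# Model without the three paths vs. model that adds them.
delta_chi, delta_df = delta_chi_square(210.98, 31, 169.04, 28)
significant = delta_chi > CRITICAL_05[delta_df]
print(delta_chi, delta_df, significant)  # 41.94 3 True
```

Because 41.94 exceeds the critical value of 7.815 for 3 df, the added paths, as a set, significantly reduce the chi-square under this criterion.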

UPDATE, March 11, 2012: Xiaohui photographed the explanation I diagrammed on the board, linking number of paths in a model, goodness of fit, chi-square, and degrees of freedom. A key point was to demonstrate that if one model has a higher chi-square than another model, it will also have a higher number of degrees of freedom. All of the green phrases go together: a model with fewer paths (which preserves a higher df) will have a poorer fit and thus a higher chi-square. The red terms represent the opposite of the green terms, and thus the red terms go together, as well: a model with more paths (which depletes the df) will lead to a better fit and thus a lower chi-square.

UPDATE, March 5, 2008: Kristina photographed the decision-tree I drew on the board, to augment our discussion of comparative model testing. Here it is (you can click on the image to enlarge it).

And now, back to our regular programming...

An important condition for being able to conduct comparative model tests is that the two models being compared to each other must possess the property of nestedness. Two models are nested if they can be converted from one to the other either by only adding parameters to one to obtain the other, or only removing parameters from one to obtain the other. By parameters, we mean anything that is freely estimated in SEM (e.g., structural paths, non-directional correlations). If you start with one model and convert it to a new, second model by both adding and subtracting parameters from the initial model, the two models will not fulfill the criteria for nestedness and thus cannot be compared via the delta chi-square test.

The following two diagrams provide examples of nested and non-nested models.

An analogous situation exists in multiple regression. You can do a delta R-square test to see, for example, if a model with predictor set A, B, C, D, and E accounts for significantly more variance in the dependent variable than does predictor set A, B, and C. ABC is contained (that is, nested) within ABCDE, thus permitting the statistical comparison. You could not, however, test whether predictor set ABCDF accounts for more variance than set ABCDE, because the change in models would have required both dropping a predictor and adding one. If ABCDE was the starting point, we would have dropped E and added F.
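The add-only or remove-only rule can be sketched as a simple subset check, treating each model as the set of its freely estimated parameters (or, in the regression analogy, its predictors). This is a simplification for illustration; the function name is hypothetical, and the check covers only the definition of nestedness given in this posting.

```python
def is_nested(params_a, params_b):
    """True if one parameter set is entirely contained in the other."""
    a, b = set(params_a), set(params_b)
    return a <= b or b <= a


print(is_nested("ABC", "ABCDE"))    # True: only additions are needed
print(is_nested("ABCDF", "ABCDE"))  # False: drop E and add F
```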

We'll use the following article to delve more deeply into comparative model testing:

Bryant, A. L., Schulenberg, J., Bachman, J. G., O'Malley, P. M., & Johnston, L. D. (2000). Understanding the links among school misbehavior, academic achievement, and cigarette use: A national panel study of adolescents. Prevention Science, 1, 71-87.

Negative Variances (Heywood Cases)

A problem specific to SEM (and factor-analytic models more generally) is that of negative residual variances. Variances, being squares of standard deviations, cannot be negative. Negative variance estimates are known as "Heywood Cases." This webpage describes what a Heywood Case is and suggests a simple remedy. Additional discussion of Heywood Cases is available here.
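If you copy your variance estimates out of the output, a quick screen for Heywood Cases might look like the sketch below. The residual names and values are made up for illustration, and the function is a hypothetical helper, not part of AMOS.

```python
def find_heywood_cases(variance_estimates):
    """Return the names of terms whose estimated variance is negative."""
    return [name for name, value in variance_estimates.items() if value < 0]


# Hypothetical residual variance estimates.
estimates = {"e1": 0.42, "e2": -0.08, "e3": 0.31}
print(find_heywood_cases(estimates))  # ['e2']
```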

Overall Model Fit

This posting examines several measures of overall model fit in greater depth than before. David Kenny's web-based summary of fit indices (links section on the right) is an excellent starting point (G.D. Garson's SEM page also has good information). As we discussed before, indices that have the word "fit" in their names should have high values (close to 1.0, which is the maximum for most of the fit indices) to signify a well-fitting model, whereas indices that have the words "error" or "residual" in their names should be small for a well-fitting model, the closer to 0.0 the better.

The AMOS output will report results for three models: the model you designed (also known as the default or proposed model); the independence (or null) model, which says that each measured variable is correlated exactly 0.0 with each other measured variable (with no latent constructs) and thus usually produces results indicative of poor fit with the data; and finally, the saturated model, which, as we've discussed, uses the maximum available parameters and thus is guaranteed to provide a perfect fit. As shown on Kenny's page, many of the fit indices involve comparisons of your model and the independence model.
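To show how the independence model enters the picture, here is a sketch of four commonly cited indices using their usual textbook formulas. Treat these as assumptions rather than a statement of exactly what AMOS computes (programs differ, for example, on whether RMSEA uses N or N - 1). All input values are hypothetical.

```python
from math import sqrt


def nfi(chi_model, chi_null):
    """Normed Fit Index: proportional chi-square improvement over the null model."""
    return (chi_null - chi_model) / chi_null


def tli(chi_model, df_model, chi_null, df_null):
    """Tucker-Lewis Index (NNFI), based on chi-square/df ratios."""
    null_ratio = chi_null / df_null
    return (null_ratio - chi_model / df_model) / (null_ratio - 1)


def cfi(chi_model, df_model, chi_null, df_null):
    """Comparative Fit Index, based on noncentrality (chi-square minus df)."""
    d_model = max(chi_model - df_model, 0)
    d_null = max(chi_null - df_null, d_model)
    return 1.0 if d_null == 0 else 1 - d_model / d_null


def rmsea(chi_model, df_model, n):
    """Root Mean Square Error of Approximation, using N - 1 in the denominator."""
    return sqrt(max(chi_model - df_model, 0) / (df_model * (n - 1)))


# Hypothetical results: proposed model vs. independence (null) model, N = 201.
print(round(nfi(50, 500), 3))          # 0.9
print(round(tli(50, 25, 500, 36), 3))  # 0.922
print(round(cfi(50, 25, 500, 36), 3))  # 0.946
print(round(rmsea(50, 25, 201), 3))    # 0.071
```

Note that NFI, TLI, and CFI all need the independence model's chi-square, whereas RMSEA uses only your own model's chi-square, df, and sample size.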

Researchers typically will report three or four fit indices in an article or other scientific document. AMOS provides many more fit indices than that, so you'll need to form an opinion as to which ones you feel are the best.

St. Mary's University (Texas) professor Andrea Berndt performed a valuable service for the SEM community in her dissertation research at Old Dominion University. I received a copy of her study at the 1998 American Psychological Association convention and, to my knowledge, it has never been published (from the TTU library site, you can search in Dissertation Abstracts for BERNDT, ANDREA ELIZABETH, with her dissertation also available via Interlibrary Loan).

The background behind Berndt's research is that, in determining which are the optimal fit indices to use, we want ones that are not biased by study features, such as sample size. Fit indices should convey only the goodness of fit (or match) between the known, input correlation (covariance) matrix and the matrix derived from the SEM based on the tracing of paths. If a fit index provides a high or low value in good part just because of sample size or other study features, then it's probably misrepresenting the intrinsic fit of the model.

What Berndt did was search all issues of the journals Educational and Psychological Measurement, Journal of Applied Psychology, Journal of Personality and Social Psychology, and Structural Equation Modeling published from 1986 to 1996, extracting information on study design features and fit indices from all articles using SEM. Study features coded (as labeled by Berndt) were: sample size, number of indicators per latent variable, number of latent variables, number of estimated paths, and degrees of freedom. Fit indices recorded (if published in a given article or retrievable via other statistical information reported) were: Chi-Square, Comparative Fit Index, Critical Number, Goodness of Fit Index, Normed Fit Index, Non-Normed Fit Index (aka Tucker-Lewis Index), Root Mean Square Error of Approximation, and the Relative Noncentrality Index.

Berndt formed a dataset, with each line representing a study. From this set-up, she could conduct multiple-regression analyses, with each fit index serving as the dependent variable in a given analysis, and the study features serving as predictor variables. Again, because we want the fit indices to be free of any "contamination" from study features, a promising fit index will have the set of predictors produce a very small R-square (as close to 0.0 as possible). Here are the findings:

Chi-Square. R-square = .855 (degrees of freedom, Beta = .864, and sample size, Beta = .278, showed significant relations to Chi-Square size).

CFI. R-square = .084 (degrees of freedom, Beta = -.241, was significantly related to CFI).

CN. R-square = .038 (no study features significantly related to CN).

GFI. R-square = .253 (indicators per LV, Beta = -.283, and sample size, Beta = .174, significantly related to GFI).

NFI. R-square = .150 (df, Beta = -.288, and sample size, Beta = .176, significantly related to NFI).

NNFI. R-square = .027 (no study features significantly related to NNFI).

RMSEA. R-square = .038 (no study features significantly related to RMSEA).

RNI. R-square = .061 (df, Beta = -.238, significantly related to RNI).
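For a quick side-by-side view, the R-square values just listed can be sorted from least to most "contaminated" by study features. This sketch only reorders the numbers reported above; nothing new is computed.

```python
# R-square values from Berndt's regressions, as listed above.
r_square = {
    "Chi-Square": 0.855, "CFI": 0.084, "CN": 0.038, "GFI": 0.253,
    "NFI": 0.150, "NNFI": 0.027, "RMSEA": 0.038, "RNI": 0.061,
}

# Smaller R-square = less influence of study features on the index.
ranked = sorted(r_square.items(), key=lambda item: item[1])
for name, value in ranked:
    print(f"{name:<10} {value:.3f}")
```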

I think it's clear which fit indices have the smallest R-square values, and thus which ones would be recommended for you to use. If you would like to cite the Berndt paper in your writings to justify your choice of fit indices, the reference is:

Berndt, A.E. (1998, August). "Typical" model features and their effects on goodness-of-fit indices. Presented at the 106th Annual Convention of the American Psychological Association, San Francisco, CA.

UPDATE Summer 2014: Rebecca Oldham, a student in the Spring 2014 class, created the following table to summarize Berndt's findings:

UPDATE 4/27/2013: I found an article by Waterman et al. (2010) that, to my mind, provides good advice on how to present results regarding chi-square in an SEM manuscript:

The chi-square statistic is reported but is not used in interpretation, because it tests the null hypothesis of perfect fit to the data, which is implausible and almost certain to be rejected in models with large samples (p. 52).

Waterman, A. S., Schwartz, S. J., Zamboanga, B. L., Ravert, R. D., Williams, M. K., Agocha, B., Kim, S. Y., & Donnellan, M. B. (2010). The Questionnaire for Eudaimonic Well-Being: Psychometric properties, demographic comparisons, and evidence of validity. Journal of Positive Psychology, 6, 41-61.

Technical Aspects of Drawing AMOS Models

Now that we're beginning to learn how to draw models in AMOS (which, by the way, stands for Analysis of Moment Structures), I thought I'd list some of the technical aspects you'll see in the program. Most of the time, AMOS implements these technical aspects automatically, but it's important you know what is going on. [Update, Feb. 12: I've just added a diagram below to help illustrate the following principles; you can enlarge the diagram by clicking directly on it.]

1. Every manifest indicator (box) or latent construct (big circle) that has an incoming unidirectional "causal" arrow gets a residual (or error) term (small circle).

2. Every manifest indicator and latent construct (like any ordinary variable) gets a variance. If the indicator or construct has no incoming unidirectional arrow, its variance is located in the indicator or construct itself. However, if something has a residual, the variance is located only in the residual.

3. Non-directional, "curved" correlations can be inserted only between two entities that have variances. Thus, if two entities each have residual variances, it is the residual variances that get correlated, not the indicators or constructs themselves.

Measurement and Structural Models

When learning SEM, an important distinction to recognize from the start is that between a measurement model and a structural model.

A measurement model consists only of factor-loading paths from the latent constructs (factors) to their manifest indicators, non-directional correlations between constructs (like an oblique factor analysis), and in rare circumstances, correlations between some of the indicators' residual (error) terms.

When the implementation of a measurement model involves only a single questionnaire instrument and the researcher is seeking to verify an a priori conceptualization of which constructs (subscales) go with which items, then that particular kind of model is a confirmatory factor analysis (CFA).

The differences between confirmatory (CFA) and exploratory (EFA) types of factor analysis should be apparent:

1. In EFA, the number of factors is empirically determined by consulting numerical values generated by the computer (e.g., Kaiser criterion, scree test, parallel analysis), whereas in CFA, the researcher decides the number of factors, based on conceptual/theoretical grounds or precedent in the literature.

2. In EFA, determination of which items go with which factors is, again, done empirically via the factor loadings generated by the computer (which can sometimes create problems if an item loads strongly on more than one factor). In CFA, the assignment of items (manifest indicators) to constructs is, again, done on conceptual grounds. Dual-loading items are avoided in CFA, as the researcher will have each manifest indicator receive an incoming factor-loading path from only one construct (factor).

Once the researcher settles on his or her measurement model (i.e., what the constructs are, and what the manifest indicators are of each), then he or she can develop the structural model. A structural model is simply the network of directional, "causal" paths between constructs. For example, a "life stress" construct (with, perhaps, manifest indicators for work stress, home stress, and miscellaneous stress) might have a directional arrow to a "physical symptoms" construct (with indicators for head and stomach ache, fatigue, and back and joint pain).

This study of family functioning and adolescent development by Cumsille et al. provides some nice diagrams of measurement and structural models (and will also be good to return to later in the semester when we cover multiple-group modeling).

Follow-Up on Promax Rotation's Factor Structure and Factor Pattern Matrices

As we saw today, the Promax (oblique) factor-rotation technique in SPSS provides two different types of output, the Factor Structure matrix and Factor Pattern matrix. Russell (2002, Personality and Social Psychology Bulletin) provides a concise explanation of the difference (p. 1636):

The Factor Structure matrix provides the correlation between each of the measures and the factors that have been extracted and rotated; this is, of course, what we typically think of as factor loadings. However, given that the two factors [the number in Russell's example] are correlated with one another, there may be overlap in these loadings. Therefore the... Factor Pattern matrix, is designed to indicate the independent relationship between each measure and the factors. One can think of the values reported here as being equivalent to standardized regression coefficients, where the two factors are used as predictors of each measure.
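The link between the two matrices can be sketched numerically. Under the usual oblique factor model, the structure matrix equals the pattern matrix multiplied by the matrix of correlations among the factors. The loadings and the factor correlation of .40 below are invented for illustration (they are not from Russell's article or from SPSS output), and the helper name matmul is just a convenience for this sketch.

```python
# Hypothetical pattern matrix: 4 measures (rows) by 2 factors (columns).
# These values are made up for illustration only.
pattern = [
    [0.70, 0.05],
    [0.65, 0.10],
    [0.05, 0.75],
    [0.10, 0.60],
]

# Hypothetical factor correlation matrix: the two factors correlate .40.
phi = [
    [1.00, 0.40],
    [0.40, 1.00],
]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Structure = Pattern x Phi: each structure value mixes a measure's own
# pattern coefficient with the overlap carried by the factor correlation.
structure = matmul(pattern, phi)

for pattern_row, structure_row in zip(pattern, structure):
    print(pattern_row, [round(value, 3) for value in structure_row])
```

In this made-up example the first measure has a pattern coefficient of only .05 on the second factor, but a structure value of about .33, because the second factor is correlated with the first. If the factors were uncorrelated (phi equal to an identity matrix), the two matrices would be identical.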

Latent and Manifest Variables in SEM

(Updated May 2, 2016)

To begin our coverage of SEM, we'll discuss the conceptual basis of latent variables and their manifest (measured) indicators. The diagram below, which I developed several years ago, provides an "everyday" illustration.

A key idea that we'll discuss throughout the course is that latent variables are error-free.

There are some additional analogies that can be drawn upon:

In Freudian psychology, a distinction is made between the latent and manifest content of dreams.

In biology, a distinction is made between genotype and phenotype.

The book Does Measurement Measure Up?, by John Henshaw, provides a concise summary of how observable, measurable manifestations are used to infer underlying, unobservable propensities: "Aristotle and others have contrasted the observed behavior of an individual with the underlying capacity on which that behavior depended. Intelligence, as one of those underlying capacities, is an ability that may or may not always be observed in everyday life. This underlying capacity must be deduced from observed behaviors" (p. 92).

Procedures of Exploratory Factor Analysis

Let's work our way up the right-hand side of the "SEM Pyramid of Success," examining how the Pearson correlation gives rise to exploratory factor analysis (EFA).

Starting out with a fairly large set of variables (usually single items), EFA will arrange the variables into subsets, where the variables within each subset are strongly correlated with each other. These subsets are organized along axes (the plural of "axis," not of "axe," as in a hatchet).

You could have a one-factor (one-dimensional) solution, in which case all the variables will be capable of being located along a single line (e.g., across from left to right, with low scores to the left and high scores to the right). Or there could be a two-factor (two-dimensional) solution, where the axes are across and up-and-down. Three-factor (three-dimensional) solutions are harder to describe verbally, so let's look at a picture. These examples hold only as long as the axes are orthogonal (at 90-degree angles) to each other (which denotes completely uncorrelated factors), an issue to which we'll return. Solutions can also exceed three factors, but we cannot visualize four spatial dimensions (at least I can't).

In conducting factor analyses with a program such as SPSS, there are three main steps, at each of which a decision has to be made:

(1) One must first decide what extraction method to use (i.e., how to "pull out" the dimensions). The two best-known approaches are Principal Axis Factoring (PAF; also known as common factor analysis) and Principal Components Analysis (PCA). There's only one difference, computationally, between PAF and PCA, as described in this document, yet some authors portray the two techniques as being very different (further, PCA is technically not a form of factor analysis, but many researchers treat it as such).

(2) Second, one must decide how many factors to retain. There is no absolute, definitive answer to this question. There are various tests, including the Kaiser Criterion (how many factors or components have eigenvalues greater than or equal to 1.00) and Scree Test (an "elbow curve," where one looks for drop-off in the sizes of the eigenvalues).

The book Does Measurement Measure Up?, by John Henshaw, addresses the indeterminacy of factor analysis in the context of intelligence testing as follows: "Statistical analyses of intelligence test data... have been performed for a long time. Given the same set of data, one can make a convincing, statistically sound argument for a single, overriding intelligence (sometimes called the g factor) or an equally sound argument for multiple intelligences. In Frames of Mind, Howard Gardner argues that 'when it comes to the interpretation of intelligence testing, we are faced with an issue of taste or preference rather than one on which scientific closure is likely to be reached' " (p. 95).

(3) The axes from the original solution will not necessarily come close to sets of data points (loosely speaking, it's like the best-fitting line in a correlational plot). The axes can be rotated to put them into better alignment with the data points. The third decision, therefore, involves the choice of rotation method. Two classes of rotation methods are orthogonal (as described above) and oblique (in which the axes are free to intersect at other than 90-degree angles, which allows the factors to be correlated with each other). Mathworks has a web document on factor rotation, including a nice color-coded depiction of orthogonal and oblique rotation. (As of January 2015, the graphics do not show up in the Mathworks document; however, I had previously saved a copy of the factor-rotation diagram, which I reproduce below.)

 From Mathworks, Factor Analysis


The particular combination of Principal Components Analysis for extraction, the Kaiser Criterion to determine the number of factors, and orthogonal rotation (specifically one called Varimax) is known as the "Little Jiffy" routine, presumably because it works quickly. I've always been a Little Jiffy guy myself (and have written a song about it, below), but in recent years, Little Jiffy has been criticized, both collectively and in terms of its individual steps.
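For readers who like to see the arithmetic, here is a minimal sketch of the first two Little Jiffy steps (principal-components extraction and the Kaiser Criterion), assuming NumPy is available. The six-item correlation matrix is invented for illustration, the variable names are my own, and the Varimax rotation step is left out to keep the sketch short. It is not a reproduction of what SPSS does internally.

```python
import numpy as np

# Hypothetical correlation matrix for six items (made-up values):
# items 1-3 correlate .60 with each other, items 4-6 correlate .60 with
# each other, and items from different sets correlate .10.
within, between = 0.60, 0.10
R = np.full((6, 6), between)
R[:3, :3] = within
R[3:, 3:] = within
np.fill_diagonal(R, 1.0)

# Step 1 (extraction): principal components are the eigenvectors of R.
# eigh is used because a correlation matrix is symmetric.
eigenvalues, eigenvectors = np.linalg.eigh(R)

# eigh returns eigenvalues in ascending order; put the largest first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Step 2 (number of factors): Kaiser Criterion, as stated in the text
# (eigenvalues greater than or equal to 1.00).
n_retained = int(np.sum(eigenvalues >= 1.0))

# Unrotated component loadings: eigenvector times sqrt(eigenvalue).
loadings = eigenvectors[:, :n_retained] * np.sqrt(eigenvalues[:n_retained])

print("Eigenvalues:", np.round(eigenvalues, 3))
print("Components retained:", n_retained)
print("Unrotated loadings:")
print(np.round(loadings, 3))
```

Plotting the sorted eigenvalues against their rank would give the scree plot, so the same few lines also serve as a starting point for the Scree Test.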

An article by K.J. Preacher and R.C. MacCallum (2003) entitled "Repairing Tom Swift’s Electric Factor Analysis Machine" (explanation of "Tom Swift" reference) gives the following pieces of advice (shown in italics, with my comments inserted in between):

Three recommendations are made regarding the use of exploratory techniques like EFA and PCA. First, it is strongly recommended that PCA be avoided unless the researcher is specifically interested in data reduction... If the researcher wishes to identify factors that account for correlations among [measured variables], it is generally more appropriate to use EFA than PCA...

Another article we'll discuss (Russell, 2002, Personality and Social Psychology Bulletin) concurs that PAF is preferable to PCA, although it acknowledges that the solutions produced by the two extraction techniques are sometimes very similar. Also, data reduction (i.e., wanting to present results in terms of, say, three factor-based subscales instead of 30 original items) seems to be a respectable goal, for which PCA appears appropriate.

Second, it is recommended that a combination of criteria be used to determine the appropriate number of factors to retain... Use of the Kaiser criterion as the sole decision rule should be avoided altogether, although this criterion may be used as one piece of information in conjunction with other means of determining the number of factors to retain.

I concur, and Russell's recommendation seems consistent with this advice.

Third, it is recommended that the mechanical use of orthogonal varimax rotation be avoided... The use of orthogonal rotation methods, in general, is rarely defensible because factors are rarely if ever uncorrelated in empirical studies. Rather, researchers should use oblique rotation methods.

As we'll see, Russell has some interesting suggestions in this area.

One final area, discussed in the Russell article, concerns how to create subscales or indices based on your factor analysis. Knowing that certain items align well with a particular factor (i.e., having high factor loadings), we can either multiply each item by its factor loading before summing the items (hypothetically, e.g., [.35 × Item 1] + [.42 × Item 2] + [.50 × Item 3] + ...) or just add the items up with equal (or "unit") weighting (Item 1 + Item 2 + Item 3). Russell recommends the latter. It should be noted that, if one obtains a varimax-rotated solution, the newly created subscales will only have zero correlation (independence or orthogonality) with each other if the items are weighted by exact factor scores in creating the subscales.
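The two scoring approaches can be compared side by side in a short sketch. The loadings are the hypothetical ones from the paragraph above; the three respondents and their item responses are invented, and the function names are my own.

```python
# Hypothetical loadings for three items (the illustrative values used above).
loadings = [0.35, 0.42, 0.50]

# Hypothetical responses from three people to Items 1-3 (made-up data).
responses = [
    [4, 5, 3],
    [2, 1, 2],
    [5, 4, 4],
]


def loading_weighted_score(items, weights):
    """Multiply each item by its loading, then sum."""
    return sum(weight * item for weight, item in zip(weights, items))


def unit_weighted_score(items):
    """Simply add the items, giving each a weight of 1."""
    return sum(items)


for person in responses:
    print(
        "unit-weighted:", unit_weighted_score(person),
        "| loading-weighted:", round(loading_weighted_score(person, loadings), 2),
    )
```

With items that load similarly on the factor, the two sets of scores tend to rank people in nearly the same order, which is part of the appeal of simple unit weighting.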

I have created a diagram to explicate factor scoring in greater detail. Here's another perspective on the issue of factor scores (see the heading "Factor Scores/Scale Scores" when the new page opens).

Here's the Little Jiffy song.

Little Jiffy
Lyrics by Alan Reifman
(May be sung to the tune of “Desperado,” Frey/Henley)

“Little Jiffy,” you know your status is iffy,
Some top statisticians, think that you’re no good,
You are so simple, the users just take the defaults,
And thus halts the process, of finding structure,

You are a three-stage, procedure,
For making, your data concise,
And upon some simple guidelines, you do rest,

But for each step, in the routine,
Little Jiffy’s not so precise,
And the experts say, other choices are best…

For extraction, you use Principal Components,
While all your opponents, advocate P-A-F,
You use Kaiser’s test, to tell the number of factors,
While all your detractors, support the Scree test,

On your behalf, some researchers claim,
Components and factors, yield almost the same,
But computers give, several more options today,
With “Little Jiffy,” you don’t have to stay,
You can experiment, with different ways…

Varimax is used, to implement your rotation,
There’s no correlation, among your axes,
If one goes oblique, like critics urge that you ought to,
The items you brought, ooh (yes, the items that you brought, ooh),
The items you brought, ooh... (Pause)
Will fall close to the lines…


Finally, here are some references for students wishing to pursue EFA in greater detail.

Conway, J.M., & Huffcutt, A.I. (2003). A review and evaluation of exploratory factor analysis practices in organizational research. Organizational Research Methods, 6, 147-168.

Henson, R.K., & Roberts, J.K. (2006). Use of exploratory factor analysis in published research: Common errors and some comment on improved practice. Educational and Psychological Measurement, 66, 393-416.

Path Analysis: Tracing Rules and Re-Deriving Correlation Coefficients

Below are two photos from the recent lectures on path analysis (thanks again to Sothy). The first photo is more conceptual, on how to identify the relevant sequences for multiplying path coefficients.

One of our 2015 students (BL) came up with a great analogy to understand correlated cause (lower-right corner of photo). W and Y can be thought of as the two parents in a family, and X and Z as the two children. If parent W influences child X, and parent Y influences child Z, then X and Z will receive some of the same influence because the two parents' childrearing practices are likely correlated (path q).

The second photo shows an actual example.

Because the model was saturated (every possible linkage that could have been included, was included), the correlation between Age and Income implied by the tracings in the model is identical (within rounding) to the known, input correlation between Age and Income.
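The same re-derivation can be checked with a few lines of code. This sketch does not use the numbers from the photo. It assumes a generic saturated model in which two correlated predictors (X1 and X2) each have a direct path to an outcome Y, with invented input correlations. The tracing rule says the implied correlation between X1 and Y is the direct path plus the route through the correlated cause (the X1-X2 correlation times X2's path).

```python
# Hypothetical input correlations (made up for illustration).
r12 = 0.30  # correlation between predictors X1 and X2
r1y = 0.50  # correlation between X1 and outcome Y
r2y = 0.40  # correlation between X2 and outcome Y

# Standardized path coefficients for Y regressed on X1 and X2
# (the standard two-predictor formulas).
p1 = (r1y - r12 * r2y) / (1 - r12 ** 2)
p2 = (r2y - r12 * r1y) / (1 - r12 ** 2)

# Tracing rules: direct path, plus the indirect route through the
# curved (correlational) arrow between the two predictors.
implied_r1y = p1 + r12 * p2
implied_r2y = p2 + r12 * p1

print("Path coefficients:", round(p1, 4), round(p2, 4))
print("Implied r(X1, Y):", round(implied_r1y, 4), "vs. input", r1y)
print("Implied r(X2, Y):", round(implied_r2y, 4), "vs. input", r2y)
```

Because every possible linkage is included, the implied correlations match the input correlations. In a model that omits some paths, the implied and input values can differ, and that discrepancy is what fit indices summarize.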

MARCH 2013 UPDATE: This manuscript provides an overview of tracing rules (see Section 2.4).

MAY 2009 UPDATE: A newly released Australian study indeed shows a positive relationship between height and earnings.

Least Squares Principle

(Updated March 1, 2016)

Here is a photo from a previous class session, showing what I wrote on the board regarding the least-squares principle. In the illustration below, we're using SAT scores (x-axis) to predict students' first-year college grades (y-axis). Wikipedia's entries on least-squares and residuals may also be helpful.
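As a small numerical companion to the board photo, here is a sketch of the least-squares idea in code. The five SAT/GPA pairs are invented, and the function names are my own. The point is only that the fitted line has a smaller sum of squared residuals than nearby alternative lines.

```python
# Hypothetical data: SAT scores and first-year GPAs for five students.
sat = [1000, 1100, 1200, 1300, 1400]
gpa = [2.6, 2.9, 3.0, 3.4, 3.6]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared vertical distances between the data points and the line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


intercept, slope = fit_line(sat, gpa)
best = sum_squared_residuals(sat, gpa, intercept, slope)

# Any other line, such as a slightly steeper or slightly higher one,
# yields a larger sum of squared residuals.
steeper = sum_squared_residuals(sat, gpa, intercept, slope * 1.05)
higher = sum_squared_residuals(sat, gpa, intercept + 0.1, slope)

print("Intercept:", round(intercept, 4), "Slope:", round(slope, 5))
print("Sum of squared residuals (fitted, steeper, higher):",
      round(best, 4), round(steeper, 4), round(higher, 4))
```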

I continue to find websites that offer useful explanations of these concepts:

Multiple Regression, Standardized/Unstandardized Coefficients

Today, we’ll go over the left side of the SEM Pyramid of Success, from the correlation to multiple regression to path analysis, up to the brink of SEM. An important distinction applicable to all of these techniques is between standardized and unstandardized relationships.

The distinction is probably best illustrated, at this point, with multiple regression. Just to remind everyone, in multiple regression we test how well a number of predictor (independent) variables relate to an outcome (dependent) variable. For example, we could use (a) educational attainment, (b) experience on the job, and (c) performance evaluation as predictors of past-year earnings (outcome). The relationship between each predictor and earnings is computed holding constant the effect of the other predictors (e.g., assuming all respondents were equal in their educational attainment and experience on the job, are higher performance evaluations associated with higher earnings?).

[ADDED December 28, 2007: The following PowerPoint slide show provides an extensive review of multiple regression. I noticed an apparent error on the slides entitled "The Overall Test..." (numbered from the high teens to 20), so for discussion of null hypotheses, you should focus on the slide titled "Test for Individual Terms" (numbered in the 40s).]

For each predictor variable in a multiple-regression analysis, the output will provide an unstandardized regression coefficient (usually depicted with the letter B) and a standardized coefficient (usually depicted with the Greek letter Beta, β). Unstandardized results are probably more straightforward to understand, so let’s discuss them first.

Unstandardized relationships are expressed in terms of the variables' original, raw units. Educational attainment would probably be measured in years of education, whereas earnings would probably be measured in dollars. Thus, the unstandardized (B) coefficient for educational attainment could be something like 2000. This would tell us that, for each increment of one raw unit (year) of education, projected earnings would increase by 2000 raw units of income (dollars).

Standardized results represent what happens after all of the variables (predictors and outcome) have initially been converted into z-scores (formula). As you'll recall from your earlier stat classes, z scores convey information in standard-deviation (SD) units; for example, someone who has a z score of +1 on a variable is one SD above the sample mean on that variable (to review SD's, see here and here). If we were measuring respondents' number of miles run per week in an athlete sample, the mean might be, say, 50 miles/week, with an SD of 10. Therefore, an athlete who ran 60 miles/week in training would be at z = +1, or 1 SD above the mean.

Another nice feature of z scores is that, if the data are distributed normally, you can relate them to a person's percentile ranking in the distribution. For example, someone with a z score of +1 on a given variable (84th percentile) is 34 percentile points ahead of someone who has a z score of 0 (50th percentile).
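The running example and the percentile conversion can be checked in a few lines. The mean of 50 and SD of 10 come from the running example above; the function name is my own, and the percentile step assumes a normal distribution, as noted in the text.

```python
from statistics import NormalDist


def z_score(value, mean, sd):
    """Express a raw value as its distance from the mean in SD units."""
    return (value - mean) / sd


# Running example from the text: mean of 50 miles/week, SD of 10.
z = z_score(60, mean=50, sd=10)

# Under a normal distribution, the cumulative probability at z gives
# the percentile ranking.
percentile = NormalDist().cdf(z) * 100
percentile_at_mean = NormalDist().cdf(0) * 100

print("z =", z)
print("Percentile at z = +1:", round(percentile, 1))
print("Percentile at z = 0:", round(percentile_at_mean, 1))
```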

Going back to our example of predicting people's earnings, years of experience may have a standardized regression coefficient (β) of .40. This finding would tell us that, for each increment of one SD in years of experience, projected earnings would increase by .40 SDs of income.

To recap to this point:

Unstandardized relationships say that for a one-raw-unit increment on a predictor, the outcome variable increases (or, if B is negative, decreases) by B of its own raw units.

Standardized relationships say that for a one-standard-deviation increment on a predictor, the outcome variable increases (or, if β is negative, decreases) by β SDs.
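The two kinds of coefficients are tied together by the variables' standard deviations: β equals B multiplied by the predictor's SD and divided by the outcome's SD. Here is a minimal sketch using the B of 2000 from the education example. The two SD values are invented for illustration, and the function names are my own.

```python
def standardize_coefficient(b, sd_predictor, sd_outcome):
    """Convert an unstandardized B into a standardized beta."""
    return b * sd_predictor / sd_outcome


def unstandardize_coefficient(beta, sd_predictor, sd_outcome):
    """Convert a standardized beta back into an unstandardized B."""
    return beta * sd_outcome / sd_predictor


b_education = 2000.0    # dollars of earnings per year of education (from the text)
sd_education = 2.5      # hypothetical SD of years of education
sd_earnings = 20000.0   # hypothetical SD of earnings, in dollars

beta_education = standardize_coefficient(b_education, sd_education, sd_earnings)
print("Standardized beta:", beta_education)

# Converting back recovers the original B.
print("Recovered B:", unstandardize_coefficient(beta_education, sd_education, sd_earnings))
```

With these made-up SDs, a one-SD difference in education (2.5 years) corresponds to 5,000 dollars, which is one quarter of an SD of earnings.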

When should you use the unstandardized solution and when should you use the standardized one? My own view is as follows: If the raw units are generally familiar (e.g., years, dollars, inches, miles, pounds), I'd go with the unstandardized solution. However, if the variables' raw units are not well-known in everyday usage (e.g., on a marital-satisfaction inventory with a maximum score of 50, what does one point really convey?), then I'd use the standardized solution.

This framework for unstandardized and standardized solutions applies not only to multiple regression, but also to path analysis and SEM. What is not widely known is that the Pearson r, itself, is a statistic based on standardized variables. The correlation has an unstandardized "cousin," the covariance. The formula for converting between correlations and covariances, which is pretty simple, is shown in this document.
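For those who want to see the conversion worked out, the correlation is the covariance divided by the product of the two variables' standard deviations (and the covariance is the correlation multiplied by that product). The sketch below uses five invented data points and sample (n - 1) formulas; the function names are my own.

```python
from math import sqrt

# Hypothetical paired observations (made-up data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def mean(values):
    return sum(values) / len(values)


def sample_covariance(a, b):
    """Average cross-product of deviations, using n - 1 in the denominator."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def sample_sd(values):
    """Sample standard deviation (the square root of a variable's covariance with itself)."""
    return sqrt(sample_covariance(values, values))


cov_xy = sample_covariance(x, y)
sd_x, sd_y = sample_sd(x), sample_sd(y)

# Covariance -> correlation: divide by the product of the SDs.
r_xy = cov_xy / (sd_x * sd_y)

# Correlation -> covariance: multiply by the product of the SDs.
cov_recovered = r_xy * sd_x * sd_y

print("Covariance:", cov_xy)
print("Correlation:", round(r_xy, 4))
print("Covariance recovered from r:", round(cov_recovered, 4))
```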

Update (1/19/07): Discussion during our previous class brought out an additional point that I didn't mention in my above write-up (thanks to Kristina).

Within the same regression equation, the different predictor variables' unstandardized B coefficients are not directly comparable to each other, because the raw units for each are (usually) different. In other words, the largest B coefficient will not necessarily be the most significant, as it must be judged in connection with its standard error (B/SE = t, which is used to test for statistical significance).

On the other hand, with standardized analyses, all variables have been converted to a common metric, namely standard-deviation (z-score) units, so the β coefficients can meaningfully be compared in magnitude. In this case, whichever predictor variable has the largest β (in absolute value) can be said to have the most potent relationship to the dependent variable. This predictor will often, though not necessarily, also have the greatest significance (smallest p value), because each coefficient's standard error also depends on how strongly that predictor is correlated with the other predictors.

Added 4/12/15: Phil Ender has a concise overview of key issues in multiple regression.

SEM Pyramid of Success

(Updated January 21, 2017)

What we'll be covering in the first few class sessions is how SEM represents a culmination of earlier statistical techniques, building from the very basic Pearson's correlation coefficient (r) on up through more elaborate techniques, finally ending at SEM.

John Wooden, who coached men's basketball at UCLA from 1948-1975, winning 10 NCAA championships and garnering accolades for his broader teachings, developed a "Pyramid of Success," which is a guide not only for athletics, but for living a good all-around life.

I grew up a huge UCLA sports fan and went there for undergraduate college (1980-1984). Inspired as I was by Coach Wooden (who died in 2010, just a few months short of his 100th birthday), I created what I call the "Structural Equation Modeling Pyramid of Success," which is shown below.

As we'll see later in the course, as complex as some of our structural equation models can get, the results can always be traced back to simple Pearson correlations.