SEM The Musical 2

On Wednesday, April 23, we will present SEM The Musical 2. [Update: It's now been presented.] I've written some new songs, as has one of our students (shown below), plus we'll perform some "classics" from last year's musical.

It Do Run Run
Lyrics by Alan Reifman
(May be sung to the tune of “Da Doo Run Run,” Spector/Greenwich/Barry)

Got to draw a model that is error-free,
So it will run, run, so it will run,
Got to have constraints where they’re supposed to be,
So it will run, run, so it will run,

Oh, I got a Heywood Case,
Variance, I must replace,
Everything, is back on pace,
It do run, run, run; it do run, run,

Making sure my model is identified,
So it will run, run, so it will run,
Making sure conditions are all satisfied,
So it will run, run, so it will run,

Yeah, it runs so well,
The fit indices are swell,
No problems on which to dwell,
It do run, run, run; it do run, run...

And now, three songs about enumerating your degrees of freedom and the related issue of model identification.

Count ’Em Up
Lyrics by Alan Reifman
(May be sung to the tune of “Build Me Up Buttercup,” d'Abo/Macaulay, for The Foundations)

You’ve got to count ’em up (count ’em up), degrees of freedom,
So you’ll understand (understand), the model at hand,
And when you compare (you compare), two nested models,
When you add some paths (add some paths), to your arrow graphs,
You’ll know how (you’ll know how), to conduct the delta test,
And decide which model you’ll seize,
So count ’em up (count ’em up), all of your freedom’s degrees,

The measures in your trove, their variances and cov’s,
Are in a half-matrix, they’re your known elements,
From these you subtract, parameters you enact,
The model that you state, and freely estimate,

(Hey, hey, hey!) The df’s, are the difference,
(Hey, hey, hey!) Start out with known elements,
(Hey, hey, hey!) Then deduct,
The free parameters, and now it all makes sense,

You’ve got to count ’em up (count ’em up), degrees of freedom,
So you’ll understand (understand), the model at hand,
And when you compare (you compare), two nested models,
When you add some paths (add some paths), to your arrow graphs,
You’ll know how (you’ll know how), to conduct the delta test,
And decide which model you’ll seize,
So count ’em up (count ’em up), all of your freedom’s degrees…

D-of-F in SEM
Lyrics by Shera Jackson
(May be sung to the tune of "Flowers on the Wall," Lew DeWitt for the Statler Brothers)

I keep hearing about counting degrees of freedom for SEM,
Trying to keep it all straight is hard to do,
If I were a statistician, I wouldn’t worry none,
As I’m adding this up, I’m starting to have fun,

Counting degrees of freedom for SEM,
That don’t bother me at all,
Counting up the elements,
And now the parameters,
I’m adding all the knowns and subtracting the unknowns,
Now, don’t tell me I’ve nothing to do,

Last night I made a matrix, found the diagonal,
That’s my variances, and underneath are my co-v’s,
Please, don’t forget the means when using “means and intercepts,”
Square the elements, subtract them, divide by 2, and add them back,

Counting degrees of freedom for SEM,
That don’t bother me at all,
Counting up the elements,
And now the parameters,
I’m adding all the knowns and subtracting the unknowns,
Now, don’t tell me I’ve nothing to do,

Well, let’s count the unknowns, so many, free factor loadings,
Structural Paths, non-directional correlations,
Indicator residual variances, and
Construct residual variances, and construct variances,

Counting degrees of freedom for SEM,
That don’t bother me at all,
Counting up the elements,
And now the parameters,
I’m adding all the knowns and subtracting the unknowns,
Now, don’t tell me I’ve nothing to do,

Now, counting degrees of freedom for SEM,
That don’t bother me at all,
Counting up the elements,
And now the parameters,
I’m adding all the knowns and subtracting the unknowns,
Now, don’t tell me I’ve nothing to do,

Don’t tell me I’ve nothing to do...

Lyrics by Alan Reifman
(May be sung to the tune of “Overjoyed,” Stevie Wonder)

To get your, structural diagram, to run fine,
Elements, and your parameters, must align, oh,
If you ask too much, your model will crash,
Plan it carefully, don’t let your choices be rash,

You cannot, have unknowns that number, more than knowns,
Negative, your degrees of freedom, cannot go, yeah,
You can use constraints, so free paths reduce,
Without more measures, that’s all you can do,

Under-iden-tified will not run,
What have you done?
You’ve posited,
More than known in-for-ma-tion,
To make sure that all is satisfied,
It has to be,
Overall, over-iii-dentified…

If you draw, all the curves and arrows, that you can,
You will have, mandated perfect fit, on your hands, oh,
If you saturate, you can’t judge the fit,
It’s always perfect, that’s automatic,

Just-i-den-ti-fied fit, will be one,
What have you done?
You’ve drawn all paths,
There could be under the sun,
To make sure that all is satisfied,
It has to be,
Overall, over-iii-dentified…

Chi-Square Rising
Lyrics by Alan Reifman
(May be sung to the tune of “Bad Moon Rising,” John Fogerty)

I see a chi-square rising,
I see the fit going astray,
You need to add more parameters,
What could be another pathway?

Parsimony’s nice, bad fit could be a price,
There’s a chi-square on the rise,

I see a chi-square rising,
I see there’s more that can be done,
Relations, you need to account for,
Then you can let the model run,

Parsimony’s nice, bad fit could be a price,
There’s a chi-square on the rise…

Curvy, Swervy, Dual-Connected, Correlation Bi-Directed
Lyrics by Alan Reifman
(May be sung to the tune of “Itsy-Bitsy, Teeny-Weeny, Yellow Polka-Dot Bikini,” Vance/Pockriss)

They were unsure whether A tends to precede B,
Or whether B occurs prior to A,
What could they do, to depict this in their model?
What symbolic notation could they portray?

It's not too late,
A and B could correlate,

They drew a curvy, swervy, dual-connected, correlation bi-directed,
That goes right in between A and B,
A curvy, swervy, dual-connected, correlation bi-directed,
So no one had to state causality...

Three Wave
Lyrics by Alan Reifman
(May be sung to the tune of “Heat Wave,” Holland/Dozier/Holland, popularized by Martha Reeves and the Vandellas)

You’re doing a survey,
Of how people change,
A trio of interviews,
With each person, you’ll arrange,

Autoregressive, and cross-lagged paths,
Equality constraints on the math,

You’ve got a three-wave,
Panel study design (three-wave!),
Not quite causation (three-wave!),
But, precedence of time…

Example of a three-wave panel model.

Equality Constraints

This week, we'll learn about equality constraints. An equality constraint tells the SEM computer program that, in reaching its solution, it must provide the identical unstandardized coefficient for all parameters within a set that has been designated for equality (even when equality constraints have been imposed, standardized coefficients may not be exactly the same within the constrained set).
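
Here's a minimal sketch of why standardized coefficients can differ even when the unstandardized ones are constrained to be equal. It assumes the usual conversion (unstandardized coefficient times the predictor's standard deviation, divided by the outcome's standard deviation); the function name and all of the numbers are hypothetical, chosen only for illustration.

```python
def standardized_coefficient(b, sd_predictor, sd_outcome):
    """Convert an unstandardized path coefficient to a standardized one."""
    return b * sd_predictor / sd_outcome

# Hypothetical values: both paths share one constrained unstandardized value,
# but the two predictors have different standard deviations.
b_constrained = 0.35
sd_outcome = 2.0
sd_predictor_a = 1.5
sd_predictor_b = 3.0

beta_a = standardized_coefficient(b_constrained, sd_predictor_a, sd_outcome)
beta_b = standardized_coefficient(b_constrained, sd_predictor_b, sd_outcome)

print(f"unstandardized (both paths): {b_constrained:.2f}")
print(f"standardized path A: {beta_a:.4f}")
print(f"standardized path B: {beta_b:.4f}")
```

Because the two predictors differ in spread, the same unstandardized value translates into two different standardized values.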

Implementation of equality constraints in Mplus is illustrated at this site (see section 3.0).*

Suppose that equality constraints are placed on the structural paths from two predictor constructs to an outcome construct. In the absence of constraints, one predictor might, for example, take on an unstandardized coefficient of .40 and the other predictor might take on a value of .30. Because of the constraints, however, the values must be identical, so both paths might take on a value of .35 (I don't know that the solution must always "split the difference," but it's probably a reasonable way to think about it).

Constraining the two (or more) values to equality, of necessity, harms model fit; the mathematically optimal MLE paths in the above example would have been .40 and .30. Giving the two paths the identical .35 is thus suboptimal mathematically, but it provides greater parsimony because we can say that a single value (.35) works reasonably well for both paths.
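
To make the trade-off concrete, here's a rough sketch that uses ordinary least squares on simulated observed variables as a stand-in for SEM's maximum likelihood estimation (so it is an analogy, not what AMOS or Mplus actually does). The data, the seed, and every variable name are hypothetical. The constrained fit forces one shared slope on both predictors, and its sum of squared errors can never be smaller than that of the unconstrained fit.

```python
import random

random.seed(1)  # hypothetical simulated data, for illustration only
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [0.40 * a + 0.30 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sse(b1, b2):
    """Sum of squared errors for slopes b1 and b2 (no intercept, for simplicity)."""
    return sum((yi - b1 * a - b2 * b) ** 2 for yi, a, b in zip(y, x1, x2))

# Unconstrained fit: solve the 2x2 normal equations for two separate slopes.
s11, s12, s22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
s1y, s2y = dot(x1, y), dot(x2, y)
det = s11 * s22 - s12 ** 2
b1_free = (s22 * s1y - s12 * s2y) / det
b2_free = (s11 * s2y - s12 * s1y) / det

# Constrained fit: a single shared slope, i.e., regress y on (x1 + x2).
x_sum = [a + b for a, b in zip(x1, x2)]
b_equal = dot(x_sum, y) / dot(x_sum, x_sum)

print(f"free slopes: {b1_free:.3f}, {b2_free:.3f}")
print(f"shared slope: {b_equal:.3f}")
print(f"SSE free: {sse(b1_free, b2_free):.3f}")
print(f"SSE constrained: {sse(b_equal, b_equal):.3f}")
```

The constrained model uses one fewer parameter, which is the parsimony gain; the price is a fit that is at best equal to, and typically a bit worse than, the unconstrained one.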

Equality constraints involve comparative model-testing and delta-chi square tests, as we've seen before. In this new context, what we do is run the model twice, once without constraints and once with (this is considered "nested"). If the delta-chi square test (with delta df) is significant, we say the constraints significantly harm model fit and we ditch them. If the rise in chi-square due to the constraints is not significant, we then retain the constraints in the name of parsimony.
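
The decision rule above can be sketched in a few lines. The chi-square and df values below are hypothetical, and the function names are mine; the p-value comes from a small standard-library routine for the chi-square upper-tail probability (integer df only), not from any SEM program.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting points for df = 1 and df = 2 ...
    if df % 2 == 1:
        prob, nu = math.erfc(math.sqrt(half)), 1
    else:
        prob, nu = math.exp(-half), 2
    # ... then step up two df at a time using the incomplete-gamma recurrence.
    while nu < df:
        prob += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return prob

def delta_chi_square_test(chi2_constrained, df_constrained,
                          chi2_free, df_free, alpha=0.05):
    """Compare a constrained model with the freer model it is nested in."""
    delta_chi2 = chi2_constrained - chi2_free
    delta_df = df_constrained - df_free
    if delta_df <= 0:
        raise ValueError("the constrained model must have more df")
    p_value = chi2_sf(delta_chi2, delta_df)
    return delta_chi2, delta_df, p_value, p_value < alpha

# Hypothetical output: one equality constraint adds 1 df.
delta, ddf, p, significant = delta_chi_square_test(47.9, 21, 45.2, 20)
decision = "drop the constraints" if significant else "retain the constraints"
print(f"delta chi-square = {delta:.2f}, delta df = {ddf}, p = {p:.3f}: {decision}")
```

With these made-up numbers the rise in chi-square is not significant, so the constraints would be retained in the name of parsimony.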

Equality constraints have at least four purposes, as far as I can tell:

1. Theory/hypothesis testing. The following example, from one of my older articles, involves trying to test if three adolescent suicidal behaviors -- thoughts, communication, and attempts -- are three gradations on the same underlying dimension or are more qualitatively different. We reasoned that, if the behaviors were gradations along the same dimension, then each psychosocial predictor should relate equivalently to the three suicidal behaviors. If, on the other hand, the suicidal behaviors were qualitatively different, then a given predictor might relate significantly to one of them, but not all three.

2. Playing the "Devil's Advocate."

We will look at the example (near Figure 3) in the following article, which we can access online via the TTU Library.

Breckler, S.J. (1990). Applications of covariance structure modeling in psychology: Cause for concern? Psychological Bulletin, 107, 260-273.

3. Longitudinal/panel models.

We will go over several examples, including some from:

Farrell, A.D. (1994). Structural equation modeling with longitudinal data: Strategies for examining group differences and reciprocal relationships. Journal of Consulting and Clinical Psychology, 62, 477-487.

As the title notes, this article is also good for studying multiple-group modeling...

4. Multiple-group analyses. Here's a graphic I made to illustrate the use of equality constraints in this context. (AMOS uses letters to denote paths constrained to equality, whereas Mplus uses numbers.)


*Designation of equality is done with the AMOS program via letters. For a given parameter, you can go to the "Object Properties" box and, for the parameter value, you can pre-specify a letter such as A, B, C, etc. Any two (or more) parameters to which you assign an A will all become constrained to take on the same unstandardized value; any two (or more) parameters to which you assign a B will become constrained to take on identical unstandardized values, etc. Only parameters with the same letter will take on identical values. In other words, the uniform coefficient taken on by the set of A constraints will (almost certainly) be different from the uniform coefficient taken on by the B set.

"Equivalent Models" Problem

We'll next consider what is known as the "equivalent models" problem, and other cautions and limitations of SEM. In doing so, we will look at two articles that are available via the TTU library's website:

MacCallum, R.C., Wegener, D.T., Uchino, B.N., & Fabrigar, L.R. (1993). The problem of equivalent models in applications of covariance structure analysis. Psychological Bulletin, 114, 185-199.

(We'll also have a song about the equivalent models problem, entitled, "Your Model's Only One.")

Tomarken, A.J., & Waller, N.G. (2003). Potential problems with "well fitting" models. Journal of Abnormal Psychology, 112, 578-598.

In addition, regarding one of MacCallum et al.'s suggestions, here's an article that incorporates participation in a randomized experimental program (yes/no) into an SEM model.

Florsheim, P., Burrow-Sánchez, J., Minami, T., McArthur, L. & Heavin, S. (2012). The Young Parenthood Program: Supporting positive paternal engagement through co-parenting counseling. American Journal of Public Health, 102, 1886-1892.

Stating Hypotheses

Today, I'd like to cover interpretive clarity in writing about your hypotheses and results. Many beginning writers on SEM simply restate the numerical information from their output, tables, and figures, without providing substantive interpretations.

Earl Babbie's textbook, The Practice of Social Research (2007, 11th ed.), contains a guest essay by Riley E. Dunlap, entitled "Hints for Stating Hypotheses" (p. 47). Here is an excerpt of what I believe is the key advice:

The key is to word the hypothesis carefully so that the prediction it makes is quite clear to you as well as others. If you use age, note that saying "Age is related to attitudes toward women's liberation" does not say precisely how you think the two are related... You have two options:

1. "Age is related to attitudes toward women's liberation, with younger adults being more supportive than older adults"...

2. "Age is negatively related to support for women's liberation"...

As Dunlap demonstrates, these two statements of the hypothesis let the reader know with specificity which people are expected to hold which type of attitudes (I have added the color and bold emphases above).

Results should be described similarly -- not just that a standardized regression path between Construct A and Construct B was .46, but that (given the positively signed relationship) the more respondents do whatever is embodied in Construct A, the more they also do what is embodied in Construct B.

One of my SEM-based publications from several years ago, which is accessible on TTU computers via Google Scholar, can serve as a guide.

Thomas, G., Reifman, A., Barnes, G.M., & Farrell, M.P. (2000). Delayed onset of drunkenness as a protective factor for adolescent alcohol misuse and sexual risk-taking: A longitudinal study. Deviant Behavior, 21, 181-210.

(Added April 29, 2015): This webpage explains the distinction between a hypothesis (when you have a prediction) and a research question (when you don't).

Maximum Likelihood Estimation

(Updated April 10, 2018)

Today, let's take some time to talk about Maximum Likelihood Estimation (MLE), which is the default estimation procedure in AMOS and is considered the standard for the field. In my view, MLE is not as intuitively graspable as Ordinary Least Squares (OLS) estimation, which simply seeks to locate the best-fitting line in a scatter plot of data so that the line is as close to as many of the data points as possible. In other words, OLS minimizes the sum of the squared deviations between each actual data point and where an individual with that score on the X-axis would fall on the best-fitting line, hence "least squares." However, Maximum Likelihood is considered to be statistically advantageous.
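
For anyone who wants to see "least squares" in action, here's a small sketch with five made-up data points. The function names are hypothetical. It computes the best-fitting line from the textbook formulas and then shows that nudging the slope in either direction makes the sum of squared deviations larger.

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1]  # hypothetical data points

def fit_line(xs, ys):
    """Closed-form OLS intercept and slope for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_deviations(intercept, slope, xs, ys):
    """Total squared vertical distance between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

intercept, slope = fit_line(xs, ys)
best = sum_squared_deviations(intercept, slope, xs, ys)
steeper = sum_squared_deviations(intercept, slope + 0.1, xs, ys)
flatter = sum_squared_deviations(intercept, slope - 0.1, xs, ys)

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
print(f"squared deviations: best {best:.3f}, steeper {steeper:.3f}, flatter {flatter:.3f}")
```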

This website maintained by S. Purcell provides what I think is a very clear, straightforward introduction to MLE. In particular, we'll want to look at the second major heading on the page that comes up, Model-Fitting.

Purcell describes the mission of MLE as being to "find the parameter values that make the observed data most likely." Here's an analogy I came up with, fitting Purcell's definition. Suppose we observed a group of people laughing uproariously (the "data"). One could then ask which generating-model would make the laughter most likely, a television comedy show or a drama about someone dying of cancer?
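
Here's a toy version of the laughter analogy, offered only as a sketch. The counts and the two laughing probabilities are invented, and a simple binomial likelihood stands in for the far more complicated likelihood functions used in SEM. The first part asks which of two candidate "generating models" makes the observed laughter more likely; the second part searches a grid of probabilities for the one value that makes the data most likely.

```python
import math

def binomial_likelihood(p, successes, n):
    """Probability of seeing `successes` out of `n`, given probability p."""
    return math.comb(n, successes) * p ** successes * (1 - p) ** (n - successes)

# Hypothetical data: 8 of 10 audience members are laughing.
laughing, audience = 8, 10

# Two hypothetical generating models with assumed laughing probabilities.
candidates = {"comedy": 0.90, "drama": 0.05}
likelihoods = {name: binomial_likelihood(p, laughing, audience)
               for name, p in candidates.items()}
best_model = max(likelihoods, key=likelihoods.get)

# Maximum likelihood by brute force: try every probability from .01 to .99.
grid = [i / 100 for i in range(1, 100)]
p_hat = max(grid, key=lambda p: binomial_likelihood(p, laughing, audience))

for name, value in likelihoods.items():
    print(f"likelihood of the data under {name}: {value:.6g}")
print(f"more likely generating model: {best_model}")
print(f"maximum likelihood estimate of p: {p_hat:.2f}")
```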

Another site lists some of the advantages of MLE, vis-a-vis OLS.

Lindsay Reed, our former computer lab director, once loaned me a book on the history of statistics, the unusually titled, The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century (by David Salsburg, published in 2001).

This book discusses the many statistical contributions of Sir Ronald A. Fisher, among which is MLE. Writes Salsburg:

In spite of Fisher's ingenuity, the majority of situations presented intractable mathematics to the potential user of the MLE (p. 68).

Practically speaking, obtaining MLE solutions required repeated iterations, which were very difficult to carry out until the computer revolution. Citing the 16th-century mathematician Robert Recorde, Salsburg writes: first guess the answer and apply it to the problem. There will be a discrepancy between the result of using this guess and the result you want. You take that discrepancy and use it to produce a better guess... For Fisher's maximum likelihood, it might take thousands or even millions of iterations before you get a good answer... What are a mere million iterations to a patient computer? (p. 70).

UPDATE I: The 2013 textbook by Texas Tech Business Administration professor Peter Westfall and Kevin Henning, Understanding Advanced Statistical Methods, includes additional description of MLE. The above-referenced Purcell page provides an example with a relatively simple equation for the likelihood function. Westfall and Henning, while providing a more mathematically intense discussion of MLE, have several good explanatory quotes:

In cases of complex advanced statistical models such as regressions, structural equation models, and neural networks, there are often dozens or perhaps even hundreds of parameters in the likelihood function (p. 317).

In practice, likelihood functions tend to be much more complicated [than the book's examples], and you won't be able to solve the calculus problem even if you excel at math. Instead, you'll have to use numerical methods, a fancy term for "letting the computer do the calculus for you." ... Numerical methods for finding MLEs work by iterative approximation. They start with an initial guess... then update the guess to some value... by climbing up the likelihood function... The iteration continues until the successive values... are so close to one another that the computer is willing to assume that the peak has been achieved. When this happens, the algorithm is said to converge (p. 325; emphasis in original).

This is what the Minimization History portion of the AMOS output refers to, along with the possible error message that one's model has failed to converge.
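
The guess-and-improve idea can be sketched with a deliberately simple hill-climber. This is not the algorithm AMOS uses; it is a one-parameter illustration with hypothetical data (8 successes in 10 trials) and made-up function names. It starts from an initial guess, moves uphill on the log-likelihood, shrinks its step when neither direction helps, and declares convergence once the step is tiny.

```python
import math

SUCCESSES, TRIALS = 8, 10  # hypothetical data

def log_likelihood(p):
    return SUCCESSES * math.log(p) + (TRIALS - SUCCESSES) * math.log(1 - p)

def climb(start=0.30, step=0.10, tolerance=1e-6, max_iterations=1000):
    """Return (estimate, history, converged) from a step-halving hill climb."""
    estimate = start
    history = [(0, estimate, log_likelihood(estimate))]
    for iteration in range(1, max_iterations + 1):
        # Look one step to either side, staying strictly inside (0, 1).
        neighbors = [c for c in (estimate - step, estimate + step) if 0 < c < 1]
        best = max(neighbors, key=log_likelihood, default=estimate)
        if log_likelihood(best) > log_likelihood(estimate):
            estimate = best   # uphill move: accept the better guess
        else:
            step /= 2         # no improvement: take smaller steps
        history.append((iteration, estimate, log_likelihood(estimate)))
        if step < tolerance:  # successive guesses can barely differ now
            return estimate, history, True
    return estimate, history, False  # ran out of iterations: failed to converge

estimate, history, converged = climb()
for iteration, value, ll in history[:7]:
    print(f"iteration {iteration}: estimate = {value:.4f}, log-likelihood = {ll:.4f}")
print(f"converged: {converged}, final estimate = {estimate:.4f}")
```

The printed history plays the same role as a minimization history: a record of successive guesses. Setting max_iterations very low shows what a failure to converge looks like.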

UPDATE II: The reference given by our 2014 guest speaker on MLE is:

Ferron, J. M., & Hess, M. R. (2007). Estimation in SEM: A concrete example. Journal of Educational and Behavioral Statistics, 32, 110-120.

Deriving Degrees of Freedom for Love Style Model (Plus Discussion of Free vs. Fixed Parameters)

(Updated February 18, 2017)

The advice below applies when one is running models using the AMOS program. Suggestions when using ONYX are shown in red.

A key element of this discussion involves freely estimated (or free) parameters vs. fixed parameters. The term "freely estimated" refers to the program determining the value for a path or variance in accordance with the data and the mathematical estimation procedure. A freely estimated path might come out as .23 or .56 or -.33, for example. Freely estimated parameters are what we're used to thinking about. However, for technical reasons, we sometimes must "fix" a value, usually to 1. This means that a given path or variance will take on a value of 1 in the model, simply because we tell it to. Fixed values only apply to unstandardized solutions; a value fixed to 1 will appear as 1 in an unstandardized solution, but usually appear as something different in a standardized solution. These examples should become clearer as we work through models.

Here is an initial example with a hypothetical one-factor, three-indicator model (thanks to Andrea P. for the photograph). Without fixing the unstandardized factor loading for indicator "a" to 1 (in AMOS), the model would be seeking to freely estimate 7 unknown parameters from only 6 known pieces of information. The model would thus be under-identified (also referred to as "unidentified"), which metaphorically is like being in "debt."
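
The counting for this small model can be written out as a short sketch. The function names are hypothetical, and the breakdown of the 7 unknowns (3 loadings, 3 indicator residual variances, 1 factor variance) is my reading of the standard one-factor setup rather than something taken from the photograph.

```python
def known_elements(n_indicators):
    """Variances plus covariances in the half-matrix: p(p + 1) / 2."""
    return n_indicators * (n_indicators + 1) // 2

def degrees_of_freedom(n_indicators, n_free_parameters):
    return known_elements(n_indicators) - n_free_parameters

n_indicators = 3

# Nothing fixed: 3 loadings + 3 residual variances + 1 factor variance.
unknowns_nothing_fixed = 3 + 3 + 1
# Loading for indicator "a" fixed to 1: 2 loadings + 3 residuals + 1 variance.
unknowns_one_loading_fixed = 2 + 3 + 1

df_nothing_fixed = degrees_of_freedom(n_indicators, unknowns_nothing_fixed)
df_one_loading_fixed = degrees_of_freedom(n_indicators, unknowns_one_loading_fixed)

print(f"knowns: {known_elements(n_indicators)}")
print(f"df with nothing fixed: {df_nothing_fixed} (under-identified)")
print(f"df with one loading fixed to 1: {df_one_loading_fixed} (just-identified)")
```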

Keiley et al. (2005, in Sprenkle & Piercy, eds., Research Methods in Family Therapy) discuss the metric-setting rationale for fixing a single loading per factor to 1:

One of the problems we face in SEM is that the latent constructs are unobserved; therefore, we do not know their natural metric. One of the ways that we define the true score metric is by setting one scaling factor loading to 1.00 from each group of items (pp. 446-447).

In ONYX, it seems to make more sense to me to let all the factor loadings be freely estimated (none of them fixed to 1), but instead fix the factor variance to 1.

Below is the photograph Kristina took of the board in 2008, with the derivation of degrees of freedom for the Hendrick & Hendrick Love Styles model. (This photo has been annotated over the years.)

In ONYX, there are also 63 unknown, freely estimated parameters, but I would allocate them differently than how I would in AMOS. In ONYX, there would be 24 free factor loadings; 15 non-directional correlations; and 24 indicator residuals. (I would fix the 6 construct variances to 1 in ONYX.)
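
As a check on the arithmetic, here's a short tally of the two allocations. The ONYX counts come straight from the paragraph above. The AMOS counts are my inference from the one-loading-per-factor-fixed-to-1 convention (the board photo isn't reproduced here), so treat them as an assumption. The 24 indicators follow from the 24 indicator residuals.

```python
n_indicators = 24
knowns = n_indicators * (n_indicators + 1) // 2  # half-matrix of variances and covariances

# ONYX allocation, as described in the text (6 construct variances fixed to 1).
onyx = {
    "free factor loadings": 24,
    "non-directional correlations": 15,
    "indicator residual variances": 24,
}

# AMOS allocation, inferred (assumption): one loading per construct fixed to 1,
# so the 6 construct variances are freely estimated instead.
amos_inferred = {
    "free factor loadings": 18,
    "non-directional correlations": 15,
    "indicator residual variances": 24,
    "construct variances": 6,
}

for label, allocation in (("ONYX", onyx), ("AMOS (inferred)", amos_inferred)):
    unknowns = sum(allocation.values())
    print(f"{label}: {unknowns} unknowns, df = {knowns} - {unknowns} = {knowns - unknowns}")
```

Either way of setting the metric uses up the same number of free parameters, so the degrees of freedom come out the same.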

This slideshow (especially slides 29-31) provides more information on making sure your model is identified.

One of the students in the class, noting the repeated references to "knowns" and "unknowns" in running the model, sent me this video link to provide some levity.

Characterizing a Latent Construct

Updated February 5, 2014

I encourage everyone to think of a latent construct (such as the CONSERVATIVISM construct in this entry) as the shared variation (or correlatedness) between the manifest indicators. My reasoning, loosely stated, followed these three steps.

1. The standardized factor loadings are based upon the correlations between any two manifest indicators. For example, if one indicator has a standardized loading of .70 and another has a loading of .80, the Pearson correlation between the two indicators will be .56 or thereabouts (i.e., the product of the two loadings). High loadings go along with high correlations.

2. High loadings, which are considered desirable for having a strong factor, thus signify correlatedness (or shared variation) among the indicators.

3. Taking a little leap from step 2, one can think of the factor itself as representing shared variation among its indicators. The "tiny bubbles" pointing to each manifest indicator thus represent variation in a given indicator that is not due to the common factor and are considered to represent error variance. Quoting from Barbara Byrne (2001; Structural Equation Modeling with AMOS):

Error associated with observed variables represents measurement error, which reflects on their adequacy in measuring the related underlying factors... Measurement error derives from two sources: random measurement error... and error uniqueness, a term used to describe error variance arising from some characteristic that is considered to be specific (or unique) to a particular indicator variable. (p. 9)

Going back to our example of the common cold as a common factor, uniqueness would refer to instances of, for example, sneezing due to allergies, not as part of a cold.
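
Here's a brief sketch of the reasoning in steps 1 through 3, assuming a single factor with standardized loadings. The first two loadings (.70 and .80) are the ones from step 1; the third loading and the indicator names are hypothetical. The implied correlation between two indicators is the product of their loadings, and whatever variance the factor doesn't account for (1 minus the squared loading) is what the "tiny bubbles" represent.

```python
# Standardized loadings on one common factor; the third value is hypothetical.
loadings = {"indicator_1": 0.70, "indicator_2": 0.80, "indicator_3": 0.60}

def implied_correlation(loading_a, loading_b):
    """Model-implied correlation between two indicators of the same factor."""
    return loading_a * loading_b

def unique_variance(loading):
    """Standardized variance not shared with the factor (error plus uniqueness)."""
    return 1 - loading ** 2

names = list(loadings)
implied = {}
for i, first in enumerate(names):
    for second in names[i + 1:]:
        implied[(first, second)] = implied_correlation(loadings[first], loadings[second])

for (first, second), r in implied.items():
    print(f"implied r({first}, {second}) = {r:.2f}")
for name, loading in loadings.items():
    print(f"{name}: loading = {loading:.2f}, unique variance = {unique_variance(loading):.2f}")
```

Higher loadings produce higher implied correlations and smaller unique variances, which is the sense in which the factor represents shared variation.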

Exploratory Factor Analysis: Axis Rotation

I've made a PowerPoint graphic of my idea that rotating axes in factor analysis is analogous to rotating the streets (or laying down new streets) to make them closer to people's houses. Here it is...