Dr. Alan Reifman's SEM Course

Overall Model Fit

This posting examines several measures of overall model fit in greater depth than before. David Kenny's web-based summary of fit indices (links section on the right) is an excellent starting point (G.D. Garson's SEM page also has good information). As we discussed before, indices that have the word "fit" in their names should have high values (close to 1.0, which is the maximum for most of the fit indices) to signify a well-fitting model, whereas indices that have the words "error" or "residual" in their names should be small for a well-fitting model, the closer to 0.0 the better.

The AMOS output will report results for three models: the model you designed (also known as the default or proposed model); the independence (or null) model, which says that each measured variable is correlated exactly 0.0 with each other measured variable (with no latent constructs) and thus usually produces results indicative of poor fit with the data; and finally, the saturated model, which, as we've discussed, uses the maximum available parameters and thus is guaranteed to provide a perfect fit. As shown on Kenny's page, many of the fit indices involve comparisons of your model and the independence model.
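To make the comparison between your model and the independence model a bit more concrete, here is a minimal Python sketch of how four of the indices are commonly computed from chi-square results. The function name and all of the input numbers are hypothetical, made up purely for illustration; AMOS reports these indices for you, and its exact computations may differ in small details.

```python
import math

def fit_indices(chi2_model, df_model, chi2_null, df_null, n):
    """Return NFI, TLI (NNFI), CFI, and RMSEA from chi-square results.

    chi2_model, df_model: the proposed (default) model
    chi2_null, df_null:   the independence (null) model
    n:                    sample size
    """
    # NFI: proportional reduction in chi-square relative to the null model
    nfi = (chi2_null - chi2_model) / chi2_null

    # TLI/NNFI: same idea, but based on chi-square/df ratios
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    tli = (ratio_null - ratio_model) / (ratio_null - 1.0)

    # CFI and RMSEA are based on noncentrality (chi-square minus df, floored at 0)
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, 0.0)
    largest = max(d_model, d_null)
    if largest > 0:
        cfi = 1.0 - d_model / largest
    else:
        cfi = 1.0
    rmsea = math.sqrt(d_model / (df_model * (n - 1)))

    return {"NFI": nfi, "TLI": tli, "CFI": cfi, "RMSEA": rmsea}

# Hypothetical results: proposed model vs. independence model, N = 300
results = fit_indices(chi2_model=85.0, df_model=40,
                      chi2_null=900.0, df_null=55, n=300)
for name, value in results.items():
    print(name, round(value, 3))
```

Note how NFI, TLI, and CFI all get larger as the proposed model's chi-square shrinks relative to that of the independence model, whereas RMSEA does not use the independence model at all.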

Researchers typically will report three or four fit indices in an article or other scientific document. AMOS provides many more fit indices than that, so you'll need to form an opinion as to which ones you feel are the best.

St. Mary's University (Texas) professor Andrea Berndt performed a valuable service for the SEM community in her dissertation research at Old Dominion University. I received a copy of her study at the 1998 American Psychological Association convention and, to my knowledge, it has never been published (from the TTU library site, you can search in Dissertation Abstracts for BERNDT, ANDREA ELIZABETH, with her dissertation also available via Interlibrary Loan).

The background behind Berndt's research is that, in determining which are the optimal fit indices to use, we want ones that are not biased by study features, such as sample size. Fit indices should convey only the goodness of fit (or match) between the known, input correlation (covariance) matrix and the matrix derived from the SEM based on the tracing of paths. If a fit index provides a high or low value in good part just because of sample size or other study features, then it's probably misrepresenting the intrinsic fit of the model.

What Berndt did was search all issues of the journals Educational and Psychological Measurement, Journal of Applied Psychology, Journal of Personality and Social Psychology, and Structural Equation Modeling published between 1986 and 1996, extracting information on study design features and fit indices from all articles using SEM. Study features coded (as labeled by Berndt) were: sample size, number of indicators per latent variable, number of latent variables, number of estimated paths, and degrees of freedom. Fit indices recorded (if published in a given article or retrievable via other statistical information reported) were: Chi-Square, Comparative Fit Index, Critical Number, Goodness of Fit Index, Normed Fit Index, Non-Normed Fit Index (aka Tucker-Lewis Index), Root Mean Square Error of Approximation, and the Relative Noncentrality Index.

Berndt formed a dataset, with each line representing a study. From this set-up, she could conduct multiple-regression analyses, with each fit index serving as the dependent variable in a given analysis, and the study features serving as predictor variables. Again, because we want the fit indices to be free of any "contamination" from study features, a promising fit index will have the set of predictors produce a very small R-square (as close to 0.0 as possible). Here are the findings:

Chi-Square. R-square = .855 (degrees of freedom, Beta = .864, and sample size, Beta = .278, showed significant relations to Chi-Square size).

CFI. R-square = .084 (degrees of freedom, Beta = -.241, was significantly related to CFI).

CN. R-square = .038 (no study features significantly related to CN).

GFI. R-square = .253 (indicators per LV, Beta = -.283, and sample size, Beta = .174, significantly related to GFI).

NFI. R-square = .150 (df, Beta = -.288, and sample size, Beta = .176, significantly related to NFI).

NNFI. R-square = .027 (no study features significantly related to NNFI).

RMSEA. R-square = .038 (no study features significantly related to RMSEA).

RNI. R-square = .061 (df, Beta = -.238, significantly related to RNI).
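For anyone curious how this kind of analysis works mechanically, here is a rough Python sketch of the logic (not Berndt's actual data or code). It simulates 200 hypothetical "studies" with only two of the five study features (sample size and degrees of freedom), builds one index that is deliberately contaminated by those features and one that is not, and regresses each index on the features. All names and numbers are assumptions for illustration.

```python
import random

def r_squared(y, predictors):
    """R-square from regressing y on the predictor columns (with intercept)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations: [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    coefs = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(coefs, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

random.seed(1)
n_studies = 200  # each simulated "study" is one line of the dataset
sample_size = [random.randint(100, 1000) for _ in range(n_studies)]
df = [random.randint(5, 200) for _ in range(n_studies)]

# A "contaminated" index that grows with df and sample size (like chi-square)
contaminated = [1.0 * d + 0.05 * n + random.gauss(0, 10)
                for d, n in zip(df, sample_size)]
# A "clean" index that is unrelated to the study features
clean = [random.gauss(0.95, 0.03) for _ in range(n_studies)]

r2_contaminated = r_squared(contaminated, [sample_size, df])
r2_clean = r_squared(clean, [sample_size, df])
print("contaminated index R-square:", round(r2_contaminated, 3))
print("clean index R-square:", round(r2_clean, 3))
```

The contaminated index should produce a large R-square and the clean one an R-square near zero, which is the pattern we are looking for when reading the list of findings above.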

I think it's clear which indices have the smallest R-square values, and thus which ones would be recommended for you to use. If you would like to cite the Berndt paper in your writings to justify your choice of fit indices, the reference is:

Berndt, A.E. (1998, August). "Typical" model features and their effects on goodness-of-fit indices. Presented at the 106th Annual Convention of the American Psychological Association, San Francisco, CA.

UPDATE Summer 2014: Rebecca Oldham, a student in the Spring 2014 class, created the following table to summarize Berndt's findings:

UPDATE 4/27/2013: I found an article by Waterman et al. (2010) that, to my mind, provides good advice on how to present results regarding chi-square in an SEM manuscript:

The chi-square statistic is reported but is not used in interpretation, because it tests the null hypothesis of perfect fit to the data, which is implausible and almost certain to be rejected in models with large samples (p. 52).

Waterman, A. S., Schwartz, S. J., Zamboanga, B. L., Ravert, R. D., Williams, M. K., Agocha, B., Kim, S. Y., & Donnellan, M. B. (2010). The Questionnaire for Eudaimonic Well-Being: Psychometric properties, demographic comparisons, and evidence of validity. Journal of Positive Psychology, 5, 41-61.
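The sample-size problem with chi-square can be seen with a little arithmetic. In the usual maximum-likelihood set-up, the chi-square statistic is the minimized value of the fit function multiplied by (N - 1); some programs multiply by N instead. The short Python sketch below holds the amount of misfit constant and varies only the sample size. The misfit value and the model's 10 degrees of freedom are hypothetical.

```python
# Chi-square critical value for df = 10 at alpha = .05
CRITICAL_CHI2_DF10 = 18.307

def model_chi_square(f_min, n):
    """Chi-square statistic from the minimized fit function and sample size."""
    return (n - 1) * f_min

f_min = 0.05  # hypothetical; identical misfit in every scenario
for n in (100, 200, 500, 1000):
    chi2 = model_chi_square(f_min, n)
    decision = "reject" if chi2 > CRITICAL_CHI2_DF10 else "retain"
    print(n, round(chi2, 2), decision)
```

With exactly the same degree of misfit, the model is retained at N = 100 and N = 200 but rejected at N = 500 and N = 1000.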

Technical Aspects of Drawing AMOS Models

Now that we're beginning to learn how to draw models in AMOS (which, by the way, stands for Analysis of Moment Structures), I thought I'd list some of the technical aspects you'll see in the program. Most of the time, AMOS implements these technical aspects automatically, but it's important you know what is going on. [Update, Feb. 12: I've just added a diagram below to help illustrate the following principles.]

1. Every manifest indicator (box) or latent construct (big circle) that has an incoming unidirectional "causal" arrow gets a residual (or error) term (small circle).

2. Every manifest indicator and latent construct (like any ordinary variable) gets a variance. If the indicator or construct has no incoming unidirectional arrow, its variance is located in the indicator or construct itself. However, if something has a residual, the variance is located only in the residual.

3. Non-directional, "curved" correlations can be inserted only between two entities that have variances. Thus, if two entities each have residual variances, it is the residual variances that get correlated, not the indicators or constructs themselves.
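The three principles above can be expressed as a small bit of logic. The following Python sketch is only an illustration (it has nothing to do with how AMOS is actually programmed), and the variable names in the example model are hypothetical.

```python
def variance_locations(variables, arrows):
    """Apply Rules 1 and 2 to a model.

    variables: list of names (manifest indicators and latent constructs)
    arrows:    list of (source, target) unidirectional paths
    Returns a dict mapping each variable to where its variance is located.
    """
    has_incoming = {target for _, target in arrows}
    location = {}
    for v in variables:
        if v in has_incoming:
            # Rule 1: an incoming arrow means the variable gets a residual.
            # Rule 2: the variance then sits only in that residual.
            location[v] = "residual of " + v
        else:
            # Rule 2: no incoming arrow, so the variance sits in the variable.
            location[v] = v
    return location

def curved_arrow_endpoints(location, a, b):
    """Rule 3: a correlation connects whatever holds each entity's variance."""
    return (location[a], location[b])

# Hypothetical model: Stress -> Symptoms, each with two indicators
variables = ["Stress", "Symptoms", "work_stress", "home_stress",
             "headache", "fatigue"]
arrows = [("Stress", "work_stress"), ("Stress", "home_stress"),
          ("Symptoms", "headache"), ("Symptoms", "fatigue"),
          ("Stress", "Symptoms")]

loc = variance_locations(variables, arrows)
print(loc["Stress"])
print(loc["Symptoms"])
print(curved_arrow_endpoints(loc, "headache", "fatigue"))
```

In this example, only Stress keeps its own variance; everything else has an incoming arrow, so a correlation between headache and fatigue would be drawn between their residuals.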

Measurement and Structural Models

When learning SEM, an important distinction to recognize from the start is that between a measurement model and a structural model.

A measurement model consists only of factor-loading paths from the latent constructs (factors) to their manifest indicators, non-directional correlations between constructs (like an oblique factor analysis), and in rare circumstances, correlations between some of the indicators' residual (error) terms.

When the implementation of a measurement model involves only a single questionnaire instrument and the researcher is seeking to verify an a priori conceptualization of which constructs (subscales) go with which items, then that particular kind of model is a confirmatory factor analysis (CFA).

The differences between confirmatory (CFA) and exploratory (EFA) types of factor analysis should be apparent:

1. In EFA, the number of factors is empirically determined by consulting numerical values generated by the computer (e.g., Kaiser criterion, scree test, parallel analysis), whereas in CFA, the researcher decides the number of factors, based on conceptual/theoretical grounds or precedent in the literature.

2. In EFA, determination of which items go with which factors is, again, done empirically via the factor loadings generated by the computer (which can sometimes create problems if an item loads strongly on more than one factor). In CFA, the assignment of items (manifest indicators) to constructs is, again, done on conceptual grounds. Dual-loading items are avoided in CFA, as the researcher will have each manifest indicator receive an incoming factor-loading path from only one construct (factor).

Once the researcher settles on his or her measurement model (i.e., what the constructs are, and what the manifest indicators are of each), then he or she can develop the structural model. A structural model is simply the network of directional, "causal" paths between constructs. For example, a "life stress" construct (with, perhaps, manifest indicators for work stress, home stress, and miscellaneous stress) might have a directional arrow to a "physical symptoms" construct (with indicators for head and stomach ache, fatigue, and back and joint pain).

This study of family functioning and adolescent development by Cumsille et al. provides some nice diagrams of measurement and structural models (and will also be good to return to later in the semester when we cover multiple-group modeling).

Follow-Up on Promax Rotation's Factor Structure and Factor Pattern Matrices

As we saw today, the Promax (oblique) factor-rotation technique in SPSS provides two different types of output, the Factor Structure matrix and Factor Pattern matrix. Russell (2002, Personality and Social Psychology Bulletin) provides a concise explanation of the difference (p. 1636):

The Factor Structure matrix provides the correlation between each of the measures and the factors that have been extracted and rotated; this is, of course, what we typically think of as factor loadings. However, given that the two factors [the number in Russell's example] are correlated with one another, there may be overlap in these loadings. Therefore the... Factor Pattern matrix, is designed to indicate the independent relationship between each measure and the factors. One can think of the values reported here as being equivalent to standardized regression coefficients, where the two factors are used as predictors of each measure.
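The link between the two matrices can be shown numerically: the structure matrix equals the pattern matrix multiplied by the matrix of correlations among the factors. Here is a small Python sketch with three measures and two factors; the pattern loadings and the factor correlation of .40 are hypothetical values chosen for illustration.

```python
def structure_matrix(pattern, phi):
    """Multiply the pattern matrix by the factor correlation matrix."""
    n_factors = len(phi)
    return [[sum(row[k] * phi[k][j] for k in range(n_factors))
             for j in range(n_factors)]
            for row in pattern]

# Hypothetical pattern matrix: rows are measures, columns are factors
pattern = [[0.70, 0.10],
           [0.60, 0.00],
           [0.10, 0.80]]

# Hypothetical correlation of .40 between the two factors
phi = [[1.0, 0.4],
       [0.4, 1.0]]

for row in structure_matrix(pattern, phi):
    print([round(value, 2) for value in row])
```

The second measure has a pattern coefficient of zero on Factor 2, yet it still correlates .24 with that factor in the structure matrix, purely because the two factors overlap. If the factors were uncorrelated, the two matrices would be identical.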

Latent and Manifest Variables in SEM

(Updated May 2, 2016)

To begin our coverage of SEM, we'll discuss the conceptual basis of latent variables and their manifest (measured) indicators. The diagram below, which I developed several years ago, provides an "everyday" illustration.

A key idea that we'll discuss throughout the course is that latent variables are error-free.

There are some additional analogies that can be drawn upon:

In Freudian psychology, a distinction is made between the latent and manifest content of dreams.

In biology, a distinction is made between genotype and phenotype.

The book Does Measurement Measure Up?, by John Henshaw, provides a concise summary of how observable, measurable manifestations are used to infer underlying, unobservable propensities: "Aristotle and others have contrasted the observed behavior of an individual with the underlying capacity on which that behavior depended. Intelligence, as one of those underlying capacities, is an ability that may or may not always be observed in everyday life. This underlying capacity must be deduced from observed behaviors" (p. 92).

Procedures of Exploratory Factor Analysis

Let's work our way up the right-hand side of the "SEM Pyramid of Success," examining how the Pearson correlation gives rise to exploratory factor analysis (EFA).

Starting out with a fairly large set of variables (usually single items), EFA will arrange the variables into subsets, where the variables within each subset are strongly correlated with each other. These subsets are organized along axes (the plural of "axis," not of "axe," as in a hatchet).

You could have a one-factor (one-dimensional) solution, in which case all the variables will be capable of being located along a single line (e.g., across from left to right, with low scores to the left and high scores to the right). Or there could be a two-factor (two-dimensional) solution, where the axes are across and up-and-down. Three-factor (three-dimensional) solutions are harder to describe verbally, so let's look at a picture. These examples hold only as long as the axes are orthogonal (at 90-degree angles) to each other (which denotes completely uncorrelated factors), an issue to which we'll return. Solutions can also exceed three factors, but we cannot visualize four spatial dimensions (at least I can't).

In conducting factor analyses with a program such as SPSS, there are three main steps, at each of which a decision has to be made:

(1) One must first decide what extraction method to use (i.e., how to "pull out" the dimensions). The two best-known approaches are Principal Axis Factoring (PAF; also known as common factor analysis) and Principal Components Analysis (PCA). There's only one difference, computationally, between PAF and PCA, as described in this document, yet some authors portray the two techniques as being very different (further, PCA is technically not a form of factor analysis, but many researchers treat it as such).

(ADDED 9/11/18). This EFA tutorial from Columbia University's Mailman School of Public Health provides an intuitive illustration of the distinction between PAF and PCA. As shown in Figure 4 of the document, there are three potential sources of variation on a variable (or, more loosely, three reasons why someone obtains his or her total score on a variable). Let's use an example from the music-liking items in our SPSS practice dataset. Each of the 11 items lists a music style (e.g., big band, bluegrass, classical, jazz) and asks the respondent how much he/she likes it. Let's look specifically at liking for classical music. Someone's liking score for classical music will emerge from some combination of: (a) his or her liking of music in general (corresponding to "common variance" in Figure 4 of the Columbia document); (b) reasons the person likes classical music that don't pertain to other musical styles such as jazz, blues, etc. (e.g., he or she studied great composers in European history, corresponding to unique or "specific variance" in Figure 4); and (c) any kind of random measurement error such as the person misunderstanding the survey item or accidentally selecting an unintended answer choice (corresponding to "error variance" in Figure 4). As shown by the faint red and purple ovals in Figure 4, PCA seeks to explain variance from all three boxes, whereas PAF only seeks to explain common variance (the first box). Hence, PAF begins with R-squares from a series of multiple-regression analyses predicting liking for each style of music, one at a time, from the remaining styles of music. Doing so reveals the amount of variance common to the music styles. On the other hand, PCA takes it as a given that it is "trying" to explain 100% of the variance in all of the variables.
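The different starting points of the two methods can be sketched in a few lines of Python. The R-square from predicting each variable from all the others (the squared multiple correlation) can be obtained from the inverse of the correlation matrix as 1 - 1/(diagonal element of the inverse). The correlation matrix below is hypothetical (three music styles that all correlate .50 with each other), and the function names are my own.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * z for x, z in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def starting_variances(corr, method):
    """Variance each method initially tries to explain for each variable.

    PCA: 1.0 (100% of each variable's variance).
    PAF: the R-square from predicting each variable from all the others.
    """
    if method == "PCA":
        return [1.0] * len(corr)
    inverse = invert(corr)
    return [1.0 - 1.0 / inverse[i][i] for i in range(len(corr))]

# Hypothetical correlations among liking for classical, jazz, and blues
corr = [[1.0, 0.5, 0.5],
        [0.5, 1.0, 0.5],
        [0.5, 0.5, 1.0]]

print("PCA:", starting_variances(corr, "PCA"))
print("PAF:", [round(v, 3) for v in starting_variances(corr, "PAF")])
```

In this example PCA starts from 1.0 for every variable, whereas PAF starts from about .333, the share of each variable's variance that is predictable from the other two.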

(2) Second, one must decide how many factors to retain. There is no absolute, definitive answer to this question. There are various tests, including the Kaiser Criterion (how many factors or components have eigenvalues greater than or equal to 1.00) and Scree Test (an "elbow curve," where one looks for drop-off in the sizes of the eigenvalues).
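As a quick illustration of these two rules, the Python sketch below applies them to a hypothetical set of eigenvalues for six variables (in practice you would take the eigenvalues from your SPSS output). The Kaiser count uses the "greater than or equal to 1.00" cutoff described above; the scree function simply lists the drop from each eigenvalue to the next, which is what one eyeballs on the plot.

```python
# Hypothetical eigenvalues for a six-variable analysis (they sum to 6)
eigenvalues = [2.9, 1.4, 0.7, 0.5, 0.3, 0.2]

def kaiser_count(eigs, cutoff=1.0):
    """Number of factors/components with eigenvalues at or above the cutoff."""
    return sum(1 for e in eigs if e >= cutoff)

def scree_drops(eigs):
    """Drop in size from each eigenvalue to the next one."""
    return [round(a - b, 10) for a, b in zip(eigs, eigs[1:])]

print("Kaiser criterion retains:", kaiser_count(eigenvalues))
print("Successive drops:", scree_drops(eigenvalues))
```

Here the Kaiser criterion retains two factors, and the drops flatten out after the second eigenvalue, so a scree test would plausibly suggest two as well. With real data, the two rules often disagree, which is part of why there is no definitive answer.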

The book Does Measurement Measure Up?, by John Henshaw, addresses the indeterminacy of factor analysis in the context of intelligence testing as follows: "Statistical analyses of intelligence test data... have been performed for a long time. Given the same set of data, one can make a convincing, statistically sound argument for a single, overriding intelligence (sometimes called the g factor) or an equally sound argument for multiple intelligences. In Frames of Mind, Howard Gardner argues that 'when it comes to the interpretation of intelligence testing, we are faced with an issue of taste or preference rather than one on which scientific closure is likely to be reached' " (p. 95).

(3) The axes from the original solution will not necessarily come close to sets of data points (loosely speaking, it's like the best-fitting line in a correlational plot). The axes can be rotated to put them into better alignment with the data points. The third decision, therefore, involves the choice of rotation method. Two classes of rotation methods are orthogonal (as described above) and oblique (in which the axes are free to intersect at other than 90-degree angles, which allows the factors to be correlated with each other). Mathworks has a web document on factor rotation, including a nice color-coded depiction of orthogonal and oblique rotation. (As of January 2015, the graphics do not show up in the Mathworks document; however, I had previously saved a copy of the factor-rotation diagram, which I reproduce below.)

From Mathworks, Factor Analysis

***

The particular combination of Principal Components Analysis for extraction, the Kaiser Criterion to determine the number of factors, and orthogonal rotation (specifically one called Varimax) is known as the "Little Jiffy" routine, presumably because it works quickly. I've always been a Little Jiffy guy myself (and have written a song about it, below), but in recent years, Little Jiffy has been criticized, both collectively and in terms of its individual steps.

An article by K.J. Preacher and R.C. MacCallum (2003) entitled "Repairing Tom Swift’s Electric Factor Analysis Machine" (explanation of "Tom Swift" reference) gives the following pieces of advice (shown in italics, with my comments inserted in between):

Three recommendations are made regarding the use of exploratory techniques like EFA and PCA. First, it is strongly recommended that PCA be avoided unless the researcher is specifically interested in data reduction... If the researcher wishes to identify factors that account for correlations among [measured variables], it is generally more appropriate to use EFA than PCA...

Another article we'll discuss (Russell, 2002, Personality and Social Psychology Bulletin) concurs that PAF is preferable to PCA, although it acknowledges that the solutions produced by the two extraction techniques are sometimes very similar. Also, data reduction (i.e., wanting to present results in terms of, say, three factor-based subscales instead of 30 original items) seems to be a respectable goal, for which PCA appears appropriate.

Second, it is recommended that a combination of criteria be used to determine the appropriate number of factors to retain... Use of the Kaiser criterion as the sole decision rule should be avoided altogether, although this criterion may be used as one piece of information in conjunction with other means of determining the number of factors to retain.

I concur, and Russell's recommendation seems consistent with this advice.

Third, it is recommended that the mechanical use of orthogonal varimax rotation be avoided... The use of orthogonal rotation methods, in general, is rarely defensible because factors are rarely if ever uncorrelated in empirical studies. Rather, researchers should use oblique rotation methods.

As we'll see, Russell has some interesting suggestions in this area.

One final area, discussed in the Russell article, concerns how to create subscales or indices based on your factor analysis. Knowing that certain items align well with a particular factor (i.e., having high factor loadings), we can either multiply each item by its factor loading before summing the items (hypothetically, e.g., [.35 X Item 1] + [.42 X Item 2] + [.50 X Item 3].......) or just add the items up with equal (or "unit") weighting (Item 1 + Item 2 + Item 3). Russell recommends the latter. It should be noted that, if one obtains a varimax-rotated solution, the newly created subscales will only have zero correlation (independence or orthogonality) with each other if the items are weighted by exact factor scores in creating the subscales.
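The two scoring approaches can be compared in a few lines of Python. The loadings are the hypothetical ones from the paragraph above; the respondent's item answers are likewise made up for illustration.

```python
loadings = [0.35, 0.42, 0.50]  # hypothetical loadings from the text
responses = [4, 5, 3]          # one respondent's (made-up) answers to Items 1-3

def weighted_score(items, weights):
    """Multiply each item by its factor loading before summing."""
    return sum(w * x for w, x in zip(weights, items))

def unit_weighted_score(items):
    """Add the items up with equal ("unit") weighting."""
    return sum(items)

print("Loading-weighted:", round(weighted_score(responses, loadings), 2))
print("Unit-weighted:", unit_weighted_score(responses))
```

The loading-weighted score here is 5.0 ([.35 X 4] + [.42 X 5] + [.50 X 3]) and the unit-weighted score is 12. The two are on different scales, so what matters in practice is how similarly they rank-order respondents, not the raw values.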

I have created a diagram to explicate factor scoring in greater detail. Here's another perspective on the issue of factor scores (see the heading "Factor Scores/Scale Scores" when the new page opens).

Here's the Little Jiffy song.

Little Jiffy
Lyrics by Alan Reifman
(May be sung to the tune of “Desperado,” Frey/Henley)

“Little Jiffy,” you know your status is iffy,
Some top statisticians, think that you’re no good,
You are so simple, the users just take the defaults,
And thus halts the process, of finding structure,

(Bridge)
You are a three-stage, procedure,
For making, your data concise,
And upon some simple guidelines, you do rest,

But for each step, in the routine,
Little Jiffy’s not so precise,
And the experts say, other choices are best…

For extraction, you use Principal Components,
While all your opponents, advocate P-A-F,
You use Kaiser’s test, to tell the number of factors,
While all your detractors, support the Scree test,

On your behalf, some researchers claim,
Components and factors, yield almost the same,
But computers give, several more options today,
With “Little Jiffy,” you don’t have to stay,
You can experiment, with different ways…

Varimax is used, to implement your rotation,
There’s no correlation, among your axes,
If one goes oblique, like critics urge that you ought to,
The items you brought, ooh (yes, the items that you brought, ooh),
The items you brought, ooh... (Pause)
Will fall close to the lines…

***

Finally, here are some references for students wishing to pursue EFA in greater detail.

Conway, J.M., & Huffcutt, A.I. (2003). A review and evaluation of exploratory factor analysis practices in organizational research. Organizational Research Methods, 6, 147-168.

Henson, R.K., & Roberts, J.K. (2006). Use of exploratory factor analysis in published research: Common errors and some comment on improved practice. Educational and Psychological Measurement, 66, 393-416.

Path Analysis: Tracing Rules and Re-Deriving Correlation Coefficients

Below are two photos from the recent lectures on path analysis (thanks again to Sothy). The first photo is more conceptual, on how to identify the relevant sequences for multiplying path coefficients.

One of our 2015 students (BL) came up with a great analogy to understand correlated cause (lower-right corner of photo). W and Y can be thought of as the two parents in a family, and X and Z as the two children. If parent W influences child X, and parent Y influences child Z, then X and Z will receive some of the same influence because the two parents' childrearing practices are likely correlated (path q).
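In tracing-rule terms, the correlation between the two children that is implied by this set-up is the product of the three coefficients along the route X, W, Y, Z. Here is a tiny Python sketch; the labels a and b for the two parent-to-child paths and all three numerical values are hypothetical (only path q is named in the photo), and it assumes there is no other connection between X and Z.

```python
def correlated_cause(a, q, b):
    """Implied X-Z correlation: back along a (X to W), across q, down b (Y to Z)."""
    return a * q * b

a = 0.5  # hypothetical path from parent W to child X
q = 0.6  # hypothetical correlation between parents W and Y
b = 0.4  # hypothetical path from parent Y to child Z

print(round(correlated_cause(a, q, b), 3))
```

With these values the implied correlation between X and Z is .12. If the parents were uncorrelated (q = 0), the implied correlation between the children would be zero.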

The second photo shows an actual example.

Because the model was saturated (every possible linkage that could have been included, was included), the correlation between Age and Income implied by the tracings in the model is identical (within rounding) to the known, input correlation between Age and Income.
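The same point can be verified with a small Python sketch. The numbers are not the ones from the photo; the sketch assumes a hypothetical saturated model with two correlated predictors (x1 and x2) and one outcome (y), computes the standardized path coefficients from the input correlations, and then re-derives the correlations by tracing.

```python
def standardized_paths(r12, r1y, r2y):
    """Standardized regression paths from x1 and x2 to y."""
    denom = 1.0 - r12 ** 2
    p1 = (r1y - r12 * r2y) / denom
    p2 = (r2y - r12 * r1y) / denom
    return p1, p2

def implied_correlations(r12, p1, p2):
    """Tracing rules: the direct path plus the route through the curved arrow."""
    implied_r1y = p1 + r12 * p2
    implied_r2y = p2 + r12 * p1
    return implied_r1y, implied_r2y

# Hypothetical input correlations
r12, r1y, r2y = 0.30, 0.50, 0.40

p1, p2 = standardized_paths(r12, r1y, r2y)
implied_r1y, implied_r2y = implied_correlations(r12, p1, p2)
print("paths:", round(p1, 3), round(p2, 3))
print("implied correlations:", round(implied_r1y, 3), round(implied_r2y, 3))
```

The implied correlations come back as .50 and .40, matching the inputs. In a model that omitted one of the linkages, the implied and input correlations would generally differ, and that discrepancy is what the fit indices summarize.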

MARCH 2013 UPDATE: This manuscript provides an overview of tracing rules (see Section 2.4).

MAY 2009 UPDATE: A newly released Australian study indeed shows a positive relationship between height and earnings.
