Overall Model Fit

This posting examines several measures of overall model fit in greater depth than before. David Kenny's web-based summary of fit indices (links section on the right) is an excellent starting point (G.D. Garson's SEM page also has good information). As we discussed before, indices that have the word "fit" in their names should have high values (close to 1.0, which is the maximum for most of the fit indices) to signify a well-fitting model, whereas indices that have the words "error" or "residual" in their names should be small for a well-fitting model, the closer to 0.0 the better.

The AMOS output will report results for three models: the model you designed (also known as the default or proposed model); the independence (or null) model, which says that each measured variable is correlated exactly 0.0 with each other measured variable (with no latent constructs) and thus usually produces results indicative of poor fit with the data; and finally, the saturated model, which, as we've discussed, uses the maximum available parameters and thus is guaranteed to provide a perfect fit. As shown on Kenny's page, many of the fit indices involve comparisons of your model and the independence model.
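To make the comparison concrete, here is the standard formula for one such index, the Comparative Fit Index (CFI), which pits your model's chi-square (relative to its degrees of freedom) against that of the independence model:

$$\mathrm{CFI} = 1 - \frac{\max\left(\chi^2_{\text{your model}} - df_{\text{your model}},\; 0\right)}{\max\left(\chi^2_{\text{independence}} - df_{\text{independence}},\; 0\right)}$$

The worse the independence model does relative to yours, the closer the CFI gets to its maximum of 1.0.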

Researchers typically will report three or four fit indices in an article or other scientific document. AMOS provides many more fit indices than that, so you'll need to form an opinion as to which ones you feel are the best.

St. Mary's University (Texas) professor Andrea Berndt performed a valuable service for the SEM community in her dissertation research at Old Dominion University. I received a copy of her study at the 1998 American Psychological Association convention and, to my knowledge, it has never been published (from the TTU library site, you can search in Dissertation Abstracts for BERNDT, ANDREA ELIZABETH, with her dissertation also available via Interlibrary Loan).

The background behind Berndt's research is that, in determining which are the optimal fit indices to use, we want ones that are not biased by study features, such as sample size. Fit indices should convey only the goodness of fit (or match) between the known, input correlation (covariance) matrix and the matrix derived from the SEM based on the tracing of paths. If a fit index provides a high or low value largely because of sample size or other study features, then it's probably misrepresenting the intrinsic fit of the model.

What Berndt did was search all issues of the journals Educational and Psychological Measurement, Journal of Applied Psychology, Journal of Personality and Social Psychology, and Structural Equation Modeling published between 1986 and 1996, extracting information on study design features and fit indices from all articles using SEM. Study features coded (as labeled by Berndt) were: sample size, number of indicators per latent variable, number of latent variables, number of estimated paths, and degrees of freedom. Fit indices recorded (if published in a given article or retrievable from other statistical information reported) were: Chi-Square, Comparative Fit Index, Critical Number, Goodness of Fit Index, Normed Fit Index, Non-Normed Fit Index (aka Tucker-Lewis Index), Root Mean Square Error of Approximation, and the Relative Noncentrality Index.

Berndt formed a dataset, with each line representing a study. From this set-up, she could conduct multiple-regression analyses, with each fit index serving as the dependent variable in a given analysis and the study features serving as predictor variables. Again, because we want the fit indices to be free of any "contamination" from study features, a promising fit index will have the set of predictors produce a very small R-square (as close to 0.0 as possible). Here are the findings (a brief code sketch of this regression set-up appears after the list):

Chi-Square. R-square = .855 (degrees of freedom, Beta = .864, and sample size, Beta = .278, showed significant relations to Chi-Square size).

CFI. R-square = .084 (degrees of freedom, Beta = -.241, was significantly related to CFI).

CN. R-square = .038 (no study features significantly related to CN).

GFI. R-square = .253 (indicators per LV, Beta = -.283, and sample size, Beta = .174, significantly related to GFI).

NFI. R-square = .150 (df, Beta = -.288, and sample size, Beta = .176, significantly related to NFI).

NNFI. R-square = .027 (no study features significantly related to NNFI).

RMSEA. R-square = .038 (no study features significantly related to RMSEA).

RNI. R-square = .061 (df, Beta = -.238, significantly related to RNI).
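To make the regression set-up concrete, here is a minimal sketch in Python using statsmodels; this is not Berndt's actual code or data, and all numbers below are hypothetical placeholders:

```python
# Each row represents one published SEM study; the fit index (here, CFI)
# is the dependent variable and the study features are the predictors.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical illustrative values only (not Berndt's data)
studies = pd.DataFrame({
    "cfi":         [.95, .91, .88, .97, .90, .93, .89, .96],
    "sample_size": [150, 400, 220, 600, 180, 350, 275, 500],
    "df":          [24, 51, 35, 19, 42, 60, 28, 33],
    "n_latents":   [3, 4, 3, 2, 4, 5, 3, 2],
})

fit = smf.ols("cfi ~ sample_size + df + n_latents", data=studies).fit()
print(fit.rsquared)  # for an unbiased fit index, this should be near 0.0
```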

I think it's clear which indices have the smallest R-square values, and thus which ones would be recommended for you to use. If you would like to cite the Berndt paper in your writings to justify your choice of fit indices, the reference is:

Berndt, A.E. (1998, August). "Typical" model features and their effects on goodness-of-fit indices. Presented at the 106th Annual Convention of the American Psychological Association, San Francisco, CA.

UPDATE Summer 2014: Rebecca Oldham, a student in the Spring 2014 class, created the following table to summarize Berndt's findings:

Fit index     R-square    Study features significantly related to the index
Chi-Square    .855        degrees of freedom (Beta = .864); sample size (Beta = .278)
CFI           .084        degrees of freedom (Beta = -.241)
CN            .038        none
GFI           .253        indicators per LV (Beta = -.283); sample size (Beta = .174)
NFI           .150        degrees of freedom (Beta = -.288); sample size (Beta = .176)
NNFI          .027        none
RMSEA         .038        none
RNI           .061        degrees of freedom (Beta = -.238)

UPDATE 4/27/2013: I found an article by Waterman et al. (2010) that, to my mind, provides good advice on how to present results regarding chi-square in an SEM manuscript:

The chi-square statistic is reported but is not used in interpretation, because it tests the null hypothesis of perfect fit to the data, which is implausible and almost certain to be rejected in models with large samples (p. 52).

Waterman, A. S., Schwartz, S. J., Zamboanga, B. L., Ravert, R. D., Williams, M. K., Agocha, V. B., Kim, S. Y., & Donnellan, M. B. (2010). The Questionnaire for Eudaimonic Well-Being: Psychometric properties, demographic comparisons, and evidence of validity. The Journal of Positive Psychology, 5, 41-61.
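The large-sample problem described in the quote has a simple algebraic basis (this is the standard definition of the model chi-square, not something specific to Waterman et al.): the test statistic is the minimized discrepancy-function value scaled by sample size,

$$\chi^2 = (N - 1)\,\hat{F}_{\min},$$

so even a tiny, substantively trivial misfit ($\hat{F}_{\min} > 0$) will produce a "significant" chi-square once $N$ is large enough.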

Technical Aspects of Drawing AMOS Models

Now that we're beginning to learn how to draw models in AMOS (which, by the way, stands for Analysis of Moment Structures), I thought I'd list some of the technical aspects you'll see in the program. Most of the time, AMOS implements these technical aspects automatically, but it's important you know what is going on. [Update, Feb. 12: I've just added a diagram below to help illustrate the following principles; you can enlarge the diagram by clicking directly on it.]

1. Every manifest indicator (box) or latent construct (big circle) that has an incoming unidirectional "causal" arrow gets a residual (or error) term (small circle).

2. Every manifest indicator and latent construct (like any ordinary variable) gets a variance. If the indicator or construct has no incoming unidirectional arrow, its variance is located in the indicator or construct itself. However, if something has a residual, the variance is located only in the residual.

3. Non-directional, "curved" correlations can be inserted only between two entities that have variances. Thus, if two entities each have residual variances, it is the residual variances that get correlated, not the indicators or constructs themselves.
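AMOS handles this bookkeeping graphically and automatically, but the same three rules can be seen in a text-based model specification. Here is a minimal sketch using the Python semopy package (my choice of library for illustration; all variable names are hypothetical):

```python
import semopy

# Rule 1: every indicator below (x1-x3, y1-y3) receives an incoming
# factor-loading arrow, so the program adds a residual term for each.
# Rule 2: F2 receives an incoming "causal" arrow, so it also gets a
# residual, and its variance lives in that residual; F1 has no
# incoming arrow, so its variance stays in the construct itself.
# Rule 3: the x1 ~~ y1 line requests a nondirectional correlation;
# because x1 and y1 both have residuals, it is their residual terms
# that end up correlated.
desc = """
F1 =~ x1 + x2 + x3
F2 =~ y1 + y2 + y3
F2 ~ F1
x1 ~~ y1
"""

model = semopy.Model(desc)
# model.fit(data)  # data: a pandas DataFrame containing x1-x3 and y1-y3
```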

Measurement and Structural Models

When learning SEM, an important distinction to recognize from the start is that between a measurement model and a structural model.

A measurement model consists only of factor-loading paths from the latent constructs (factors) to their manifest indicators, non-directional correlations between constructs (like an oblique factor analysis), and in rare circumstances, correlations between some of the indicators' residual (error) terms.

When the implementation of a measurement model involves only a single questionnaire instrument and the researcher is seeking to verify an a priori conceptualization of which constructs (subscales) go with which items, then that particular kind of model is a confirmatory factor analysis (CFA).

The differences between confirmatory (CFA) and exploratory (EFA) types of factor analysis should be apparent:

1. In EFA, the number of factors is determined empirically, by consulting numerical values generated by the computer (e.g., the Kaiser criterion, scree test, or parallel analysis), whereas in CFA, the researcher decides the number of factors, based on conceptual/theoretical grounds or precedent in the literature.

2. In EFA, the determination of which items go with which factors is likewise made empirically, via the factor loadings generated by the computer (which can sometimes create problems if an item loads strongly on more than one factor). In CFA, the assignment of items (manifest indicators) to constructs is made on conceptual grounds. Dual-loading items are avoided in CFA, as the researcher has each manifest indicator receive an incoming factor-loading path from only one construct (factor).
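To illustrate the "empirical" side of this contrast, here is a minimal EFA sketch in Python using the factor_analyzer package (an assumed library choice, with randomly generated placeholder data):

```python
# In EFA, the computer supplies the evidence: eigenvalues to guide the
# number of factors, and a loading for EVERY item on EVERY factor.
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(seed=1)
item_data = pd.DataFrame(rng.normal(size=(300, 6)),
                         columns=[f"item{i}" for i in range(1, 7)])

fa = FactorAnalyzer(n_factors=2, rotation="promax")
fa.fit(item_data)

eigenvalues, _ = fa.get_eigenvalues()
print(eigenvalues)   # Kaiser criterion: retain factors with values > 1.0
print(fa.loadings_)  # dual loadings are possible here, unlike in CFA
```

In CFA, by contrast, each indicator is assigned in advance to exactly one factor, as in the model specification sketched in the next paragraph.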

Once the researcher settles on his or her measurement model (i.e., what the constructs are, and what the manifest indicators are of each), then he or she can develop the structural model. A structural model is simply the network of directional, "causal" paths between constructs. For example, a "life stress" construct (with, perhaps, manifest indicators for work stress, home stress, and miscellaneous stress) might have a directional arrow to a "physical symptoms" construct (with indicators for head and stomach ache, fatigue, and back and joint pain).
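Continuing with semopy-style syntax (again, all variable names are hypothetical), the life-stress example might be specified as follows, with the measurement model providing the constructs and the structural model adding the directional path between them:

```python
import semopy

# Measurement model: which indicators belong to which constructs.
# Structural model: the directional "causal" path between constructs.
desc = """
LifeStress =~ work_stress + home_stress + misc_stress
Symptoms   =~ aches + fatigue + joint_pain

Symptoms ~ LifeStress
"""

model = semopy.Model(desc)
# model.fit(df)  # df: a pandas DataFrame with the six indicator columns
```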

This study of family functioning and adolescent development by Cumsille et al. provides some nice diagrams of measurement and structural models (and will also be good to return to later in the semester when we cover multiple-group modeling).

Follow-Up on Promax Rotation's Factor Structure and Factor Pattern Matrices

As we saw today, the Promax (oblique) factor-rotation technique in SPSS provides two different types of output, the Factor Structure matrix and Factor Pattern matrix. Russell (2002, Personality and Social Psychology Bulletin) provides a concise explanation of the difference (p. 1636):

The Factor Structure matrix provides the correlation between each of the measures and the factors that have been extracted and rotated; this is, of course, what we typically think of as factor loadings. However, given that the two factors [the number in Russell's example] are correlated with one another, there may be overlap in these loadings. Therefore the... Factor Pattern matrix, is designed to indicate the independent relationship between each measure and the factors. One can think of the values reported here as being equivalent to standardized regression coefficients, where the two factors are used as predictors of each measure.
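Russell's distinction can also be verified numerically: the structure matrix equals the pattern matrix post-multiplied by the factor correlation matrix. Here is a minimal sketch, again using the factor_analyzer package with randomly generated placeholder data:

```python
# Pattern vs. structure matrices under an oblique (Promax) rotation:
# structure = pattern x phi, where phi holds the factor correlations.
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(seed=2)
item_data = pd.DataFrame(rng.normal(size=(300, 6)),
                         columns=[f"item{i}" for i in range(1, 7)])

fa = FactorAnalyzer(n_factors=2, rotation="promax")
fa.fit(item_data)

pattern = fa.loadings_     # regression-like weights (unique contribution)
phi = fa.phi_              # factor intercorrelations under Promax
structure = pattern @ phi  # simple correlations of items with factors
print(structure)
```

Note that when the factors are nearly uncorrelated (phi close to an identity matrix), the two matrices will nearly coincide; they diverge as the factor correlation grows, which is exactly Russell's point.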

Latent and Manifest Variables in SEM

(Updated May 2, 2016)

To begin our coverage of SEM, we'll discuss the conceptual basis of latent variables and their manifest (measured) indicators. The diagram below, which I developed several years ago, provides an "everyday" illustration.

[Diagram: an "everyday" illustration of a latent construct and its manifest indicators]

A key idea that we'll discuss throughout the course is that latent variables are error-free.
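One compact way to express this (standard SEM measurement notation, not specific to the diagram above) is:

$$x_i = \lambda_i\,\xi + \delta_i$$

Each manifest indicator $x_i$ is the sum of a contribution from the latent variable $\xi$ (weighted by its loading $\lambda_i$) and a residual/error term $\delta_i$. All measurement error is carried by the $\delta_i$ terms; the latent variable $\xi$ itself is, by construction, error-free.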

There are some additional analogies that can be drawn upon:

In Freudian psychology, a distinction is made between the latent and manifest content of dreams.

In biology, a distinction is made between genotype and phenotype.

The book Does Measurement Measure Up?, by John Henshaw, provides a concise summary of how observable, measurable manifestations are used to infer underlying, unobservable propensities: "Aristotle and others have contrasted the observed behavior of an individual with the underlying capacity on which that behavior depended. Intelligence, as one of those underlying capacities, is an ability that may or may not always be observed in everyday life. This underlying capacity must be deduced from observed behaviors" (p. 92).