Overall Model Fit

This posting examines several measures of overall model fit in greater depth than before. David Kenny's web-based summary of fit indices (links section on the right) is an excellent starting point; G.D. Garson's SEM page also has good information. As we discussed before, indices with the word "fit" in their names should have high values (close to 1.0, the maximum for most fit indices) to signify a well-fitting model, whereas indices with the words "error" or "residual" in their names should be small for a well-fitting model, the closer to 0.0 the better.

The AMOS output will report results for three models. The first is the model you designed (also known as the default or proposed model). The second is the independence (or null) model, which says that each measured variable is correlated exactly 0.0 with every other measured variable (with no latent constructs) and thus usually produces results indicative of poor fit with the data. The third is the saturated model, which, as we've discussed, uses the maximum number of available parameters and thus is guaranteed to provide a perfect fit. As shown on Kenny's page, many of the fit indices involve comparisons of your model and the independence model.
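To make the comparison with the independence model concrete, here is a minimal Python sketch that computes four of the indices from chi-square results, using the textbook formulas as I understand them from sources such as Kenny's page. The function name and the input numbers are hypothetical, chosen only for illustration, and the RMSEA line assumes the N - 1 version of the formula (some programs use N instead). The sketch assumes your model has positive degrees of freedom, so it is not meant for the saturated model.

```python
import math

def fit_indices(chi2_model, df_model, chi2_null, df_null, n):
    """Compute NFI, TLI (NNFI), CFI and RMSEA from chi-square results.

    Assumes df_model > 0 and chi2_null / df_null > 1.
    """
    # NFI: proportional drop in chi-square relative to the independence model.
    nfi = (chi2_null - chi2_model) / chi2_null

    # TLI/NNFI: same idea, but built on chi-square/df ratios,
    # which adds a penalty for model complexity.
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    tli = (ratio_null - ratio_model) / (ratio_null - 1.0)

    # CFI: based on noncentrality (chi-square minus df), floored at zero.
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, 0.0)
    denom = max(d_null, d_model)
    cfi = 1.0 if denom == 0 else 1.0 - d_model / denom

    # RMSEA: noncentrality per degree of freedom, scaled by sample size.
    rmsea = math.sqrt(d_model / (df_model * (n - 1)))

    return {"NFI": nfi, "TLI": tli, "CFI": cfi, "RMSEA": rmsea}

# Hypothetical values, not taken from any real study.
result = fit_indices(chi2_model=85.0, df_model=40,
                     chi2_null=900.0, df_null=55, n=300)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Note that the three "fit" indices all depend on the chi-square of the independence model, whereas RMSEA uses only your model's chi-square, its degrees of freedom, and the sample size.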

Researchers typically will report three or four fit indices in an article or other scientific document. AMOS provides many more fit indices than that, so you'll need to form an opinion as to which ones you feel are the best.

St. Mary's University (Texas) professor Andrea Berndt performed a valuable service for the SEM community in her dissertation research at Old Dominion University. I received a copy of her study at the 1998 American Psychological Association convention and, to my knowledge, it has never been published. From the TTU library site, you can search in Dissertation Abstracts for BERNDT, ANDREA ELIZABETH; her dissertation is also available via Interlibrary Loan.

The rationale for Berndt's research is that the optimal fit indices to use are ones that are not biased by study features, such as sample size. Fit indices should convey only the goodness of fit (or match) between the known, input correlation (covariance) matrix and the matrix derived from the SEM based on the tracing of paths. If a fit index provides a high or low value largely because of sample size or other study features, then it's probably misrepresenting the intrinsic fit of the model.

Berndt searched all issues of the journals Educational and Psychological Measurement, Journal of Applied Psychology, Journal of Personality and Social Psychology, and Structural Equation Modeling published from 1986 to 1996, extracting information on study design features and fit indices from all articles using SEM. Study features coded (as labeled by Berndt) were: sample size, number of indicators per latent variable, number of latent variables, number of estimated paths, and degrees of freedom. Fit indices recorded (if published in a given article or retrievable via other statistical information reported) were: Chi-Square, Comparative Fit Index, Critical Number, Goodness of Fit Index, Normed Fit Index, Non-Normed Fit Index (aka Tucker-Lewis Index), Root Mean Square Error of Approximation, and the Relative Noncentrality Index.

Berndt formed a dataset, with each line representing a study. From this set-up, she could conduct multiple-regression analyses, with each fit index serving as the dependent variable in a given analysis, and the study features serving as predictor variables. Again, because we want the fit indices to be free of any "contamination" from study features, a promising fit index will have the set of predictors produce a very small R-square (as close to 0.0 as possible). Here are the findings:
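For anyone who wants to see the logic of this kind of analysis in action, here is a rough Python sketch. To be clear, the data are simulated, not Berndt's: I generate 200 hypothetical "studies" with just two study features (sample size and degrees of freedom, rather than her five), one made-up index that is deliberately contaminated by those features, and one that is pure noise around .95. The helper names solve and r_squared are my own. The regression is ordinary least squares via the normal equations, written in plain Python so that nothing beyond the standard library is needed.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def r_squared(features, y):
    """R-square from an OLS regression of y on the feature columns (intercept added)."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Simulated data: each entry stands for one hypothetical published study.
random.seed(0)
features, contaminated, clean = [], [], []
for _ in range(200):
    n = random.uniform(100, 1000)   # sample size
    df = random.uniform(10, 200)    # degrees of freedom
    features.append((n, df))
    # An index driven mostly by df and sample size (chi-square-like behavior).
    contaminated.append(1.0 * df + 0.05 * n + random.gauss(0, 10))
    # An index unrelated to the study features.
    clean.append(random.gauss(0.95, 0.03))

r2_contaminated = r_squared(features, contaminated)
r2_clean = r_squared(features, clean)
print(f"R-square, contaminated index: {r2_contaminated:.3f}")
print(f"R-square, clean index:        {r2_clean:.3f}")
```

The contaminated index should yield a large R-square and the clean one an R-square near zero, which is the pattern to look for when reading the findings that follow.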

Chi-Square. R-square = .855 (degrees of freedom, Beta = .864, and sample size, Beta = .278, showed significant relations to Chi-Square size).

CFI. R-square = .084 (degrees of freedom, Beta = -.241, was significantly related to CFI).

CN. R-square = .038 (no study features significantly related to CN).

GFI. R-square = .253 (indicators per LV, Beta = -.283, and sample size, Beta = .174, significantly related to GFI).

NFI. R-square = .150 (df, Beta = -.288, and sample size, Beta = .176, significantly related to NFI).

NNFI. R-square = .027 (no study features significantly related to NNFI).

RMSEA. R-square = .038 (no study features significantly related to RMSEA).

RNI. R-square = .061 (df, Beta = -.238, significantly related to RNI).

I think it's clear which indices have the smallest R-square values, and thus which ones would be recommended for you to use: NNFI (.027), CN (.038), and RMSEA (.038), none of which was significantly related to any study feature. If you would like to cite the Berndt paper in your writings to justify your choice of fit indices, the reference is:

Berndt, A.E. (1998, August). "Typical" model features and their effects on goodness-of-fit indices. Presented at the 106th Annual Convention of the American Psychological Association, San Francisco, CA.

UPDATE Summer 2014: Rebecca Oldham, a student in the Spring 2014 class, created a table to summarize Berndt's findings. [Table not reproduced here.]

UPDATE 4/27/2013: I found an article by Waterman et al. (2010) that, to my mind, provides good advice on how to present results regarding chi-square in an SEM manuscript:

The chi-square statistic is reported but is not used in interpretation, because it tests the null hypothesis of perfect fit to the data, which is implausible and almost certain to be rejected in models with large samples (p. 52).

Waterman, A. S., Schwartz, S. J., Zamboanga, B. L., Ravert, R. D., Williams, M. K., Agocha, B., Kim, S. Y., & Donnellan, M. B. (2010). The Questionnaire for Eudaimonic Well-Being: Psychometric properties, demographic comparisons, and evidence of validity. Journal of Positive Psychology, 6, 41-61.
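The point in the Waterman et al. quotation can be illustrated with a small Python sketch. It relies on the standard result that, under maximum-likelihood estimation, the model chi-square equals (N - 1) times the minimized discrepancy between the input and model-implied matrices. The discrepancy value (0.10) and degrees of freedom (20) below are hypothetical, and the p-value function is my own helper, using a closed-form expression that is exact only for even degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail p-value for a chi-square statistic (closed form, even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # sf(x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

f_min = 0.10   # hypothetical minimized discrepancy (same misfit at every N)
df = 20        # hypothetical degrees of freedom

p_values = {}
for n in (100, 500, 2000):
    chi2 = (n - 1) * f_min          # chi-square grows linearly with N - 1
    p_values[n] = chi2_sf_even_df(chi2, df)
    print(f"N = {n:5d}  chi-square = {chi2:7.2f}  p = {p_values[n]:.4g}")
```

The degree of misfit is identical in all three rows; only the sample size changes, yet the model goes from not rejected at N = 100 to rejected at the .05 level at N = 500 and beyond.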