Maximum Likelihood Estimation

(Updated April 10, 2018)

Today, let's take some time to talk about Maximum Likelihood Estimation (MLE), which is the default estimation procedure in AMOS and is considered the standard for the field. In my view, MLE is not as intuitively graspable as Ordinary Least Squares (OLS) estimation, which simply seeks the best-fitting line through a scatter plot of data, so that the line comes as close to as many of the data points as possible. More precisely, OLS minimizes the sum of squared deviations between each actual data point and the point on the line predicted for an individual with that score on the X-axis, hence "least squares." MLE, however, is considered statistically advantageous.
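To make the "least squares" idea concrete, here is a minimal sketch in Python with made-up numbers (the data points are purely illustrative). It computes the slope and intercept that minimize the sum of squared vertical distances between the points and the line, using the standard closed-form solution.

```python
# Illustrative (made-up) data points
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form least-squares slope and intercept
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x

# The quantity OLS minimizes: the sum of squared deviations
# between each actual y and the y predicted by the line
residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
print(f"slope = {b:.3f}, intercept = {a:.3f}, SS_residual = {residual_ss:.3f}")
```

Any other slope or intercept would yield a larger residual sum of squares for these points.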

This website maintained by S. Purcell provides what I think is a very clear, straightforward introduction to MLE. In particular, we'll want to look at the second major heading on the page that comes up, Model-Fitting.

Purcell describes the mission of MLE as being to "find the parameter values that make the observed data most likely." Here's an analogy I came up with that fits Purcell's definition. Suppose we observed a group of people laughing uproariously (the "data"). One could then ask which generating model would make the laughter most likely: a television comedy show, or a drama about someone dying of cancer?
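The same logic can be shown in a few lines of code. In this hedged sketch (the data are made up), we observe seven heads in ten coin tosses and ask which value of the heads probability p makes those data most likely, scanning a grid of candidate values and keeping the one with the highest log-likelihood.

```python
import math

# Made-up observed data: 7 heads in 10 tosses
heads, n = 7, 10

def log_likelihood(p):
    # Binomial log-likelihood; the combinatorial constant is omitted
    # because it does not depend on p
    return heads * math.log(p) + (n - heads) * math.log(1 - p)

# Candidate parameter values between 0 and 1
candidates = [i / 100 for i in range(1, 100)]

# The MLE is the candidate that makes the observed data most likely
mle = max(candidates, key=log_likelihood)
print(mle)  # 0.7, matching the closed-form answer heads / n
```

The grid search here stands in for the calculus or numerical optimization that real software performs, but the criterion is the same: pick the parameter value under which the data are most probable.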

Another site lists some of the advantages of MLE vis-à-vis OLS.

Lindsay Reed, our former computer lab director, once loaned me a book on the history of statistics, the unusually titled The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century (by David Salsburg, published in 2001).

This book discusses the many statistical contributions of Sir Ronald A. Fisher, among which is MLE. Writes Salsburg:

In spite of Fisher's ingenuity, the majority of situations presented intractable mathematics to the potential user of the MLE (p. 68).

Practically speaking, obtaining MLE solutions requires repeated iterations, which were very difficult to carry out before the computer revolution. Citing the sixteenth-century mathematician Robert Recorde, Salsburg writes:

First guess the answer and apply it to the problem. There will be a discrepancy between the result of using this guess and the result you want. You take that discrepancy and use it to produce a better guess... For Fisher's maximum likelihood, it might take thousands or even millions of iterations before you get a good answer... What are a mere million iterations to a patient computer? (p. 70).
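Recorde's recipe can be sketched in miniature. The toy example below (my own illustration, not from Salsburg) finds the square root of 2 by guessing, measuring the discrepancy, and using that discrepancy to produce a better guess; MLE software applies the same guess-and-revise logic to likelihood functions instead.

```python
# Guess-and-revise iteration (Newton's method) for the square root of 2
target = 2.0
guess = 1.0  # first guess the answer

for step in range(10):
    # Discrepancy between the result of using this guess and the result we want
    discrepancy = guess * guess - target
    # Use the discrepancy to produce a better guess
    guess = guess - discrepancy / (2 * guess)

print(round(guess, 6))  # 1.414214
```

A handful of iterations suffices here; as Salsburg notes, a complicated likelihood function may demand vastly more, which is why the method had to wait for computers.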

UPDATE I: The 2013 textbook by Texas Tech Business Administration professor Peter Westfall and Kevin Henning, Understanding Advanced Statistical Methods, includes additional description of MLE. The above-referenced Purcell page provides an example with a relatively simple equation for the likelihood function. Westfall and Henning, while providing a more mathematically intense discussion of MLE, have several good explanatory quotes:

In cases of complex advanced statistical models such as regressions, structural equation models, and neural networks, there are often dozens or perhaps even hundreds of parameters in the likelihood function (p. 317).

In practice, likelihood functions tend to be much more complicated [than the book's examples], and you won't be able to solve the calculus problem even if you excel at math. Instead, you'll have to use numerical methods, a fancy term for "letting the computer do the calculus for you." ... Numerical methods for finding MLEs work by iterative approximation. They start with an initial guess... then update the guess to some value... by climbing up the likelihood function... The iteration continues until the successive values... are so close to one another that the computer is willing to assume that the peak has been achieved. When this happens, the algorithm is said to converge (p. 325; emphasis in original).
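A toy version of the iterative climb that Westfall and Henning describe can be written in a few lines. This sketch (with made-up data) performs gradient ascent on the log-likelihood of an exponential model and stops when successive estimates are so close together that we declare convergence; for this model the answer can be checked against the closed-form MLE, n / sum(data).

```python
# Made-up observations assumed to follow an exponential distribution
data = [0.5, 1.2, 0.3, 2.0, 1.0]
n, total = len(data), sum(data)

def gradient(rate):
    # Derivative of the exponential log-likelihood,
    # n * log(rate) - rate * total, with respect to rate
    return n / rate - total

rate = 0.2        # initial guess
step_size = 0.01  # how far to climb per iteration

for iteration in range(10_000):
    # Climb up the likelihood function
    new_rate = rate + step_size * gradient(rate)
    # Successive values nearly coincide: the algorithm has converged
    if abs(new_rate - rate) < 1e-10:
        break
    rate = new_rate

print(round(rate, 4))  # 1.0 here, the closed-form MLE n / sum(data)
```

Real software uses more sophisticated updates than a fixed step size, but the stopping rule is the same idea: iterate until successive estimates agree to within a tolerance, then report convergence.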

This is what the Minimization History portion of the AMOS output refers to, along with the possible error message that one's model has failed to converge.

UPDATE II: The reference given by our 2014 guest speaker on MLE is:

Ferron, J. M., & Hess, M. R. (2007). Estimation in SEM: A concrete example. Journal of Educational and Behavioral Statistics, 32, 110-120.