Optimizing Information Using the Expectation-Maximization Algorithm in Item Response Theory (RR 11-01)
The Law School Admission Test (LSAT) uses a mathematical model called item response theory (IRT) to ensure score and test-form comparability from administration to administration. To apply IRT to test data, a set of parameters for each administered question (item) is estimated from the data using a statistical method called marginal maximum likelihood (MML). The expectation-maximization (EM) algorithm underlies the efficient MML estimation of item parameters and is employed operationally for each LSAT administration.
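The core of MML-EM for dichotomous IRT can be sketched briefly. In the E-step, the unobserved ability distribution is approximated on a grid of quadrature points, and each examinee's response pattern contributes posterior-weighted "expected counts" at those points; the M-step then updates each item's parameters from these counts. The sketch below is illustrative only, not the report's implementation: the function names (`p3pl`, `e_step`), the quadrature grid, and the data layout are all assumptions, and the M-step is omitted.

```python
import math

def p3pl(theta, a, b, c):
    """3PL item response function: probability of a correct response at
    ability theta, with discrimination a, difficulty b, and guessing c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def e_step(responses, items, quad_points, quad_weights):
    """E-step of MML-EM: accumulate expected examinee counts n_k and
    expected correct-response counts r_jk at each quadrature point,
    given the current item parameters (a, b, c) for each item."""
    n_items, nq = len(items), len(quad_points)
    n_k = [0.0] * nq
    r_jk = [[0.0] * nq for _ in range(n_items)]
    for pattern in responses:           # pattern: list of 0/1 item scores
        # Posterior weight of each quadrature point for this pattern,
        # proportional to prior weight times pattern likelihood.
        post = []
        for theta, w in zip(quad_points, quad_weights):
            lik = w
            for j, (a, b, c) in enumerate(items):
                p = p3pl(theta, a, b, c)
                lik *= p if pattern[j] == 1 else (1.0 - p)
            post.append(lik)
        total = sum(post)
        for k in range(nq):
            post[k] /= total            # normalize to a posterior over points
            n_k[k] += post[k]
            for j in range(n_items):
                if pattern[j] == 1:
                    r_jk[j][k] += post[k]
    return n_k, r_jk
```

Because each pattern's posterior weights sum to one, the expected counts `n_k` sum to the number of examinees, which is a useful sanity check after each E-step.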
This report first presents a mathematical development and overview of the EM algorithm as it is applied to the three-parameter logistic (3PL) IRT model, the same model used for estimating item parameters for the LSAT. It is then shown that estimating the 3PL IRT model is a special case of a more general, or unconstrained, estimation that applies to almost all IRT models that use dichotomously scored items. Next, the relationship between the EM algorithm and information theory, a mathematical theory of communication that sets limits on the amount of information that can be extracted from data, is examined. The equivalence of MML estimation via the EM algorithm to the minimization of the Kullback–Leibler divergence, a fundamental quantity in information theory, is then illustrated.
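The KL-divergence connection can be stated compactly. In the notation below (the symbols are illustrative, not necessarily the report's), let $p(\mathbf{u})$ be the empirical distribution of observed response patterns $\mathbf{u}$ and $q(\mathbf{u} \mid \boldsymbol{\xi})$ the marginal pattern distribution implied by the model with item parameters $\boldsymbol{\xi}$:

$$
D_{\mathrm{KL}}\!\left(p \,\|\, q_{\boldsymbol{\xi}}\right)
= \sum_{\mathbf{u}} p(\mathbf{u}) \log \frac{p(\mathbf{u})}{q(\mathbf{u} \mid \boldsymbol{\xi})}
= -\sum_{\mathbf{u}} p(\mathbf{u}) \log q(\mathbf{u} \mid \boldsymbol{\xi}) + \text{const.}
$$

Since the first term on the right is, up to the constant entropy of $p$, the negative expected marginal log-likelihood, maximizing the marginal likelihood over $\boldsymbol{\xi}$ is equivalent to minimizing $D_{\mathrm{KL}}(p \,\|\, q_{\boldsymbol{\xi}})$.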
Using mathematical results from information theory, it is shown that the unconstrained estimation proposed here provides a fixed reference point against which many other models may be tested—in a sense, an overarching model to test models. A likelihood ratio test developed for testing models against this reference point is provided, and examples using both real and simulated data demonstrate the approach.
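A likelihood ratio test of this kind reduces to a deviance comparison: fit the constrained model (e.g., 3PL) and the unconstrained reference, then compare $G^2 = -2(\ln L_{\text{constrained}} - \ln L_{\text{general}})$ to a chi-square critical value with degrees of freedom equal to the difference in free parameter counts. The sketch below shows only this final comparison under those standard asymptotics; the function names and the hard-coded critical value are assumptions for illustration, not the report's procedure.

```python
def likelihood_ratio_stat(loglik_constrained, loglik_general):
    """Deviance G^2 = -2 * (lnL_constrained - lnL_general).
    Nonnegative when the general model nests the constrained one;
    asymptotically chi-square under the null hypothesis."""
    return -2.0 * (loglik_constrained - loglik_general)

def reject_constrained_model(loglik_constrained, loglik_general,
                             critical_value):
    """Reject the constrained model when the deviance exceeds the
    chi-square critical value for the appropriate degrees of freedom
    (e.g., 3.841 for df = 1 at the 0.05 level)."""
    return likelihood_ratio_stat(loglik_constrained, loglik_general) > critical_value
```

For example, constrained and general log-likelihoods of -105.0 and -100.0 give a deviance of 10.0, which would exceed the df = 1 critical value of 3.841 at the 0.05 level.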