Optimizing Information Using the Expectation-Maximization
Algorithm in Item Response Theory (RR 11-01)
by Alexander Weissman
Executive Summary
The Law School Admission Test (LSAT) uses a mathematical model called item
response theory (IRT) to ensure score and test-form comparability from administration
to administration. To apply IRT to test data, a set of parameters for each
administered question (item) is estimated using a statistical method called
marginal maximum likelihood (MML). The expectation-maximization (EM) algorithm
underlies the efficient MML estimation of item parameters and is employed operationally
for each LSAT administration.
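To make the estimation machinery concrete, the following sketch shows the core of an EM iteration for MML item parameter estimation: the E-step computes, for each examinee, the posterior weight of each ability quadrature point, and from these the expected counts that the M-step would use. All data and parameter values below are illustrative, not taken from the report.

```python
import numpy as np

def p3pl(theta, a, b, c):
    # 3PL probability of a correct response at ability theta
    # (a = discrimination, b = difficulty, c = lower asymptote)
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical data: 5 examinees x 3 dichotomously scored items
X = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 0, 1],
              [1, 0, 0],
              [0, 1, 1]])

# Quadrature points and weights approximating a standard normal ability prior
theta, w = np.polynomial.hermite_e.hermegauss(11)
w = w / w.sum()

# Current item parameter estimates (illustrative values)
a = np.array([1.2, 0.8, 1.5])
b = np.array([-0.5, 0.0, 0.7])
c = np.array([0.2, 0.2, 0.2])

# E-step: posterior weight of each quadrature point for each examinee
P = p3pl(theta[:, None], a, b, c)                             # (Q, items)
L = np.prod(np.where(X[:, None, :] == 1, P, 1 - P), axis=2)   # (N, Q)
post = L * w
post /= post.sum(axis=1, keepdims=True)

# Expected counts ("artificial data") that feed the M-step
n_bar = post.sum(axis=0)    # expected number of examinees at each point
r_bar = post.T @ X          # expected correct responses per item per point
```

The M-step (not shown) would then update each item's parameters by maximizing the likelihood implied by these expected counts, and the two steps alternate until convergence.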
This report first presents a mathematical development and overview of the EM
algorithm as it is applied to the three-parameter logistic (3PL) IRT model, the same
model used for estimating item parameters for the LSAT. It is then shown that
estimating the 3PL IRT model is a special case of a more general, or unconstrained,
estimation that applies to almost all IRT models that use dichotomously scored items.
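The sense in which the 3PL model is a constrained special case can be sketched as follows: at a set of ability quadrature points, the 3PL model forces the response probabilities onto a smooth logistic curve, whereas the unconstrained estimation treats the probability at each point as a free parameter. The parameter values below are illustrative only.

```python
import numpy as np

def p3pl(theta, a, b, c):
    # 3PL: probabilities constrained to follow a parametric logistic curve
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)  # quadrature points on the ability scale

# Constrained case: all 7 probabilities determined by 3 parameters
p_constrained = p3pl(theta, a=1.0, b=0.0, c=0.2)

# Unconstrained case: one free probability per quadrature point,
# not required to follow any parametric form (values illustrative)
p_unconstrained = np.array([0.15, 0.20, 0.30, 0.55, 0.70, 0.80, 0.90])
```

Because any curve the 3PL can produce is one particular choice of these free probabilities, the unconstrained fit can never be worse, which is what makes it usable as a reference point.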
Next, the relationship between the EM algorithm and information theory, a mathematical
theory of communication that sets limits on the amount of information that can be
extracted from data, is examined. The equivalence of MML estimation via the EM
algorithm to the minimization of the Kullback–Leibler divergence, a fundamental quantity
in information theory, is then illustrated.
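The Kullback–Leibler divergence referred to here measures how far a model-implied distribution is from an observed one; it is nonnegative and equals zero only when the two distributions coincide. A minimal sketch, with illustrative response-pattern probabilities:

```python
import numpy as np

def kl_divergence(p, q):
    # D(p || q) = sum_i p_i * log(p_i / q_i)
    # Nonnegative; zero if and only if p and q are identical.
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0  # terms with p_i = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Hypothetical observed vs. model-implied response-pattern probabilities
p_obs = np.array([0.40, 0.35, 0.15, 0.10])
q_model = np.array([0.38, 0.36, 0.16, 0.10])

d = kl_divergence(p_obs, q_model)
```

In these terms, improving the MML fit via EM corresponds to driving this divergence toward its minimum.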
Using mathematical results from information theory, it is shown that the
unconstrained estimation proposed here provides a fixed reference point against which
many other models may be tested—in a sense, an overarching model to test models. A
likelihood ratio test developed for testing models against this reference point is
provided, and examples using both real and simulated data demonstrate the approach.
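The shape of such a likelihood ratio test can be sketched as follows: twice the difference between the maximized log-likelihoods of the unconstrained reference model and the constrained model is compared to a chi-square distribution. The log-likelihood values and degrees of freedom below are illustrative, not results from the report.

```python
from scipy.stats import chi2

# Hypothetical maximized log-likelihoods (illustrative values):
# the unconstrained model serves as the fixed reference point
loglik_unconstrained = -1234.5   # general (reference) model
loglik_3pl = -1240.2             # constrained (3PL) model

# G^2 statistic: twice the gap in maximized log-likelihoods,
# referred to a chi-square with df = difference in free parameters
G2 = 2.0 * (loglik_unconstrained - loglik_3pl)
df = 8                           # illustrative parameter-count difference
p_value = chi2.sf(G2, df)        # small p-value -> reject the 3PL model
```

A large statistic relative to the chi-square reference suggests the constrained model fits significantly worse than the unconstrained one.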