Identifying Critical Testlet Features Using Tree-Based Regression: An Illustration With the Analytical Reasoning Section of the LSAT (RR 12-04)
by Muirne C. S. Paap, Qiwei He, and Bernard P. Veldkamp, University of Twente, Enschede, The Netherlands
Executive Summary
High-stakes tests such as the Law School Admission Test (LSAT) often consist of
sets of questions (i.e., items) grouped around a common stimulus. Such groupings of
items are often called testlets. A basic assumption of item response theory (IRT), the
mathematical model commonly used in the analysis of test data, is that responses to
individual items are independent of one another once the test taker's ability is taken
into account. In practice, however, the potential dependency among items within a testlet
is often ignored.
In this study, a technique called tree-based regression (TBR) was applied to identify
key features of stimuli that could properly predict the dependence structure of testlet
data for the Analytical Reasoning section of the LSAT. Relevant features identified
included Percentage of "If" Clauses, Number of Entities, Theme/Topic, and Predicate
Propositional Density. Results for the IRT model applied to the LSAT indicated that the
testlet effect was smallest for stimuli that contained 31% or fewer "if" clauses, contained
9.8% or fewer verbs, and had Media or Animals as the main theme. This study
illustrates the merits of TBR in the analysis of test data.
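
The core step of tree-based regression can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: it searches for the single binary split on a stimulus feature that minimizes the within-group sum of squared errors of the outcome, and a full tree would repeat this search recursively within each resulting group. The feature names (pct_if, n_entities), the feature values, and the testlet-effect sizes are all hypothetical and were made up for this example.

```python
# Illustrative sketch only: all values below are invented for demonstration
# and are NOT data or results from the study.

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets, feature_names, min_size=2):
    """Return (feature, threshold, total_sse) for the best single binary split.

    A split sends testlets with feature value <= threshold to the left group
    and all others to the right group. Splits that would leave fewer than
    min_size testlets in either group are skipped.
    """
    best = None
    for j, name in enumerate(feature_names):
        for threshold in sorted({row[j] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[j] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[j] > threshold]
            if len(left) < min_size or len(right) < min_size:
                continue
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (name, threshold, total)
    return best


# Hypothetical stimulus features: (percentage of "if" clauses, number of entities)
feature_names = ["pct_if", "n_entities"]
rows = [
    (20.0, 5), (25.0, 7), (30.0, 6), (28.0, 8),
    (40.0, 5), (45.0, 7), (50.0, 9), (38.0, 6),
]
# Hypothetical testlet-effect sizes, one per testlet
targets = [0.10, 0.12, 0.11, 0.13, 0.40, 0.45, 0.50, 0.42]

feature, threshold, total = best_split(rows, targets, feature_names)
print(f"{feature} <= {threshold}")  # prints "pct_if <= 30.0"
```

In this toy data the testlet effect depends almost entirely on the first feature, so the search selects a threshold on pct_if and ignores n_entities. Categorical features such as Theme/Topic would need a different split rule (grouping categories rather than thresholding), which this sketch does not cover.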