Identifying Critical Testlet Features Using Tree-Based Regression: An Illustration With the Analytical Reasoning Section of the LSAT (RR 12-04)
High-stakes tests such as the Law School Admission Test (LSAT) often consist of sets of questions (i.e., items) grouped around a common stimulus. Such groupings of items are often called testlets. A basic assumption of item response theory (IRT), the mathematical model commonly used in the analysis of test data, is that individual items are independent of one another. The potential dependency among items within a testlet is often ignored in practice.
In this study, a technique called tree-based regression (TBR) was applied to identify key features of stimuli that could properly predict the dependence structure of testlet data for the Analytical Reasoning section of the LSAT. Relevant features identified included Percentage of “If” Clauses, Number of Entities, Theme/Topic, and Predicate Propositional Density. Results for the IRT model applied to the LSAT indicated that the testlet effect was smallest for stimuli that contained 31% or fewer “if” clauses, contained 9.8% or fewer verbs, and had Media or Animals as the main theme. This study illustrates the merits of TBR in the analysis of test data.