Unraveling the Relationship Between Testlet Features
and Item Parameters: An Empirical Example (RR 12-06)
by Muirne C. S. Paap and
Bernard P. Veldkamp,
University of Twente, Enschede, The Netherlands
Executive Summary
Item response theory (IRT), a family of mathematical models, is often applied to high-stakes tests to determine the characteristics of test questions (i.e., items), such as
difficulty, discrimination, and susceptibility to guessing. Note that in this context, the
term “discrimination” refers to how well an item distinguishes between higher- and
lower-ability test takers. Often, these tests contain subsets of items grouped around a
common stimulus; such a subset is called a testlet. Items within one testlet tend to be
more strongly correlated with one another than with items from other testlets, which
can result in moderate to strong testlet effects.
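As a point of reference, the three item characteristics named above (difficulty, discrimination, and guessing) are commonly expressed through the three-parameter logistic (3PL) model. This summary does not state which model the study used, so the sketch below is only an illustration under that assumption; the function name and the symbols `theta`, `a`, `b`, and `c` are chosen here and do not come from the report.

```python
import math


def prob_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model (illustrative).

    theta: test-taker ability
    a:     item discrimination (steepness of the curve)
    b:     item difficulty (location of the curve)
    c:     guessing parameter (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Example: when ability equals difficulty, the probability lies halfway
# between the guessing floor c and 1.
p = prob_correct_3pl(theta=0.0, a=1.0, b=0.0, c=0.2)
```

In this form, a larger `a` makes the curve steeper around `b`, which is what it means for an item to distinguish well between higher- and lower-ability test takers.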
Recently, it was shown that stimulus features could be used to predict the size of
the testlet effect. Furthermore, a strong relationship was found between average item
difficulty and the magnitude of the testlet effect. This study explores the relationship
between stimulus features and the IRT parameters of difficulty, discrimination, and
guessing. It was found that stimuli associated with easy items consisted of many
different (but commonly used) words as well as an intermediate proportion of
negative words. Relatively short stimuli containing many different words were found
to have a high information density, and thus are considered to be very useful for
distinguishing between test takers of different ability levels. No useful predictions
could be made with regard to susceptibility to guessing, since that parameter did not
vary much from one item to the next. It was concluded that knowledge of stimulus
features can be used to revise passage texts so that they have “favorable”
properties in terms of testlet effect and average item parameters.