Unraveling the Relationship Between Testlet Features and Item Parameters: An Empirical Example (RR 12-06)
Item response theory (IRT), a family of mathematical models, is often applied to high-stakes tests to estimate the characteristics of test questions (i.e., items), such as difficulty, discrimination, and susceptibility to guessing. In this context, the term “discrimination” refers to how well an item distinguishes between higher- and lower-ability test takers. Such tests often contain testlets: subsets of items grouped around a common stimulus. Items within the same testlet tend to be more strongly correlated with one another than with items from other testlets, which can result in moderate to strong testlet effects.
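The abstract does not say which IRT model or testlet parameterization the report used. The Python sketch below assumes one common choice: the three-parameter logistic (3PL) model, in which a is discrimination, b is difficulty, and c is the lower asymptote ("guessing"), extended with an additive person-by-testlet effect gamma that every item in the same testlet shares. All function names, the scaling constant D = 1.0, and the simulation settings are illustrative assumptions, not the report's method. The simulation shows how a shared gamma makes two items from the same testlet correlate more strongly than two items from different testlets.

```python
import math
import random


def p_correct(theta, a, b, c, gamma=0.0, D=1.0):
    """3PL probability of a correct response with an optional testlet effect.

    Assumed parameterization (hypothetical, for illustration only):
    P = c + (1 - c) / (1 + exp(-D * a * (theta - b - gamma))).
    gamma = 0 reduces this to the ordinary 3PL model.
    """
    z = D * a * (theta - b - gamma)
    return c + (1.0 - c) / (1.0 + math.exp(-z))


def item_information(theta, a, b, c, D=1.0):
    """Fisher information of a 3PL item (Birnbaum's formula).

    Larger discrimination a gives more information near theta = b, i.e.,
    the item separates nearby ability levels better.
    """
    p = p_correct(theta, a, b, c, D=D)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2


def pearson(x, y):
    """Plain Pearson correlation (avoids requiring Python 3.10+)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_testlet_correlations(n_persons=20000, testlet_sd=1.0, seed=1):
    """Simulate two items in hypothetical testlet A and one item in testlet B.

    Each simulated test taker draws one gamma per testlet; both items in
    testlet A share gamma_a. Returns (within-testlet r, between-testlet r).
    """
    rng = random.Random(seed)
    item_a1, item_a2, item_b = [], [], []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)
        gamma_a = rng.gauss(0.0, testlet_sd)  # shared by both items in testlet A
        gamma_b = rng.gauss(0.0, testlet_sd)  # used only by the item in testlet B
        # All three items use a = 1, b = 0, c = 0 so only gamma differs.
        item_a1.append(int(rng.random() < p_correct(theta, 1.0, 0.0, 0.0, gamma_a)))
        item_a2.append(int(rng.random() < p_correct(theta, 1.0, 0.0, 0.0, gamma_a)))
        item_b.append(int(rng.random() < p_correct(theta, 1.0, 0.0, 0.0, gamma_b)))
    return pearson(item_a1, item_a2), pearson(item_a1, item_b)


if __name__ == "__main__":
    within, between = simulate_testlet_correlations()
    print(f"within-testlet r = {within:.3f}, between-testlet r = {between:.3f}")
```

With testlet_sd set to 0 the two correlations should be about equal, since only ability links the items; increasing testlet_sd widens the gap, which is the pattern the paragraph above describes as a testlet effect.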
Recently, it was shown that stimulus features can be used to predict the size of the testlet effect, and a strong relationship was found between average item difficulty and the magnitude of the testlet effect. This study explores the relationship between stimulus features and the IRT parameters of difficulty, discrimination, and guessing. Stimuli associated with easy items were found to contain many different (but commonly used) words as well as an intermediate proportion of negative words. Relatively short stimuli containing many different words were found to have a high information density and are therefore considered very useful for distinguishing between test takers of different ability levels. No useful predictions could be made for susceptibility to guessing, because that parameter varied little from one item to the next. It was concluded that stimulus features can be used to revise passage texts so that they have “favorable” properties in terms of testlet effect and average item parameters.