LSAC Resources - Research
Research Reports
Automatic Prediction of Item Difficulty Based on Semantic Similarity Measures (RR 08-04)
by Dmitry I. Belov and Lily Knezevich
Executive Summary
Item difficulty modeling (IDM) draws on techniques from data mining, pattern recognition, and natural language processing. Its primary task is to estimate the statistical properties of a test question (item), such as difficulty, from features extracted directly from the item. One such feature might be the number of rarely used words in the passage and in the correct and incorrect answer choices of a multiple-choice item. The resulting feature space can be partitioned using decision tree analysis, which makes it possible to validate the impact of each feature on item difficulty and other statistical characteristics.
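As a rough illustration of the rare-word feature and of a single decision tree split, consider the following minimal Python sketch. It is not the authors' implementation: the function names, the reference frequency table, the rarity cutoff, and the "hard"/"not hard" labels are all assumptions made for this example.

```python
def count_rare_words(text, word_freq, rarity_cutoff=5):
    """Count tokens whose reference frequency falls below rarity_cutoff.

    word_freq is a hypothetical word -> frequency table; words missing
    from the table are treated as frequency 0, i.e. as rare.
    """
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    return sum(1 for t in tokens if t and word_freq.get(t, 0) < rarity_cutoff)


def best_split(values, labels):
    """Find the threshold t minimizing errors of the rule 'value > t => hard'.

    This is one node of a decision tree (a decision stump), not a full tree.
    values: one feature value per item; labels: True if the item is hard.
    Returns (threshold, number_of_misclassified_items).
    """
    best_t, best_err = None, len(values) + 1
    for t in sorted(set(values)):
        err = sum((v > t) != is_hard for v, is_hard in zip(values, labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err


# Made-up frequency table and items, for illustration only.
toy_freq = {"the": 1000, "on": 900, "cat": 50, "sat": 40, "mat": 20, "ontological": 1}
n_rare = count_rare_words("The cat sat on the ontological mat.", toy_freq)

rare_counts = [0, 1, 3, 4]                 # feature value for four toy items
is_hard = [False, False, True, True]       # made-up difficulty labels
threshold, errors = best_split(rare_counts, is_hard)
```

A full decision tree would apply such splits recursively over several features; the stump above only shows how one feature partitions the items.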
IDM can be applied to enhance many routine psychometric analyses. Additionally, IDM can help testing organizations manage the pretesting of new items more efficiently, since estimates of item difficulty can be available before pretest statistics are obtained.
This study was motivated by earlier research on item pool analysis and design, which showed that items with certain content and statistical characteristics could dramatically improve the ability of an item pool to support the assembly of test forms. Although item writers are able to control the content characteristics of items, item difficulty is not so easily controlled. Therefore, there is a practical demand for automated tools to predict item difficulty.
This report introduces new automated features for predicting item difficulty based on WordNet, a lexical database available online. Methods are developed for computing a semantic similarity measure between two texts and a self-similarity measure of a single text. Each method is based on certain characteristics of the semantic similarity matrix of two texts, in which each element measures the semantic similarity of the two corresponding words. Based on these measures, a decision tree for predicting item difficulty is constructed and validated. The results of experiments with reading comprehension items show that the new features are useful for predicting item difficulty. In particular, predictions based on them were correct in 83–86% of cases, which was 11–14% higher than the prediction based on the Breland Word Frequency index.
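The idea of a semantic similarity matrix can be sketched in a few lines of Python. This summary does not specify which characteristics of the matrix the authors use, so the aggregates below (mean of best matches in both directions, and mean off-diagonal similarity) are assumptions chosen for illustration. The word-level similarity is a hand-made lookup table standing in for a WordNet-based measure; all names and values are hypothetical.

```python
# Made-up word-pair similarities in [0, 1]; a real system might derive
# these from WordNet instead of a hand-written table.
TOY_PAIRS = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("road", "street")): 0.8,
}


def toy_sim(a, b):
    """Hypothetical word similarity: 1.0 for identical words, else table lookup."""
    return 1.0 if a == b else TOY_PAIRS.get(frozenset((a, b)), 0.0)


def similarity_matrix(words_a, words_b, word_sim):
    """Element [i][j] is the similarity of words_a[i] and words_b[j]."""
    return [[word_sim(a, b) for b in words_b] for a in words_a]


def text_similarity(words_a, words_b, word_sim):
    """Average each word's best match in the other text, in both directions."""
    if not words_a or not words_b:
        return 0.0
    m = similarity_matrix(words_a, words_b, word_sim)
    row_best = [max(row) for row in m]
    col_best = [max(col) for col in zip(*m)]
    return (sum(row_best) / len(row_best) + sum(col_best) / len(col_best)) / 2


def self_similarity(words, word_sim):
    """Mean similarity over all ordered pairs of distinct positions in one text."""
    n = len(words)
    if n < 2:
        return 0.0
    m = similarity_matrix(words, words, word_sim)
    total = sum(m[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


between = text_similarity(["car", "road"], ["automobile", "street"], toy_sim)
within = self_similarity(["car", "automobile", "road"], toy_sim)
```

Scores like `between` and `within` could then serve as feature values for a decision tree, for example by comparing a passage with each answer choice.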
