Item response theory (IRT) is a family of mathematical models often applied in the development and analysis of educational and psychological assessments. Various IRT models exist, and practitioners must choose the one most appropriate for their particular assessment. Even when the most appropriate model is applied, the fit of the assessment data to the model is rarely perfect in practice. How serious, then, is model misfit for practical decision-making? In this study we analyze two empirical datasets to investigate how removing misfitting items and misfitting item score patterns affects the rank order of test takers according to their proficiency level score. Results for two different IRT models were compared. We found that the impact of removing misfitting items and item score patterns varied depending on the IRT model applied. The impact was more serious when a small to moderate percentage of test takers was selected from the group. When the percentage selected was larger, misfit was not important.
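As a rough illustration of the kind of comparison described above, the following minimal sketch computes how much the set of selected test takers changes when proficiency scores are re-estimated after removing misfitting items. It is not the method used in the report: the function names `top_fraction` and `selection_agreement`, the overlap measure, and the score values are all hypothetical and chosen only to show the idea.

```python
import math


def top_fraction(scores, fraction):
    """Return the indices of the test takers in the top `fraction` by score.

    Ties are broken by original position (an assumption made for this sketch).
    """
    k = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def selection_agreement(scores_full, scores_reduced, fraction):
    """Share of test takers selected under the full scores who are also
    selected under the scores estimated after removing misfitting items."""
    selected_full = top_fraction(scores_full, fraction)
    selected_reduced = top_fraction(scores_reduced, fraction)
    return len(selected_full & selected_reduced) / len(selected_full)


# Made-up proficiency estimates for ten test takers (illustrative only).
scores_full = [1.2, 0.4, -0.3, 0.9, 0.1, -1.1, 0.7, -0.5, 0.3, 1.5]
scores_reduced = [0.8, 0.6, -0.2, 0.5, 0.2, -1.0, 0.9, -0.6, 0.1, 1.4]

for fraction in (0.2, 0.5):
    agreement = selection_agreement(scores_full, scores_reduced, fraction)
    print(f"top {fraction:.0%}: agreement = {agreement:.2f}")
```

With these made-up values, the two selected groups agree less when 20% of test takers are selected than when 50% are selected. This only mirrors the direction of the reported finding and says nothing about its size.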
Request the Full Report
To request the full report, please email Linda Reustle at lreustle@LSAC.org.