In a large-scale, high-stakes testing program such as the Law School Admission Test (LSAT), it is necessary to maintain a large bank of test items to support the demand for a new test form at nearly every administration. To ensure that the item bank can support the test assembly requirements, ongoing monitoring of the quality of the item bank is necessary to identify deficiencies and direct item development efforts. Recent research along these lines has included efforts to identify the properties of the most valuable item in the item pool, identify the test assembly constraint(s) that are the most difficult to meet, determine the distribution of test taker ability that supports the highest degree of usability of the item pool, and develop statistical test assembly targets for multiple stage testing.
Many of these practical testing problems have recently been addressed by the application of test sampling methods. Test sampling may be described as the sequential assembly of multiple test forms such that each question (item) or item set can be used an unlimited number of times. As a consequence, the test forms produced can overlap with each other, i.e., have items/item sets in common. The result is a sample of test forms from the finite set of all test forms available from the given item pool under a given set of test assembly constraints. To ensure that the inferences from such research are statistically valid, the sampling must be uniform, that is, each test form in that set must have an equal chance of being assembled. Thus, the test assembly problem plays a fundamental role in the test sampling method. In particular, our interest is in methods of uniform test assembly, which in turn provide uniform test sampling.
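As a rough illustration of what uniform test sampling means, the following Python sketch draws overlapping test forms from a toy item pool by rejection sampling: a random subset of items is drawn and kept only if it satisfies the constraints, so every feasible form is equally likely. This is not the algorithm formulated in the report. The pool, the constraints, and all names (`POOL`, `FORM_LENGTH`, `is_feasible`, `sample_form`, `sample_forms`, `enumerate_feasible`) are hypothetical and chosen only for illustration.

```python
import random
from itertools import combinations

# Hypothetical toy pool: item id -> (content area, difficulty).
POOL = {
    1: ("A", -1.0),
    2: ("A", 0.0),
    3: ("A", 1.0),
    4: ("B", -0.5),
    5: ("B", 0.5),
    6: ("B", 1.5),
}
FORM_LENGTH = 4  # assumed number of items per test form


def is_feasible(form):
    """Toy constraints: two items per content area, mean difficulty in [0, 0.5]."""
    areas = [POOL[i][0] for i in form]
    mean_difficulty = sum(POOL[i][1] for i in form) / len(form)
    return (
        areas.count("A") == 2
        and areas.count("B") == 2
        and 0.0 <= mean_difficulty <= 0.5
    )


def enumerate_feasible():
    """List every feasible form (only practical for a tiny pool)."""
    return [
        frozenset(c)
        for c in combinations(sorted(POOL), FORM_LENGTH)
        if is_feasible(c)
    ]


def sample_form(rng):
    """Draw one feasible form uniformly by rejection sampling.

    Every subset of size FORM_LENGTH is equally likely to be proposed,
    so conditioning on feasibility leaves a uniform distribution over
    the feasible forms.
    """
    items = sorted(POOL)
    while True:
        candidate = frozenset(rng.sample(items, FORM_LENGTH))
        if is_feasible(candidate):
            return candidate


def sample_forms(n, seed=0):
    """Draw n forms independently; items may recur, so forms can overlap."""
    rng = random.Random(seed)
    return [sample_form(rng) for _ in range(n)]
```

Rejection sampling is uniform by construction but becomes impractical when feasible forms are rare among all item subsets, as is typical for realistic pools and constraint sets. That is one reason a dedicated uniform test assembly method is of interest.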
Mixed integer programming is an approach commonly applied to the test assembly problem. This paper presents a proof that the mixed integer programming approach cannot guarantee uniformity of the sampling, and goes on to formulate a test assembly algorithm that ensures uniform sampling. An extension for assembling multiple nonoverlapping test forms/test sections based on item usage frequency is described, as are extensions to multiple stage and computerized adaptive testing.
The methods illustrated in the paper provide researchers and practitioners from testing organizations with a simple and flexible framework for assembling tests and monitoring item pools in linear, multiple stage, and computerized adaptive testing.
Request the Full Report
To request the full report, please email Linda Reustle at lreustle@LSAC.org.