Application of Heuristic-Based Semantic Similarity
Between Single-Topic Texts for the LSAT (RR 12-05)
by Dmitry I. Belov and David A. Kary
Executive Summary
Educational measurement practice (item bank development, form assembly, scoring
of constructed-response answers, etc.) involves the development and processing of an
enormous amount of text. This requires large numbers of people to write, read,
evaluate, classify, edit, score, and analyze the text. Not only is this process time-consuming and resource intensive, but it is also subjective and prone to error. Subject-matter experts must define the construct of the test through some formalized process.
Based on the construct, items are written, reviewed, edited, and classified. Beyond the
individual items, item banks must also be evaluated to identify content overlap, cuing, or
other content features that would lead to dependence among items and reduce construct
representation. Newly written items approved for pretesting must then be administered
to a sample of representative test takers before their statistical quality can be
determined. If the items involve constructed-response answers, they must be scored by
trained human raters. Finally, item writing must be conducted on a continuous basis
because of security concerns, and the construct definition must be reevaluated regularly
as practice and standards change.
Natural language processing (NLP) can be used to reduce the above-mentioned
costs in time, money, and labor. NLP is a collection of methods for indexing, classifying,
summarizing, generating, and interpreting texts. Educational measurement initially
made use of these methods in the development of automated essay-scoring engines.
Recently, however, NLP methods have been applied to nearly every aspect of test
development and psychometrics: item difficulty modeling; the use of text analysis to
improve scoring in computerized adaptive and multistage testing; the search for pairs of
mutually exclusive items (item enemies); item generation; and item bank referencing.
This report introduces a heuristic for computing semantic similarity between two
single-topic texts. The heuristic was tested on 10 datasets prepared by a test developer.
Each dataset consisted of 10 Logical Reasoning passages from the Law School
Admission Test (LSAT), in which passages P1 and P2 were judged by the test developer
to be similar, and the other 8 passages were judged to be dissimilar from both P1 and P2.
For each dataset, the heuristic was used to compute the semantic similarity between P1
and each of the other passages, and its results agreed with the test developer's
judgments on 8 of the 10 datasets.
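To make the evaluation protocol concrete, the sketch below runs one dataset through a stand-in scorer. The report's actual heuristic is not reproduced in this summary, so a plain term-frequency cosine similarity is assumed in its place, and "agreement" is read as P2 receiving the highest score against P1; the function names and that reading of agreement are illustrative assumptions, not the report's method.

```python
import math
from collections import Counter

def tf_cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity over raw term-frequency vectors (a hypothetical
    stand-in scorer; the report's heuristic is not reproduced here)."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def agrees_with_judgment(passages: list[str]) -> bool:
    """One dataset: passages[0] is P1, passages[1] is P2 (judged similar
    to P1), and passages[2:] are the 8 dissimilar passages.  Agreement is
    interpreted as P2 scoring highest against P1."""
    p1, others = passages[0], passages[1:]
    scores = [tf_cosine(p1, p) for p in others]
    return scores.index(max(scores)) == 0  # others[0] is P2

# Tallying agreement over 10 datasets (each a list of 10 passage strings):
# agreed = sum(agrees_with_judgment(d) for d in datasets)
```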
The heuristic has several potential applications for the LSAT: (1) semantics-based
search of an item pool for possible item enemies; (2) Internet searches for illegally
reproduced (cloned) items; (3) improved estimates of item difficulty through the addition
of semantic features (e.g., the semantic similarity between a passage and its key, or
between a key and its distractors).
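As an illustration of the first application, the sketch below scans an item pool for passage pairs whose similarity exceeds a cutoff. The scorer is pluggable (any function returning a value in [0, 1], such as the tf_cosine stand-in above), and the 0.8 threshold is a hypothetical placeholder that would need to be calibrated against test-developer judgments.

```python
from itertools import combinations

def find_possible_enemies(pool: dict[str, str], similarity,
                          threshold: float = 0.8):
    """Flag candidate item-enemy pairs in `pool` (item ID -> passage text).

    `similarity` is any scorer returning a value in [0, 1]; the default
    threshold is an illustrative placeholder, not a calibrated value.
    """
    scored = [
        (id_a, id_b, similarity(text_a, text_b))
        for (id_a, text_a), (id_b, text_b) in combinations(pool.items(), 2)
    ]
    # Keep only pairs above the cutoff, most similar first, for human review.
    return sorted((p for p in scored if p[2] >= threshold),
                  key=lambda p: -p[2])
```

A flagged pair is a candidate only; under this design the final enemy determination would remain with test developers, with the ranked list serving to focus their review.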