
Research Library

All reports in LSAC’s Research Library are available upon request. Executive summaries are available below for the latest LSAT Technical Reports and other research published within the last 10 years.

Looking for older reports? Consult the Research Archive.

Current Research:

This report focuses on 2024-25 applicants, exploring who applied to law school, when applicants first thought about law school, why they applied, and more.

Test collusion (TC) is the sharing of test materials or answers to test questions (items) before or during a test. Because of the potentially large advantages for the test takers involved, TC poses a serious threat to the validity of score interpretations. The proposed approach applies graph theory methodology to response similarity analyses to identify groups involved in TC while minimizing the false-positive detection rate. The new approach is illustrated and compared with a recently published method using real and simulated data.
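The general idea of combining graph theory with response similarity can be sketched as follows. This is a minimal illustration only, not the method proposed in the report: it assumes a simple similarity measure (the count of matching answers), a hypothetical fixed threshold, and made-up response data, and it reads candidate groups off as connected components of the resulting graph. The names `matching_count` and `similarity_groups` are invented for this example.

```python
from itertools import combinations


def matching_count(a, b):
    """Number of items on which two response strings agree."""
    return sum(x == y for x, y in zip(a, b))


def similarity_groups(responses, threshold):
    """Return groups (size >= 2) of test takers linked by high similarity.

    responses: dict mapping a test-taker id to a sequence of chosen options.
    threshold: hypothetical minimum number of matching answers for an edge.
    """
    # Build an undirected graph: one node per test taker, one edge per
    # pair whose matching-answer count reaches the threshold.
    adjacency = {person: set() for person in responses}
    for p, q in combinations(responses, 2):
        if matching_count(responses[p], responses[q]) >= threshold:
            adjacency[p].add(q)
            adjacency[q].add(p)

    # Collect connected components with a depth-first search.
    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        if len(component) >= 2:
            groups.append(component)
    return groups


# Made-up data: A, B, and C share nearly identical response strings.
responses = {
    "A": "ABCDABCDAB",
    "B": "ABCDABCDAC",
    "C": "ABCDABCDBB",
    "D": "DCBADCBADC",
    "E": "BADCBADCBA",
}
groups = similarity_groups(responses, threshold=9)
```

A raw matching count with a fixed threshold does nothing to control false positives, since able test takers agree on many correct answers anyway. An operational analysis would need a statistically calibrated similarity index in its place.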

This report addresses a general type of cluster aberrancy in which a subgroup of test takers has an unfair advantage on some subset of administered items. Examples of cluster aberrancy include item preknowledge and test collusion. In general, cluster aberrancy is hard to detect due to the multiple unknowns involved: Unknown subgroups of test takers have an unfair advantage on unknown subsets of items. The issue of multiple unknowns makes the detection of cluster aberrancy a challenging problem from the standpoint of applied mathematics.

Most high-stakes testing programs apply methods to identify unlikely patterns of correct/incorrect responses to test questions. Some examples of why such patterns may occur include misinterpretation of questions, question preknowledge, answer copying, or guessing behavior. This report provides an overview of existing approaches to identifying atypical response patterns that fall into a class of analyses known as nonparametric statistics. Results of a simulation study comparing the different approaches, along with guidelines for applying these indices in practice, are also presented.
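To make the notion of an unlikely response pattern concrete, the sketch below counts Guttman errors, a well-known nonparametric index. The summary does not say which indices the report covers, so this choice is an assumption, and the function name and data are made up. A Guttman error is a pair of items in which the test taker misses the easier item but answers the harder one correctly.

```python
def guttman_errors(scores, proportions_correct):
    """Count Guttman errors in one test taker's scored responses.

    scores: list of 0/1 item scores for one test taker.
    proportions_correct: proportion of the group answering each item
        correctly, used here as the (assumed) measure of item easiness.
    """
    # Order the items from easiest to hardest.
    order = sorted(range(len(scores)), key=lambda i: -proportions_correct[i])

    errors = 0
    wrong_so_far = 0
    for i in order:
        if scores[i] == 0:
            wrong_so_far += 1
        else:
            # Each correct answer forms one error pair with every
            # easier item that was answered incorrectly.
            errors += wrong_so_far
    return errors


# Made-up item easiness values, easiest item first.
proportions_correct = [0.9, 0.8, 0.6, 0.4, 0.2]

typical = guttman_errors([1, 1, 1, 0, 0], proportions_correct)   # no errors
atypical = guttman_errors([0, 0, 1, 1, 1], proportions_correct)  # six errors
```

Both patterns have the same number-correct score of 3, but the second misses the two easiest items and answers the three hardest correctly, so it has the largest possible error count for that score.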

This report presents a new algorithm for detecting groups of test takers (aberrant groups) who had access to subsets of test questions (aberrant subsets) prior to an exam. This method is in line with the development of statistical methods for detecting test collusion, a new research direction in test security. Test collusion may be described as the large-scale sharing of test materials, including answers to test questions. The algorithm employs several new statistics to perform a sequence of statistical tests to identify aberrant groups.

Item response theory (IRT) is a mathematical framework used to support the development, analysis, and scoring of tests and questionnaires. For example, IRT allows for the description of item (i.e., question) characteristics, such as difficulty, as well as the proficiency level of test takers. Various IRT models are available, and choosing the most appropriate model for a particular test is essential. Since the fit of the test data to the chosen model is never perfect, measuring the fit of the model to the data is imperative.
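As an illustration of how an IRT model links item characteristics to proficiency, the sketch below implements the two-parameter logistic (2PL) model, one common IRT model. The summary does not name the models the report examines, so the 2PL is an assumption here, as are the function name and the parameter values.

```python
import math


def prob_correct(theta, discrimination, difficulty):
    """2PL probability that a test taker answers an item correctly.

    theta: proficiency of the test taker.
    discrimination: how sharply the item separates proficiency levels.
    difficulty: proficiency at which the probability equals 0.5.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# A test taker whose proficiency equals the item difficulty has an even chance.
p_even = prob_correct(theta=0.0, discrimination=1.0, difficulty=0.0)  # 0.5

# For the same test taker, a harder item yields a lower probability.
p_easy = prob_correct(theta=0.5, discrimination=1.2, difficulty=-1.0)
p_hard = prob_correct(theta=0.5, discrimination=1.2, difficulty=1.0)
```

Model fit can then be examined by comparing probabilities like these with the proportions of correct responses actually observed at different proficiency levels.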

Many standardized tests are now administered via computer rather than paper-and-pencil format. In a computer-based testing environment, it is possible to record not only the test taker’s response to each question (item), but also the amount of time spent by the test taker in considering and answering each item. Response times (RTs) provide information not only about the test taker’s ability and response behavior but also about item and test characteristics. The current study focuses on the use of RTs to detect aberrant test-taker responses.

In standardized testing, test takers may change their answer choices for various reasons. The statistical analysis of answer changes (ACs) has uncovered multiple testing irregularities on large-scale assessments and is now routinely performed at some testing organizations. Research on answer-changing behavior has recently branched off in several directions, including modeling of ACs and addressing scanning errors.

While an admission test may strongly predict success in university or law school programs for most test takers, there may be some test takers who are mismeasured. To address this issue, a class of statistics called person-fit statistics is used to check the validity of individual test scores. However, most person-fit statistics are designed for a single test, and not much is known about the performance of these statistics for admission tests consisting of multiple highly correlated subtests.

In standardized multiple-choice testing, test takers often change their answers for various reasons. The statistical analysis of answer changes (ACs) has uncovered multiple testing irregularities on large-scale assessments and is now routinely performed at some testing organizations. This report presents two new approaches to analyzing ACs at the individual test-taker level. The information about all previous answers is used only to partition the data into two disjoint subsets: responses where an AC occurred and responses where an AC did not occur.
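The partitioning step described above can be sketched as follows. This is a simplified illustration, not the report's procedure: it assumes that only an initial and a final answer are recorded per item, and the function name and data are hypothetical.

```python
def partition_by_answer_change(initial_answers, final_answers):
    """Split item positions into two disjoint sets for one test taker.

    Returns (changed, unchanged): the positions where the final answer
    differs from the initial answer, and the positions where it does not.
    """
    changed, unchanged = [], []
    for position, (first, last) in enumerate(zip(initial_answers, final_answers)):
        if first != last:
            changed.append(position)
        else:
            unchanged.append(position)
    return changed, unchanged


# Made-up answer strings for a six-item test.
initial_answers = "ABCDAB"
final_answers = "ABDDAC"
changed, unchanged = partition_by_answer_change(initial_answers, final_answers)
```

Any statistic computed afterward would use only the final responses within each subset, since the earlier answers serve only to decide which subset an item falls into.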

When a test taker has prior knowledge of an administered test question (item), the event is called item preknowledge, the test taker is called aberrant, and the item is called compromised. Item preknowledge harms the testing program and its test score users (universities, companies, government organizations) because the scores produced for aberrant test takers are invalid. The performance of eight statistics for detecting item preknowledge (five existing, two modified, and one new) was studied via computer simulations.

Item response theory (IRT) is a family of mathematical models often applied in the development and analysis of educational and psychological assessments. Various IRT models exist, and practitioners must choose the model that is most appropriate for their particular assessment. Even when the most appropriate model is applied, the fit of the assessment data to the model is rarely perfect in practice. How serious, then, is model misfit for practical decision-making?