Developing an Assessment of First-Year Law Students’ Critical Case Reading and Reasoning Ability: Phase 2 (GR 08-02)

Executive Summary

The research team in this project developed a prototype multiple-choice test to assess case reading and reasoning among law students at two points during the first year. The project was motivated by the importance of cases in legal education, a paucity of empirical evidence concerning law students’ reading abilities, and a need for measurements that could be used to study pedagogical interventions and students’ skill development. A primary goal of the research project was to test the theoretical argument that law students have difficulty dealing with what are called indeterminacies, that is, the discourse-specific ambiguities and vagueness of cases.

Phase 1 of this study employed a cross-sectional, matched-subjects design. Among the 161 first-year law students who took the prototype test version (TV1) in either the fall or spring semester, the overall combined mean was 7.91; the most common score was 9/14 (64%) correct, and scores ranged from 3/14 (21%) to 14/14 (100%). Individual test items showed a wide range of difficulty, with a good balance among easy, moderately difficult, and difficult items. No significant difference in test performance was detected between semesters.

Phase 2 of the study was undertaken to assess the test’s validity and to determine whether the findings could be replicated using a longitudinal design involving not only first-year but also third-year law students. To accomplish these research goals, a second, alternate version of the test (TV2) was constructed. Data were collected in the spring and fall of 2006 at the five law schools involved in Phase 1. The 146 first-year subjects who participated in a silent-reading version of the study were randomly assigned to either TV1 or TV2. Sixty-three third-year students who had participated in Phase 1 volunteered to sit for TV2. In addition, 30 students from a sixth law school responded to a truncated version of both tests under think-aloud conditions.

Reading materials for both of these multiple-choice tests consisted of three cases related to an appellate argument. The 14 questions—resulting from multiple, iterative reviews among legal experts—reflected two comprehension difficulty categories (individual cases versus cross-case questions) and two semantic difficulty categories (determinate-meaning versus indeterminate-meaning questions). All test items contained five possible answer choices, and justifications were written for each possible answer.

Statistical analyses produced results similar to the findings from Phase 1. No significant differences between first- and third-year scores were detected. The total mean scores were 8.3 for TV1 and 7.25 for TV2. Mean scores did not differ significantly between semesters (two and three) or between years (first and third), suggesting that, as measured by this instrument, students’ case reading and reasoning skills did not improve as a result of law school instruction. Also consistent with Phase 1 findings, the test showed a positive but low correlation with LSAT scores and law school grade point averages.

In terms of the validity question, the means and distributions of scores were highly similar across phases and administrations. In addition, the pattern of students scoring best on single-case, determinate items and worst on cross-case, indeterminate items held, as did the decreasing compounding effect of item types. To further test the validity claim, multiple methods of establishing construct validity were undertaken: statistical analysis using a variation of differential item functioning (DIF), expert review, and process analysis of think-aloud data. These analyses addressed the equivalence of the two test versions; the methods showed only moderate correspondence with one another, and the process analysis produced the most meaningful results.

Think-aloud data were qualitatively analyzed to identify the constructs, skills, and subskills believed to undergird case reading and reasoning. By comparing think-aloud findings with the written item justifications, the test writers were able, in effect, to test their hypotheses about case reading. These data also allowed us to revise the test on empirical, grounded bases and made students’ errors manifest.

Overall, we believe that this research achieved its proposed goals. Future research in this area could focus on uses of the revised test versions or could treat them as prototypes for constructing similar types of items. We see such efforts as addressing the formidable challenge of describing developmental trajectories in legal literacy.

Request the Full Report

To request the full report, please email Linda Reustle at