Sixth Circuit judges have taken an interest in “corpus linguistics.” At a recent gathering in northern Kentucky, three Sixth Circuit judges engaged in an impromptu discussion of the interpretive tool. And last week, in Wilson v. Safelite Group, two other Sixth Circuit judges wrote concurrences debating its merits.
A “corpus” is simply a collection of texts. “Corpus linguistics” is where big data meets legal interpretation: textualist jurists and scholars (most prominently at BYU) are using a vast searchable collection of texts to assess linguistic meaning(s), frequency, and change. The movement has a somewhat populist (or “ordinary meaning”) bent, with the corpus including “regular usage” texts—magazines, books, academic articles, and speeches—but not dictionaries.
Lawyers and judges can search corpora to identify how a word was used during a specific time frame; as one commentator described it, corpus linguistics is “Lexis on Steroids.” Corpus linguistics cannot alone determine a word’s meaning (the way a dictionary might), but it can provide empirical evidence to guide a judge choosing among multiple plausible or time-sensitive meanings.
In 2011, Justice Thomas Lee of the Utah Supreme Court was the first to use corpus linguistics in a judicial opinion, In re the Adoption of Baby E.Z. Since then, the Utah Supreme Court has continued to use corpus linguistics, and in 2016 the majority and dissenting opinions from the Michigan Supreme Court both embraced corpus linguistics in People v. Harris.
Now, corpus linguistics has reached the Sixth Circuit: Judge Amul Thapar relied on corpus linguistics in a concurrence—the first time (as far as we can tell) that a federal judicial opinion has done so.
Wilson v. Safelite addressed whether ERISA (natch) applies to Safelite’s deferred executive compensation program. The appeal turned on whether the phrase “results in” includes the meaning “requires” and whether “extending to” includes the meaning “until a certain point in time.” Judge Jane Stranch wrote the majority opinion, joined in full by Judge John Rogers, concluding that neither phrase included the proposed meaning.
Judge Thapar concurred: he agreed with Judge Stranch’s textual analysis, as far as it went, but went on to explain that “[c]ourts should consider adding this tool” (corpus linguistics) “to their belts.” The concurrence discussed how corpus linguistics confirmed the panel’s interpretation: “results in” was never used to mean “required” during the 1960s and ’70s, when Congress enacted ERISA. Judge Thapar went on to explain how corpus linguistics can, for example, illuminate a word’s most common meaning:
“Its foremost value may come in those difficult cases where statutes split and dictionaries diverge. In those cases, corpus linguistics can serve as a cross-check on established methods of interpretation (and vice versa).”
In a separate concurrence, Judge Stranch offered three critiques:
- Corpora are not representative—Americans might say “flood” most often during superstorms, even if most still understand “flood” also to include 3 inches of water in the basement.
- Corpora may surface the most common meaning of a term even when a law uses it in a less frequent sense.
- The law often demands not the word on the street, but the expertise of experienced lexicographers (we looked it up: “word definers”) found in dictionaries or amicus briefs.
Judge Thapar’s response? Interpreting legal text is hard, and hard choices will persist even with the help corpus linguistics can offer.
Will corpus linguistics become a common interpretive method within some corners of the Learned Sixth? That remains to be seen. But anyone appearing before Judge Thapar (and undoubtedly others on the court) should take note of his closing line: “adversarial briefing on corpus linguistics can help courts as they roll up their sleeves and grapple with a term’s ordinary meaning.” A well-prepared advocate won’t regret having consulted big data as well as a big dictionary.