What’s in a name? Many of us tend to use terms such as artificial intelligence, technology-assisted review (TAR), and predictive coding interchangeably, because there is no single vocabulary governing the field, particularly in an eDiscovery context. But for clarity’s sake, a rose isn’t a rose isn’t a rose here: nomenclature matters. So, here is a quick review of three eDiscovery terms that are often confusing.
Artificial Intelligence (AI)
Broadly speaking, AI is the intelligence of machines. AI includes computer techniques that mimic human cognitive functions, such as learning and problem solving. A common example of AI is IBM’s Watson, a computer system capable of answering questions posed in natural language that soundly defeated two notable Jeopardy! champions in 2011.
For eDiscovery purposes, AI is an “umbrella term for computer methods that emulate human judgment,” including “machine learning,” according to the Grossman-Cormack Dictionary of Technology-Assisted Review. Machine learning uses “a computer algorithm to organize or classify documents by analyzing their features.”
Technology-Assisted Review (TAR)
TAR is perhaps the most confusing of these terms, mainly because it goes by so many other names, including “computer-assisted review,” “content-based advanced analytics,” and simply “assisted review.” TAR is properly viewed as a broad term for any document-classifying technology that is integrated into the process of human document review and relies on an algorithm to evaluate files based on the similarity of their content. It thus technically also includes analytics such as e-mail threading, concept clustering, and near-duplicate detection (and, of course, predictive coding, discussed below).
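To make one of those analytics concrete, here is a minimal sketch of near-duplicate detection, one common way tools flag files with overlapping content. It compares documents by their shared word “shingles” (overlapping word sequences) using Jaccard similarity; the sample texts, shingle size, and any similarity threshold a real tool would apply are all illustrative assumptions, not a specific product’s implementation.

```python
# Illustrative near-duplicate detection via word shingles + Jaccard similarity.
# Sample documents and the shingle size k are made up for demonstration.

def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity: shared shingles divided by total distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

doc1 = "The parties shall meet and confer regarding the production schedule"
doc2 = "The parties shall meet and confer regarding the revised production schedule"
doc3 = "Please find attached the quarterly financial report for review"

# A high score flags a candidate near-duplicate pair for review.
print(f"doc1 vs doc2: {jaccard(shingles(doc1), shingles(doc2)):.2f}")
print(f"doc1 vs doc3: {jaccard(shingles(doc1), shingles(doc3)):.2f}")
```

In practice, tools apply this idea at scale with techniques such as hashing, but the underlying comparison of overlapping content is the same.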
While the origins of this term are not quite clear, an early reference appears in a 2010 study, “Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review,” which was among the first to suggest that the accuracy of “computer-assisted categorization processes” could rival that of human review. A 2011 study, “Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review,” specifically referenced TAR and was the first to suggest that this technology surpassed the former “gold standard” of human review in both effectiveness and cost.
Predictive Coding
The narrowest of the three terms, predictive coding is part of the TAR toolkit; it merges human judgment with artificial intelligence. Predictive coding uses a machine learning algorithm to study a subject matter expert’s tagging of a training set of documents—a small sample taken from the collection—and then uses what it learned from the expert to classify documents according to their likely relevance.
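The workflow described above can be sketched in a few lines of code: an expert tags a small training set, a model learns from those tags, and the model then scores the rest of the collection by likely relevance. The documents, labels, and the simple Naive Bayes–style classifier below are all illustrative assumptions; commercial predictive-coding tools use more sophisticated models, but the human-teaches-machine loop is the same.

```python
# Illustrative predictive-coding loop: learn from expert-tagged documents,
# then score untagged documents by likely relevance. Everything here
# (documents, labels, model) is a simplified stand-in, not a real product.
from collections import Counter
import math

def tokenize(text):
    return text.lower().split()

def train(labeled_docs):
    """Count word occurrences per class and class frequencies."""
    counts = {"relevant": Counter(), "not_relevant": Counter()}
    priors = Counter()
    for text, label in labeled_docs:
        counts[label].update(tokenize(text))
        priors[label] += 1
    return counts, priors

def score(text, counts, priors):
    """Log-odds that a document is relevant (Laplace-smoothed Naive Bayes)."""
    vocab = set(counts["relevant"]) | set(counts["not_relevant"])
    total = {c: sum(counts[c].values()) for c in counts}
    log_odds = math.log(priors["relevant"] / priors["not_relevant"])
    for word in tokenize(text):
        p_rel = (counts["relevant"][word] + 1) / (total["relevant"] + len(vocab))
        p_not = (counts["not_relevant"][word] + 1) / (total["not_relevant"] + len(vocab))
        log_odds += math.log(p_rel / p_not)
    return log_odds

# Training set: a small sample tagged by a subject matter expert (made up).
training = [
    ("merger agreement draft terms attached", "relevant"),
    ("board discussed the merger valuation", "relevant"),
    ("lunch order for friday team meeting", "not_relevant"),
    ("office holiday party schedule", "not_relevant"),
]
counts, priors = train(training)

# The model then classifies the rest of the collection by likely relevance.
for doc in ["revised merger terms for board review", "friday lunch menu"]:
    label = "relevant" if score(doc, counts, priors) > 0 else "not relevant"
    print(f"{doc!r} -> {label}")
```

In a real matter, the scored documents would be ranked, spot-checked by reviewers, and the model retrained as new tags come in; this iterative feedback is what makes the expert’s judgment central to the tool’s performance.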
The first judge to endorse predictive coding’s use was U.S. Magistrate Judge Andrew J. Peck of the Southern District of New York, who wrote in the 2012 “Da Silva Moore v. Publicis Groupe” opinion that it “appears to be better than the available alternatives, and thus should be used in appropriate cases.” (Note: Judge Peck referred to predictive coding as “computer-assisted review.”)
What Does the Rise of the Machines Mean for eDiscovery?
There is no dispute that these approaches make eDiscovery simpler, faster, and more cost-effective. However, no matter which eDiscovery tool legal teams choose, its success still comes down to one thing: the humans behind the machines. Despite the sophistication of their algorithms, computers still lack the analytical and strategic abilities of seasoned eDiscovery practitioners. Without a human expert to meaningfully evaluate potential strategies and customize the tools to the situation at hand, technology may—at best—lend efficiency to a project but little more.
In short, the performance of computer-based eDiscovery tools still has a ceiling defined by the expertise of the people deploying them.