Buried beneath the politically charged headlines of recent weeks regarding Secretary Hillary Clinton’s emails lies a cautionary tale on a subject to which a surprising number of lawyers and their clients give little thought: eDiscovery. From an initial document set of over 60,000 emails residing on Secretary Clinton’s private servers, her legal team was tasked with identifying “work-related” emails for production to the State Department. Although the team produced approximately 30,000 emails, FBI investigators subsequently identified several thousand additional work-related emails that should have been included. As to these missing emails, the FBI concluded that Clinton’s lawyers relied on subject line and keyword searches in lieu of reviewing the full content of all 60,000 emails. While it is unclear what ramifications, if any, will arise for the lawyers involved in Secretary Clinton’s email production, in a civil setting such oversights could have significant consequences.
Before the advent of digital documents and computer-assisted review, lawyers performing a relevancy or privilege review had to read each page of each document, a task that, depending on the volume of documents, could take several months at considerable cost. While reviewing every document collected remains an option in the digital age, newer tools are available that, when used correctly, significantly increase review efficiency while reducing review time and costs.
The “keyword” (aka Boolean) search Clinton’s lawyers used is one of the oldest and most common forms of technology-assisted review. One or more words are input into software that retrieves documents containing those words or phrases. Some keyword search tools also include advanced features for recognizing and retrieving word derivatives (e.g., plural, past tense, etc.). While the FBI report implies that Clinton’s legal team used only keyword searches, many more advanced forms of technology-assisted review are available:
Concept Search: A concept search is more advanced than a keyword search, and includes an algorithm that analyzes variables such as proximity and frequency of words or phrases. This tool typically retrieves more documents than a keyword search because it identifies conceptually related documents, whether or not they contain the original keyword(s).
Discussion Threading: Discussion threading algorithms dynamically link related documents (usually email messages) into chronological “threads” representing entire discussions. This simplifies the process of identifying participants in a conversation and provides further insight into the substance of the conversation. Many email programs, such as the Mail app on iPhones, include a variation of this tool.
Clustering: Clustering algorithms automatically organize a large collection of documents into different topic groupings. The grouping can run as soon as a collection of documents is input into the tool, thereby giving the user an early idea of how the document set is organized.
Find Similar: This tool automatically retrieves other documents related to a particular document, which provides full context for the document under review.
Predictive Coding: Predictive coding software “learns” to segregate desired data/information from a larger set. The user reviews a small (relative to the total document set) data set and “teaches” the software which documents in the review set are desirable (e.g., privileged, relevant, responsive, etc.). The tool then applies what it learned to the entire document set to identify desired documents.
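To make the difference between these tools concrete, the following is a minimal sketch, not any vendor’s actual implementation, contrasting a keyword (Boolean) search with a simple “find similar” comparison. The sample emails, the function names, the stop-word list, and the 0.3 similarity threshold are all hypothetical choices made for illustration; commercial review platforms use far more sophisticated weighting.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; real tools use much larger lists or term weighting.
STOP_WORDS = {"the", "on", "at", "a", "of", "to"}

def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def keyword_search(docs, any_of):
    """Boolean OR search: return IDs of documents containing any listed term."""
    terms = {t.lower() for t in any_of}
    return [doc_id for doc_id, text in docs.items() if terms & set(tokenize(text))]

def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def find_similar(docs, seed_id, threshold=0.3):
    """Return (doc_id, score) pairs for documents resembling the seed document."""
    seed = Counter(tokenize(docs[seed_id]))
    scored = [
        (doc_id, cosine(seed, Counter(tokenize(text))))
        for doc_id, text in docs.items()
        if doc_id != seed_id
    ]
    return sorted((p for p in scored if p[1] >= threshold), key=lambda p: -p[1])

# Invented sample data for illustration only.
emails = {
    "e1": "Please send the budget report before the embassy meeting",
    "e2": "Send the revised figures before the embassy meeting on Tuesday",
    "e3": "Dinner on Saturday at eight, bring the kids",
}

keyword_hits = keyword_search(emails, any_of=["budget"])
similar_to_e1 = find_similar(emails, "e1")
print("Keyword hits:", keyword_hits)
print("Similar to e1:", similar_to_e1)
```

In this toy set, the keyword search for “budget” retrieves only the first email, while the similarity comparison also surfaces the second email, which discusses the same meeting without ever using the keyword. That gap is the kind of omission a keyword-only review can produce.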
No matter which analytic tool (or combination of tools) is selected, defensibility is paramount. That is, if challenged in court, a party must not only be able to explain why it selected the tool(s) it used, but also be able to demonstrate that the tool(s) actually worked. In the FBI’s review of Secretary Clinton’s emails, defensibility did not play a role, but in a civil case the inability to defend the effectiveness of your client’s document review methodologies can lead to significant discovery sanctions.
Even a simple keyword search may be subject to significant scrutiny. As demonstrated by the FBI findings, merely selecting a “best guess” list of keywords is insufficient. Establishing a defensible process requires cooperation, whether between opposing parties or between a client and his or her legal team, and courts have ruled that some quality control is also necessary. For example, selected search terms can be tested against a random subset of the data to be searched, and that subset reviewed to verify that the search terms successfully identified the relevant documents therein.
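The sampling check described above can be sketched in a few lines. This is an illustrative outline only: the function names, the document IDs, and the reviewer’s relevance calls are all hypothetical, and what counts as an acceptable recall rate is a question for the parties and the court, not for the code.

```python
import random

def draw_sample(doc_ids, sample_size, seed=0):
    """Draw a reproducible random sample of document IDs for manual review."""
    rng = random.Random(seed)  # fixed seed so the sample can be re-created later
    ids = sorted(doc_ids)
    return rng.sample(ids, min(sample_size, len(ids)))

def evaluate_terms(sample, search_hits, reviewer_relevant):
    """Compare search-term hits with a reviewer's relevance calls on the sample.

    recall    = share of relevant sampled documents the terms retrieved
    precision = share of retrieved sampled documents that were relevant
    """
    sample = set(sample)
    hits = set(search_hits) & sample
    relevant = set(reviewer_relevant) & sample
    found = hits & relevant
    return {
        "recall": len(found) / len(relevant) if relevant else None,
        "precision": len(found) / len(hits) if hits else None,
        "missed": sorted(relevant - hits),  # relevant documents the terms did not catch
    }

# Invented example: five sampled documents, reviewed by hand.
sample = ["d1", "d2", "d3", "d4", "d5"]
search_hits = ["d1", "d2", "d9"]        # what the search terms returned
reviewer_relevant = ["d1", "d3", "d4"]  # what the human reviewer marked relevant

report = evaluate_terms(sample, search_hits, reviewer_relevant)
print(report)
```

Here the terms retrieved only one of the three relevant documents in the sample, a signal that the keyword list needs revision before it is run against the full collection. Keeping the seed, the sample, and the resulting report on file is one way to document the process if it is later challenged.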
The need for such attention to detail, and documentation thereof, will only increase: the volume of data managed by IT professionals is estimated to increase fifty-fold from 2011 to 2020. Whichever tools are used for eDiscovery, great care must be taken in their selection, application, and verification to ensure a defensible product should the review process come under judicial scrutiny. A judge’s ruling that the selected tools and methods used to identify and produce discoverable information lack defensibility can result in significant monetary fines, or even adversely affect the outcome of the case.