PART TWO: This is part two of a series on my journey to appreciating TAR. Part one defined TAR, described how it works and provided tips on when you should consider using it. Part two addresses the TAR process, the advantages and disadvantages of TAR, and my current thoughts on using the technology.
The TAR Process
Once you have made the decision to utilize TAR in your case, it is imperative that you set up a clearly delineated protocol to ensure that the process goes smoothly and efficiently.
The maxim “Garbage In, Garbage Out” certainly applies in the case of TAR. If you train the system erroneously, you are wasting your time (and money!), as you are going to get back a set of documents that are not going to be helpful. Careful consideration needs to be given to selecting the proper people or “experts” to assist with teaching the computer. These persons need to be knowledgeable about the case and have the ability to make confident decisions about what documents are relevant and not relevant, as such decisions are going to have lasting impact down the road. When possible, it may be beneficial to have two or more attorneys review the initial documents collaboratively, reaching a consensus on the relevance designation for each document, in order to make sure there is a more uniform system of properly coding documents.
It is important to note that the TAR system focuses on a document’s content rather than any metadata, such as date or custodian. As a result, the To, From and Subject fields of an email are irrelevant for the purposes of TAR. Additionally, the system learns nothing from photos or numbers (so if the case involves a particular patent number, the number itself could not be used to train the system). When teaching the system, it is therefore critically important to consider whether the document is a good example from which the system can learn.
Creating a Seed Set
It helps if there is already a set of documents coded that can be deployed into a TAR workflow. If not, as an alternative, the computer can randomly select a set of documents for review. In this situation, it may take several rounds before a proper seed set (alternately called the “training set”) of documents can be developed, as the computer must learn which documents are relevant and which documents are to be rejected. The decisions made on a seed set create the data used to teach the computer how to recognize patterns of relevance in the greater universe of documents, thus facilitating better categorization.
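For readers who like to see the mechanics, the random selection described above can be sketched in a few lines of Python. This is only an illustration: the function name `draw_random_sample`, the document IDs and the sample size are my own assumptions, and commercial TAR platforms handle this step internally in their own way.

```python
import random


def draw_random_sample(doc_ids, sample_size, rng_seed=42):
    """Return a repeatable random sample of document IDs for expert coding.

    Hypothetical helper for illustration only; not any vendor's method.
    """
    rng = random.Random(rng_seed)  # fixed seed so the same draw can be repeated
    pool = list(doc_ids)
    # Never ask for more documents than the population contains.
    return rng.sample(pool, min(sample_size, len(pool)))


# Made-up population of 1,000 document IDs.
population = [f"DOC-{n:04d}" for n in range(1, 1001)]
seed_candidates = draw_random_sample(population, 50)
print(len(seed_candidates), "documents selected for expert review")
```

The fixed random seed is a design choice for defensibility: it lets the case team reproduce exactly which documents were drawn if the process is later questioned.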
Building a Control Set
The control set is also very important during this process. This is a random, representative set of documents from the entire population of documents that the reviewer codes as responsive or not responsive. This set then acts as the standard (the control) against which the results of the TAR analysis are tested. The control documents measure how well the system has been trained and will ultimately help with determining when training is completed.
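To make the idea of "testing against the control" concrete, here is a minimal sketch that compares the expert's coding of the control set with the system's predictions for the same documents. It assumes the comparison is summarized with precision and recall, two commonly used measures that the discussion above does not name; the function name and the sample data are hypothetical.

```python
def control_set_metrics(expert_codes, predicted_codes):
    """Compare expert coding (True = responsive) with the system's predictions.

    Returns (precision, recall). Illustrative only; real platforms may
    report different or additional measures.
    """
    pairs = list(zip(expert_codes, predicted_codes))
    true_pos = sum(1 for expert, predicted in pairs if expert and predicted)
    false_pos = sum(1 for expert, predicted in pairs if not expert and predicted)
    false_neg = sum(1 for expert, predicted in pairs if expert and not predicted)

    # Precision: of the documents the system called responsive, how many were?
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: of the truly responsive documents, how many did the system find?
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Made-up control set of five documents.
expert = [True, True, False, False, True]
system = [True, False, True, False, True]
precision, recall = control_set_metrics(expert, system)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In practice the control set would contain hundreds or thousands of documents rather than five, and the same calculation would be repeated after each training round.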
There are two types of training. (1) Automatic training utilizes a set of documents that have already been reviewed by a review team (not the designated experts) and fed into the system. The system then uses the relevance tags from those reviewed documents to determine relevance for the remaining documents. (2) Manual training is the preferred approach, as it involves an intensive, dedicated effort by an expert, resulting in more consistency than input from a large review team.
Quality Control Review
Once the training set is complete, the reviewers can then start performing quality control reviews of the sets. During this process, the reviewer is checking for several factors, including whether a document itself is relevant to the review, or specific to an issue that you are researching, as well as determining whether the document itself is a good example for TAR (as described above). There is no hard and fast rule for how many quality control sets you will need to review to reach stabilization. Stabilization occurs when additional training will not affect the computer’s ability to determine whether a document is relevant or not. For large projects, this can be as few as 5 rounds of quality control or as many as 10. Even after stabilization is reached, the emerging best practice is to “test the rest,” which means to test documents below the cut-off line to ensure that they in fact are not relevant.
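The two checks described above, stabilization and "test the rest," can be sketched as follows. This is a simplified illustration under my own assumptions: that a quality score (such as recall on the control set) is recorded after each round, that "stable" means the score has moved by no more than a chosen tolerance over the last few rounds, and that "test the rest" is done by sampling documents below the cut-off and counting how many a reviewer codes as relevant (sometimes called an elusion test). The function names, tolerance, window and data are all hypothetical.

```python
def has_stabilized(round_scores, tolerance=0.01, window=3):
    """Return True if the score barely moved over the last `window` rounds."""
    if len(round_scores) < window:
        return False  # not enough rounds yet to judge
    recent = round_scores[-window:]
    return max(recent) - min(recent) <= tolerance


def elusion_rate(below_cutoff_sample_codes):
    """Fraction of sampled below-cut-off documents that were coded relevant."""
    if not below_cutoff_sample_codes:
        raise ValueError("sample must contain at least one document")
    return sum(1 for code in below_cutoff_sample_codes if code) / len(
        below_cutoff_sample_codes
    )


# Made-up scores after six rounds of quality control.
scores_by_round = [0.50, 0.65, 0.74, 0.78, 0.785, 0.787]
stable = has_stabilized(scores_by_round)

# Made-up sample of 50 below-cut-off documents; 2 turned out to be relevant.
sample_codes = [False] * 48 + [True] * 2
missed_fraction = elusion_rate(sample_codes)
print(stable, round(missed_fraction, 2))
```

What counts as an acceptable tolerance or an acceptable rate of missed documents is a judgment call for the case team (and sometimes a matter of negotiation with opposing counsel), not something the code can decide.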
Once stabilization is reached, a determination will be made regarding which documents will need document-by-document review. For example, you may review the top 50% to 75% of TAR-generated relevant documents.
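As a rough sketch of how such a cut can be drawn, assume the system assigns each document a relevance score and the team reviews the top-scoring fraction. The scores, document IDs and the function name `select_for_review` below are hypothetical; actual platforms expose rankings in their own ways.

```python
import math


def select_for_review(scores_by_doc, fraction):
    """Return the top `fraction` of documents, ranked by relevance score."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be between 0 (exclusive) and 1")
    # Highest-scoring documents first.
    ranked = sorted(scores_by_doc, key=scores_by_doc.get, reverse=True)
    keep = math.ceil(len(ranked) * fraction)  # round up so nothing is shorted
    return ranked[:keep]


# Made-up relevance scores for four documents.
scores = {"DOC-A": 0.9, "DOC-B": 0.2, "DOC-C": 0.7, "DOC-D": 0.4}
top_half = select_for_review(scores, 0.50)
top_three_quarters = select_for_review(scores, 0.75)
print(top_half, top_three_quarters)
```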
In the end, I discovered that I was needed in all stages of this process: seed sets, control rounds, training rounds, quality control and document-by-document review. My position was not even close to becoming obsolete, and because I became a subject matter expert, I had increased my value to the case team!
Advantages of TAR
There are considerable advantages to TAR, including the following:
- Reduces a massive number of documents to a far more manageable set by excluding from review documents that are very likely not relevant, saving time and money.
- Allows for a more consistent review, minimizing the human error that results from less uniform application of relevance standards.
- Properly designed TAR processes uncover more relevant documents than a traditional human review and at a lower cost.
- Allows case teams to more quickly assess facts and issues by focusing on the most relevant documents without comprehensive review of a large data set, saving time and further reducing costs.
- Courts are beginning to understand the benefits of TAR and accept (or even promote) its use to streamline the discovery process, increase efficiencies and decrease costs.
Disadvantages of TAR
While TAR has proven to be incredibly helpful in some cases, it is not always the best option.
- There is no industry standard for TAR software, so not all software is going to be equally effective. It may take some trial and error before finding the right software to suit your case’s needs.
- TAR software is effective only with certain types of documents. It relies heavily on documents with rich text information to analyze. It is not able to evaluate documents like spreadsheets (numbers), blueprints, schematics, or any documents that do not contain adequate searchable text. Furthermore, certain file types like video and audio files are not easily analyzed. This type of information may be critical in some cases, and thus a more traditional human review would be appropriate (at least for these file types).
- TAR is only effective where experienced attorneys (or experts) have spent significant time sufficiently training the computer. If a seed set is not properly developed, this can lead to a flawed learning process and can create huge problems throughout the life of the production.
Note About Privilege Review
TAR may not be the best option for a privilege-only review. Unlike relevance determinations, privilege determinations involve complexities that an automated process may not be able to predict. For example:
- Whether a document is privileged may only be evident from fields that are not considered by TAR algorithms, such as the To and From fields.
- The identification of privileged information may require a subjective judgment call regarding whether legal advice was sought and/or provided.
- Whether a document falls under the protections of marital privilege, common interest privilege and/or joint defense agreements may be even more nuanced.
- Privilege may vary from document to document even if the content is similar. For example, content may be privileged in one document but no longer be privileged if it is forwarded to a third party in another document.
- Waiver of privilege may extend to the subject matter of a document, even if the text of the documents differs.
- TAR algorithms are not able to consider the events surrounding the creation of a document, so a document that is privileged only by virtue of its reference in or to another document may not be properly categorized as privileged.
In some instances, a privilege call affects only part of the document and redactions are needed, while in other instances, the entire document is excluded from the production. In-house counsel may serve multiple roles, including a business role, which may render a communication not privileged.
For the above reasons, employing TAR to identify privileged information presents several risks and the cost savings associated with fewer hours spent combing through documents may not justify these risks. Utilizing TAR to filter out documents that are clearly irrelevant, combined with human review for privilege, will likely yield the best results.
While it can seem frightening at first to put your faith in the hands of a computer program, the biggest takeaway from TAR is that when done correctly, attorneys are always going to be involved in the review and sampling processes, from the formative stages of a case to preparation of the final production. There is no blind reliance on a computer to do an attorney’s work. Instead, TAR cuts through the chaff to get to the wheat, eliminating the need to sift through a myriad of extraneous documents. The most important documents are then put in place for attorneys to review for early case assessment and/or litigation assistance. TAR is not meant to replace standard review processes and protocols, but instead to help streamline those processes so that review can be more targeted, fruitful and efficient.