What is Predictive Coding?

Predictive Coding, also known as Technology Assisted Review (TAR) or Computer Assisted Review, uses machine learning to review and categorise large numbers of documents automatically, based on the approach a human reviewer takes to a smaller sample set.

How does Predictive Coding work?

Predictive Coding starts with human input. A human user reviews a sample set of documents randomly selected from a larger data set, identifying documents that are relevant, not relevant, privileged and so on.

The programme “learns” from the user’s review and builds a series of rules for categorising documents automatically in a way that replicates the user’s approach. In doing so, it takes into account many different factors, including file type, common terms, custodians, recipients and other metadata. The programme then applies this model to the larger data set to sort the documents into each category.

The user then reviews an initial batch of documents categorised by the programme and corrects them as necessary. The programme again learns from the user’s corrections, incorporates the changes into its categorisation model and re-runs the algorithm over the larger data set. This process can be repeated until the user is confident that the programme is categorising the documents in the larger data set correctly.
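The review cycle described above can be sketched in a few lines of Python. This is a minimal, illustrative sketch under stated assumptions, not a description of how any particular Predictive Coding product works: the model is a simple multinomial naive Bayes classifier (our stand-in; commercial platforms use their own, often proprietary, models), the only features are body terms plus file type and custodian, the document collection and its “truth” labels are synthetic, and the human reviewer is simulated by a function that returns the correct label. The names `make_collection`, `NaiveBayes` and `review_loop`, and the stopping rule (stop once a checked batch needs no corrections), are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def features(doc):
    """Body terms plus metadata tokens (file type, custodian) for one document."""
    tokens = doc["text"].lower().split()
    tokens.append("filetype:" + doc["filetype"])
    tokens.append("custodian:" + doc["custodian"])
    return tokens


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing: a stand-in model, not a real product's."""

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = features(doc)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        total = sum(self.doc_counts.values())
        scores = {}
        for label, count in self.doc_counts.items():
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)
            for token in features(doc):
                score += math.log((self.token_counts[label][token] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


def make_collection(n=500, relevant_rate=0.3, seed=1):
    """Hypothetical synthetic data set; 'truth' is what a perfect human reviewer would decide."""
    rng = random.Random(seed)
    relevant_terms = ["breach", "warranty", "indemnity", "termination"]
    other_terms = ["lunch", "holiday", "newsletter", "parking", "birthday"]
    filler = ["the", "please", "see", "attached", "regards"]
    docs = {}
    for i in range(n):
        relevant = rng.random() < relevant_rate
        words = rng.sample(relevant_terms if relevant else other_terms, 2) + rng.sample(filler, 3)
        docs[i] = {
            "text": " ".join(words),
            "filetype": rng.choice(["email", "pdf", "docx"]),
            "custodian": rng.choice(["alice", "bob", "carol"]),
            "truth": "relevant" if relevant else "not relevant",
        }
    return docs


def review_loop(collection, reviewer, seed_size=30, batch_size=20, max_rounds=10, seed=0):
    rng = random.Random(seed)
    ids = list(collection)
    # 1. A human reviews a random sample drawn from the larger data set.
    labelled = {i: reviewer(collection[i]) for i in rng.sample(ids, seed_size)}
    for round_no in range(1, max_rounds + 1):
        # 2. Build a model from every human decision so far and apply it to the whole set.
        model = NaiveBayes().fit([collection[i] for i in labelled], list(labelled.values()))
        predictions = {i: model.predict(collection[i]) for i in ids}
        unreviewed = [i for i in ids if i not in labelled]
        if not unreviewed:
            break
        # 3. The human checks a batch of machine calls; corrections feed the next round.
        batch = rng.sample(unreviewed, min(batch_size, len(unreviewed)))
        corrections = 0
        for i in batch:
            labelled[i] = reviewer(collection[i])
            if labelled[i] != predictions[i]:
                corrections += 1
        # 4. Stop once a whole batch needs no corrections (a crude proxy for confidence).
        if corrections == 0:
            break
    return predictions, labelled, round_no


if __name__ == "__main__":
    docs = make_collection()
    preds, labelled, rounds = review_loop(docs, reviewer=lambda d: d["truth"])
    accuracy = sum(preds[i] == docs[i]["truth"] for i in docs) / len(docs)
    print(f"rounds={rounds} human_reviewed={len(labelled)} accuracy={accuracy:.3f}")
```

The point of the sketch is the shape of the loop: every human decision, including each correction, is added to the training set before the model is rebuilt and re-applied to the whole collection. A real exercise would use a far richer model and a statistically grounded stopping test rather than “one clean batch”.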

Where is Predictive Coding being used?

Predictive Coding is increasingly being used in dispute proceedings – particularly in those that involve high volumes of background documents or information.

Predictive Coding is particularly valuable in the disclosure process in English court proceedings. Under standard disclosure, each party must identify and disclose the documents on which it relies, as well as those which adversely affect its own case or adversely affect or support another party’s case.

In the landmark case of Pyrrho Investments v MWB Property & Ors [2016] EWHC 256 (Ch), Taylor Wessing’s Edward Spencer obtained the English court’s first express approval of the use of Predictive Coding in disclosure. The court recognised the significant time and cost savings that the technology could bring.

Predictive Coding can also be used in any large-scale document review exercise, including arbitrations and internal and regulatory investigations. As has been widely publicised, it was used in the Serious Fraud Office’s (SFO’s) investigation into Rolls-Royce.

Pros and cons of Predictive Coding

Predictive Coding can dramatically reduce the time and costs that businesses incur in disputes, while also having the potential to produce better-quality, more consistent output.

Disclosure has traditionally relied on teams of lawyers and paralegals manually reviewing and categorising each document. As a result, clients often faced significant time and cost burdens. In Pyrrho Investments, Master Matthews noted that there were over 3 million documents in the case. A manual review of these would have cost several million pounds. By contrast, cost estimates for Predictive Coding in the case ranged from £181,988 plus monthly hosting costs of £15,717 to £469,049 plus monthly hosting costs of £20,820, notably below the cost of a manual review. These figures should also be seen in context: the proportion of reviewed documents that turn out to be relevant is usually well below 50%, and perhaps as little as 10%, while the proportion that are both relevant and not already known to the legal team is likely to be de minimis.
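The Pyrrho estimates combine a one-off figure with a monthly hosting charge, so the total depends on how long the data is hosted, which is not stated here. The short Python sketch below simply does that arithmetic for a few hypothetical hosting periods (6, 12 and 24 months are illustrative assumptions, not figures from the case), and shows what the 10% and 50% relevance rates would mean for a collection of 3 million documents, treating “over 3 million” as a round lower bound. The function names are our own.

```python
def total_cost(fixed, monthly_hosting, months):
    """One-off estimate plus hosting charged for each month the data is held."""
    return fixed + monthly_hosting * months


def relevant_count(documents, percent):
    """Number of documents at a given whole-number relevance rate (integer arithmetic)."""
    return documents * percent // 100


LOW_ESTIMATE = (181_988, 15_717)   # £ one-off, £ per month (lower Pyrrho estimate)
HIGH_ESTIMATE = (469_049, 20_820)  # £ one-off, £ per month (higher Pyrrho estimate)
DOCUMENTS = 3_000_000              # "over 3 million", used here as a lower bound

if __name__ == "__main__":
    for months in (6, 12, 24):  # hypothetical hosting periods, not from the case
        low = total_cost(*LOW_ESTIMATE, months)
        high = total_cost(*HIGH_ESTIMATE, months)
        print(f"{months:>2} months: £{low:,} to £{high:,}")
    for percent in (10, 50):
        print(f"{percent}% relevant: {relevant_count(DOCUMENTS, percent):,} documents")
```

On these assumptions, a year of hosting would bring the two estimates to £370,592 and £718,889 respectively, still below a manual review costed at several million pounds. At a 10% relevance rate, roughly 300,000 of the 3 million documents would be relevant.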

Predictive Coding enables one user to review, categorise and quality-check a small sample of documents, with the programme doing the heavy lifting. It also reduces the scope for inconsistency in approach between different reviewers. Unlike human reviewers, the programme does not get tired, reducing the risk of error or inconsistency resulting from fatigue.

That said, Predictive Coding is not a complete solution. To build its model, the programme ultimately depends on input from a human expert who understands the factual and legal issues involved. This was highlighted in the recent judgment in Triumph Controls UK Ltd & Ors v Primus International Holding Co & Ors [2018] EWHC 176 (TCC), where Predictive Coding appears to have been used without the oversight of a senior lawyer undertaking the role advocated in Pyrrho: a single senior lawyer who has mastered the issues in the case and reviews all of the samples, so that the machine learning is as accurate as possible. In Triumph, Mr Justice Coulson was concerned that the Predictive Coding may not have been ‘educated’ as well as it might have been. As with most things, Predictive Coding will rarely categorise an entire data set perfectly first time round. Lawyers and clients therefore still need to review and check the programme’s output to ensure that it has captured all the documents it should and, more importantly, that it has not captured any documents it should not, particularly those containing privileged or commercially sensitive information.
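The two checks described above correspond to the standard information-retrieval measures of recall (has the programme captured everything it should?) and precision (has it captured only what it should?). A minimal Python sketch, assuming a senior lawyer has reviewed a quality-control sample and recorded, for each document, the programme’s call, their own call and whether the document is privileged or commercially sensitive. The field names, the sample and `qc_report` are hypothetical; real review platforms have their own validation tools and workflows.

```python
from dataclasses import dataclass


@dataclass
class SampleDoc:
    doc_id: str
    machine_relevant: bool  # the programme's categorisation
    lawyer_relevant: bool   # the senior lawyer's decision on review
    sensitive: bool         # privileged or commercially sensitive


def qc_report(sample):
    """Return (recall, precision, sensitive documents the programme swept into the relevant set)."""
    tp = sum(d.machine_relevant and d.lawyer_relevant for d in sample)
    fp = sum(d.machine_relevant and not d.lawyer_relevant for d in sample)
    fn = sum(not d.machine_relevant and d.lawyer_relevant for d in sample)
    recall = tp / (tp + fn) if tp + fn else 1.0     # share of relevant documents captured
    precision = tp / (tp + fp) if tp + fp else 1.0  # share of captured documents that are relevant
    flagged = [d.doc_id for d in sample if d.machine_relevant and d.sensitive]
    return recall, precision, flagged


if __name__ == "__main__":
    sample = [  # hypothetical QC sample
        SampleDoc("D1", True, True, False),
        SampleDoc("D2", True, False, False),
        SampleDoc("D3", False, True, False),
        SampleDoc("D4", True, True, True),
        SampleDoc("D5", False, False, False),
    ]
    recall, precision, flagged = qc_report(sample)
    print(f"recall={recall:.2f} precision={precision:.2f} check before disclosure: {flagged}")
```

For this invented sample, recall and precision both come out at 2/3 (D3 was missed and D2 was wrongly captured), and D4 is flagged for a privilege or sensitivity check. In practice the sample would need to be large enough for these figures to be meaningful.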