The recent judgment in Pyrrho v MWB [2016] EWHC 256 (Ch) has approved the use of 'predictive coding' in the disclosure process for the first time in the English courts. This is a significant development: it demonstrates the court's willingness to support innovative approaches to a process which can account for a significant proportion of litigation costs and deter clients from pursuing otherwise meritorious litigation. Predictive coding has been used widely in US litigation and was recently approved in the Irish courts in Irish Bank Resolution Corporation Ltd & ors v Quinn & ors [2015] IEHC 175.
Conventional litigation support software is aimed at speeding up the process by which lawyers and paralegals review all potentially relevant documents. Techniques such as keyword searches try positively to identify potentially relevant documents. By contrast, predictive coding (or computer-assisted review) is more focused on identifying and discarding a large corpus of irrelevant documents, enabling the reviewers to focus on a core set of documents most likely to be relevant.
Approaches to predictive coding vary in detail. In general terms a lawyer will train the predictive coding software on a sample set of documents. The software is then let loose on the rest of the documents to analyse and rank them by potential relevance. A typical process can be summarised as follows:
- Detailed protocols are discussed and, preferably, agreed between the parties. In Pyrrho the protocols were agreed and not described in detail in the judgment. The protocols approved in the Irish Bank case are set out in more detail in that judgment. They cover technical matters such as the data set, batches and the control set, as well as transparency issues such as an agreement to notify and consult with the other party in relation to any necessary derogation from the protocol. In Irish Bank some points of disagreement were decided by the court.
- A representative sample (in Pyrrho this was 1600-1800 documents) is reviewed by a single senior lawyer who has mastered the issues in the case to train the software. Based on the training the software then analyses the whole document set and ranks each document as to likely relevance.
- The results of the ranking exercise are validated through a number of quality assurance exercises based on statistical sampling. The sample size is fixed in advance and depends on the confidence level and margin of error that have been decided on.
- The samples selected are then reviewed by a lawyer with the results being fed back into the system for further learning. This is repeated as many times as necessary to bring the number of software decisions overturned by lawyers to a level within agreed tolerances. In Pyrrho it was said that this would involve a review of some 8 to 12 batches of documents.
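To illustrate how a sample size follows from a chosen confidence level and margin of error, the sketch below uses the standard formula for estimating a proportion, with a finite population correction. It is an illustration only, not the method used in Pyrrho or Irish Bank: the function names, the 95% confidence level, the margins of error and the assumed 50% prevalence are all assumptions, and the protocols in a real case would set their own figures. The second function shows one simple way of measuring how often a lawyer overturns the software's decision in a reviewed batch.

```python
import math
from statistics import NormalDist


def sample_size(population: int, confidence: float = 0.95,
                margin: float = 0.05, prevalence: float = 0.5) -> int:
    """Number of documents to sample to estimate a proportion.

    All defaults are illustrative assumptions, not figures from any judgment.
    prevalence=0.5 is the most conservative choice (largest sample).
    """
    # Two-sided z-score for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * prevalence * (1 - prevalence) / margin ** 2
    # Finite population correction for the actual size of the document set.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


def overturn_rate(decisions: list[tuple[bool, bool]]) -> float:
    """Share of sampled documents where the lawyer disagreed with the software.

    Each item is (software_says_relevant, lawyer_says_relevant).
    """
    overturned = sum(1 for software, lawyer in decisions if software != lawyer)
    return overturned / len(decisions)


if __name__ == "__main__":
    # 3.1 million is the size of the reviewable set reported in Pyrrho;
    # the confidence level and margins are hypothetical.
    print(sample_size(3_100_000, confidence=0.95, margin=0.05))
    print(sample_size(3_100_000, confidence=0.95, margin=0.02))
```

The point to note is that the sample size is driven far more by the margin of error than by the size of the document set: tightening the margin increases the sample sharply, whereas adding further documents to an already very large population barely changes it.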
In cases involving vast numbers of electronic documents (the Pyrrho case itself concerns 3.1 million reviewable documents), this process can make disclosure more manageable, increase the accuracy of the exercise (when compared to more traditional methods of document selection such as keyword searches) and, perhaps most importantly, save significant cost and time.
Practitioners have previously been reluctant to use predictive coding software due to:
- a (perhaps misconceived) view that a large-scale, brute-force human review or keyword-based volume reduction is more accurate and reliable than the use of software; and
- concerns that such a review may not be sufficiently defensible to discharge lawyers' duties and obligations in respect of disclosure under the Civil Procedure Rules.
The Pyrrho judgment has now approved the use of such software in principle, which largely disposes of the second concern. This approval makes it more likely that law firms will dip their toe in the water to see what predictive coding can do and assess its efficacy against more old-fashioned human review.
As with any disclosure process, but particularly one involving the use of technology, the process of predictive coding takes planning and engagement at the start. It also requires care to ensure that the process is properly implemented and logged so that what was done and all decisions made can be explained and, if necessary, defended later.
Even less sophisticated methods have the potential to cause problems, as can be seen from the following cases:
- In Digicel (St Lucia) Ltd and others v Cable and Wireless Plc and others [2008] EWHC 2522 (Ch) the defendants were ordered to restore and search certain back-up tapes and to undertake more extensive keyword searches than had already been carried out, at significant additional cost in the region of £2 million.
- In West African Gas Pipeline Company Ltd v Willbros Global Holdings Inc [2012] EWHC 396 (TCC) a wasted costs order was awarded against the claimant due to a failure to "de-duplicate" documents properly and a failure to gather together a consistent and complete set of electronic data.
- In Smailes and another v McNally and another [2015] EWHC 1755 (Ch) the claimants were criticised for disclosing very poor quality OCR documents, which led the judge to conclude that a reasonable search had not been carried out as the mechanism devised had not worked correctly.
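For readers unfamiliar with the term, "de-duplication" at its simplest means identifying documents with identical content and reviewing only one copy. The sketch below is a minimal illustration of that idea using a content hash, with a log kept of what was removed and why. It is an assumption-laden example, not a description of what any party or e-disclosure platform did in the cases above: the function name and document identifiers are hypothetical, and it catches only byte-for-byte identical files, not near-duplicates.

```python
import hashlib


def deduplicate(documents: dict[str, bytes]) -> tuple[list[str], dict[str, str]]:
    """Keep the first copy of each distinct document.

    documents maps a (hypothetical) document ID to its raw content.
    Returns the IDs retained for review and a log mapping each removed
    duplicate to the ID of the copy that was kept.
    """
    first_seen: dict[str, str] = {}   # content hash -> ID of retained copy
    retained: list[str] = []
    removed_log: dict[str, str] = {}  # removed ID -> retained ID

    for doc_id, content in documents.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in first_seen:
            # Exact duplicate: record it so the decision can be explained later.
            removed_log[doc_id] = first_seen[digest]
        else:
            first_seen[digest] = doc_id
            retained.append(doc_id)

    return retained, removed_log


if __name__ == "__main__":
    docs = {
        "DOC-001": b"Board minutes, 3 March",
        "DOC-002": b"Draft lease",
        "DOC-003": b"Board minutes, 3 March",  # same content as DOC-001
    }
    retained, removed_log = deduplicate(docs)
    print(retained)
    print(removed_log)
```

Keeping the log alongside the retained set reflects the point made above about being able to explain and, if necessary, defend each step of the process later.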
The courts already expect practitioners to give careful consideration, even at the outset of proceedings, to the approach and method to be applied at the disclosure stage, particularly in heavyweight cases involving significant amounts of electronic documents. Given the Pyrrho judgment, it can be anticipated that predictive coding will feature more strongly among the options that should be considered, at least for very large disclosure exercises. The decision is encouraging and welcome: it raises the prospect of a more cost-effective and practical approach to one of the most costly and time-consuming aspects of litigation, although the costs of the exercise itself may make it a more realistic option for very large disclosure exercises than for smaller ones.