On 16 February 2016, in Pyrrho Investments Ltd v MWB Property Ltd ("Pyrrho"), the use of predictive coding software in the e-disclosure process was explicitly approved for the first time by an English court. This endorsement is good news for parties involved in document-heavy cases - predictive coding could lead to significant cost savings by reducing the need for armies of paralegals to conduct manual searches.

Standard disclosure

In Pyrrho, the parties had been ordered to give standard disclosure, i.e. to disclose those documents supporting or adversely affecting the case of any of the parties to the dispute. Parties are required to conduct a "reasonable search" (bearing in mind the circumstances of the case and the overriding objective to deal with cases justly and at proportionate cost) for documents falling under these categories. Here the Claimants possessed three million potentially relevant electronic files. In view of the costs of manually reviewing these documents, the parties agreed that predictive coding software should be employed in the Claimants' e-disclosure exercise but felt it necessary to seek formal approval from the court.

What is predictive coding?

Predictive coding refers to the categorisation or selection of documents for relevance by proprietary computer software rather than by humans. The software analyses documents and "scores" them for relevance to the issues in the case. Crucially, the cost does not increase at the same rate as the number of documents to be reviewed: doubling the number of documents does not double the cost, as it would with human review. A senior lawyer familiar with the issues in the case trains the software by marking a representative sample of documents as relevant or irrelevant. This enables the software to categorise each document in the set. The results of this exercise are then validated by further rounds of manual review, and the results are fed back into the software. If the relevance of a document was incorrectly assessed at the first stage, all the documents whose categorisation depended on it are then re-assessed. At the end of the exercise, the software sorts all of the remaining documents into "Relevant" and "Not Relevant". The "Relevant" documents are then reviewed by the legal team for significance to the case and for privilege.
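
For readers who want a feel for the train, validate and re-assess cycle described above, the following is a minimal sketch in Python. It is not how any commercial product works: real predictive coding software is proprietary and far more sophisticated. The word-weighting scheme, the function names (train, score, classify), the document texts and the zero threshold are all assumptions made purely for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a document into lower-case words (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(labelled):
    """Learn a weight per word from (text, is_relevant) pairs marked by a reviewer.

    Assumed scheme: smoothed log-odds of a word appearing in relevant
    versus irrelevant sample documents.
    """
    rel, irr = Counter(), Counter()
    for text, is_relevant in labelled:
        (rel if is_relevant else irr).update(set(tokenize(text)))
    n_rel = sum(1 for _, is_relevant in labelled if is_relevant)
    n_irr = len(labelled) - n_rel
    weights = {}
    for word in set(rel) | set(irr):
        p_rel = (rel[word] + 1) / (n_rel + 2)  # add-one smoothing
        p_irr = (irr[word] + 1) / (n_irr + 2)
        weights[word] = math.log(p_rel / p_irr)
    return weights


def score(weights, text):
    """Score a document: sum of the weights of the distinct words it contains."""
    return sum(weights.get(word, 0.0) for word in set(tokenize(text)))


def classify(weights, documents, threshold=0.0):
    """Mark every document True ("Relevant") or False ("Not Relevant")."""
    return {doc_id: score(weights, text) > threshold
            for doc_id, text in documents.items()}


# Hypothetical document set (illustrative text only).
documents = {
    "d1": "lease agreement rent arrears dispute",
    "d2": "office party catering menu",
    "d3": "rent review lease clause",
    "d4": "catering invoice lunch menu",
    "d5": "lease termination notice",
    "d6": "party invitation friday",
    "d7": "menu of remedies for breach",
    "d8": "breach of covenant remedies",
}

# Stage 1: the senior lawyer marks a small sample.
sample = [("d1", True), ("d2", False)]
first_pass = classify(train([(documents[i], r) for i, r in sample]), documents)

# Stage 2: manual validation finds that d7 was wrongly scored "Not Relevant".
# The correction is fed back and every document is re-assessed.
sample.append(("d7", True))
second_pass = classify(train([(documents[i], r) for i, r in sample]), documents)

changed = sorted(d for d in documents if first_pass[d] != second_pass[d])
print("Re-categorised after feedback:", changed)
```

In this toy run, correcting a single document (d7) also changes the categorisation of a similar document (d8) that no human has looked at, which is the effect the feedback rounds are intended to achieve.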


In a document-heavy case, a party will engage a technology provider to process and host the documents on a database so it can manually review them. These costs (which can be significant) will still be incurred where a party opts to use predictive coding software. But in our experience the software itself is inexpensive by comparison: employing predictive coding software to analyse the three million potentially relevant documents in Pyrrho might have added in the region of £30,000 in software cost.

For a proper comparison between the manual and predictive coding processes, one must also factor in the human costs of training the software and reviewing the documents classed "Relevant". But as the court observed, whilst the number of documents that have to be manually reviewed in a predictive coding process may be high in absolute number, it will still only be a small proportion of the total that would have been reviewed in a case such as Pyrrho. So the costs overall should be considerably lower.
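
The comparison can be put in rough arithmetic terms. The sketch below is illustrative only. Apart from the three million documents and the software cost in the region of £30,000 mentioned above, every figure in it is a hypothetical assumption: the review cost per document, the size of the training sample, the percentage of documents classed "Relevant", and the treatment of the software cost as fixed when the document count doubles. Processing and hosting costs are left out because they are incurred under both approaches.

```python
def manual_review_cost(n_docs, cost_per_doc):
    """Human review of every document: cost grows in step with volume."""
    return n_docs * cost_per_doc


def predictive_review_cost(n_docs, cost_per_doc, software_cost,
                           training_sample, relevant_percent):
    """Software cost plus human review of the training sample and of
    the documents the software classes "Relevant"."""
    relevant_docs = n_docs * relevant_percent // 100
    return software_cost + (training_sample + relevant_docs) * cost_per_doc


# Figures taken from the article.
N_DOCS = 3_000_000
SOFTWARE_COST = 30_000        # pounds, "in the region of"

# Hypothetical assumptions, chosen only to make the arithmetic concrete.
COST_PER_DOC = 1              # pounds of reviewer time per document
TRAINING_SAMPLE = 10_000      # documents marked by the senior lawyer
RELEVANT_PERCENT = 5          # share of documents classed "Relevant"

for n in (N_DOCS, 2 * N_DOCS):
    manual = manual_review_cost(n, COST_PER_DOC)
    predictive = predictive_review_cost(
        n, COST_PER_DOC, SOFTWARE_COST, TRAINING_SAMPLE, RELEVANT_PERCENT)
    print(f"{n:>9,} documents: manual £{manual:,}, predictive £{predictive:,}")
```

On these assumptions, doubling the document count doubles the manual cost but less than doubles the predictive coding cost, because the software cost and the training sample do not grow with volume. Different assumptions would give different figures, so the numbers should not be read as estimates for any real case.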


The court cited a number of reasons for approving the use of predictive coding software. It held that:

  • experience in other jurisdictions had shown that the software can be useful in appropriate cases, and there was no evidence to show that its use leads to less accurate disclosure being given than manual review
  • in fact, there would be greater consistency in using the software to apply the approach of a single, senior lawyer towards the initial sample than in using large numbers of lower-grade fee earners each seeking independently to apply the relevant criteria to individual documents
  • a manual review of the vast numbers of documents in Pyrrho would be unreasonable, at least where a suitable automated alternative exists at lower cost
  • there was nothing in the Civil Procedure Rules or Practice Directions prohibiting the use of such software (indeed PD 31B.25 envisages the use of keyword searches or other automated methods of searching if a full manual review would be unreasonable)
  • the estimated costs of using the software were proportionate since the value of the claims made in the litigation was in the tens of millions of pounds
  • the use of predictive coding software would therefore promote the overriding objective


The application in Pyrrho was heard by consent and so the court did not have to consider contrary arguments which may yet appear in future cases. That is presumably why the court observed that approval for predictive coding software in future cases would depend on the particular circumstances obtaining in them. But this is a welcome step forward. Where parties feel that their case may be appropriate for predictive coding, having regard to quantum and the volume of potentially relevant documents, we recommend that they seek to agree on its use at an early stage, subject to court or arbitral tribunal approval as required. In civil litigation, such negotiations would, in any event, be consistent with the obligation on parties under PD 31B.8 and 9 to discuss the electronic disclosure process in advance of the first case management conference.