Judge Denise Cote in the Southern District of New York and nearly every major bank and uber-firm in New York have given us plenty of predictive coding reading in this massive litigation. The CliffsNotes version: predictive coding was blessed, there were loads of meet-and-confers, and court-ordered cooperation produced a predictive coding project that appears to have worked well.

JP Morgan wanted to use predictive coding to help make their document review in the litigation more efficient. FHFA wanted old-school, eyes-on review of everything. For the uninitiated, check out our predictive coding 101.

Judge Cote thought 

that predictive coding should be given careful consideration in a case like this, and I am absolutely happy to endorse the use of predictive coding and to require that it be used as part of the discovery tools available to the parties. But it seems to me that the reliability and utility of predictive coding depends upon the process that takes place in the initial phases in which there is a pool of materials identified to run tests against, and I think that some of the documents refer to this as the seed — S-E-E-D — set of documents, and then there are various rounds of further testing to make sure that the code becomes smart with respect to the issues in this case and is sufficiently focused on what needs to be defined as a responsive document. And for this entire process to work, I think it needs transparency and cooperation of counsel.

I think ultimately the use of predictive coding is a benefit to both the plaintiff and the defendants in this case. I think there’s every reason to believe that, if it’s done correctly, it may be more reliable — not just as reliable but more reliable than manual review, and certainly more cost effective — cost effective for the plaintiff and the defendants.

FHFA then wisely shifted its argument from “whether” to “how[,]” and search aficionados should read the whole order to see a judge with strong case management skills in action. The following excerpt will send chills down the spines of lawyers who went to law school to escape mathematics:

What is the methodology for creating the seed set? How will that seed set be pulled together? What will be the number of documents in the seed set? Who will conduct the review of the seed set documents? Will it be senior attorneys or will it be junior attorneys? Whether the relevant determination is a binary determination, a yes or no for relevance, or if there’s a relevance score or scale in terms of 1 to 100. And the number of rounds, as your Honor noted, in terms of determining whether the system is well trained and stable.
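The questions above map directly onto the mechanics of supervised machine learning. As a rough illustration only (not the actual tool or protocol used in the case), here is a minimal sketch of a seed-set workflow in Python, using scikit-learn’s TF-IDF vectorizer and logistic regression as a stand-in for a commercial review platform. All documents and labels are invented:

```python
# Minimal sketch of a predictive-coding "seed set" workflow.
# Invented documents and labels; scikit-learn stands in for a review tool.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Seed set: documents already reviewed by attorneys, with a binary
# responsiveness call (1 = responsive, 0 = not responsive).
seed_docs = [
    "mortgage backed securities offering memo",
    "loan underwriting standards were not followed",
    "lunch menu for the cafeteria",
    "holiday party invitation",
]
seed_labels = [1, 1, 0, 0]

# Unreviewed pool the trained model will score.
pool = [
    "offering memo on mortgage securities",
    "cafeteria lunch menu for friday",
]

vec = TfidfVectorizer()
X_seed = vec.fit_transform(seed_docs)
model = LogisticRegression().fit(X_seed, seed_labels)

# Instead of a yes/no, the tool can emit a relevance score on a 0-100 scale.
scores = model.predict_proba(vec.transform(pool))[:, 1] * 100
for doc, score in zip(pool, scores):
    print(f"{score:5.1f}  {doc}")

# In practice, attorneys review a sample of the scored documents, the
# corrected labels are folded back into the training set, and the model is
# retrained over several rounds until its results stabilize.
```

Whether the reviewer makes a binary call or works from a relevance scale of 1 to 100, who reviews the seed set, and how many training rounds run before the system is deemed stable are exactly the process choices the court pressed the parties to disclose.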

The parties quibbled about the disclosure of parameters and about keyword filtering before machine training. The plaintiff’s experts got their way, and the defendant was not allowed to limit the predictive coding search to documents that happened to hit on keywords. Most experts agree that keyword filtering before predictive coding is inappropriate, because keywords are notoriously inaccurate and routinely exclude relevant documents. With some judicial prodding, defense counsel agreed to disclose both relevant and irrelevant training documents to plaintiff’s counsel, with the exception of privileged documents. This meant that plaintiff’s experts were able to verify the training process and confirm that the defendant was not construing relevance too narrowly.

The next argument revolved around whether the defendant could run keyword searches on the data set before using their predictive coding tool. Judge Cote asked counsel if they had actually discussed the issue, as opposed to swapping letters and memos. They had not, and Judge Cote ordered them to talk to each other and cooperate, a novel concept also useful for resolving grade-school playground disputes. As it turns out from the second order of August 10, 2012, talking worked. The parties agreed that there would be no keyword filtering.
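To see why filtering by keyword first can be dangerous, consider a toy example (invented documents and keywords, not drawn from the case): a responsive document that never happens to use one of the chosen keywords is excluded before the predictive coding tool ever sees it, and recall suffers accordingly.

```python
# Toy illustration of recall loss from keyword pre-filtering.
# Documents and keywords are invented for illustration.
keywords = {"mortgage", "securitization"}

# Document text -> whether it is actually responsive.
documents = {
    "offering memo on mortgage pools": True,      # responsive, hits a keyword
    "underwriting exceptions were waived": True,  # responsive, no keyword hit
    "cafeteria menu": False,                      # not responsive
}

# Keyword filter: keep only documents containing at least one keyword.
filtered = [doc for doc in documents if keywords & set(doc.split())]

responsive_total = sum(documents.values())
responsive_kept = sum(documents[doc] for doc in filtered)
recall = responsive_kept / responsive_total

print(f"keyword-filter recall: {recall:.0%}")  # prints "keyword-filter recall: 50%"
```

Here the filter silently drops half of the responsive documents before training even begins, which is the concern the plaintiff’s experts raised.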

And they kept meeting. As counsel for JP Morgan described these meetings,

We meet every day with the plaintiff to have a status report, get input, and do the best we can to integrate that input. It isn’t always easy, not just to carry out those functions but to work with the plaintiff.

The suggestions we have had so far have been unworkable and by and large would have swamped the project from the outset and each day that a new suggestion gets made. But we do our best to explain that and keep moving forward.

. . .

We very much appreciate that your Honor has offered to make herself available, and we would not be surprised if we need to come to you with a dispute that hasn’t been resolved by moving forward or that seems sufficiently serious to put the project at risk. But that has not happened yet and we hope it will not.

Eventually all these issues appeared to be resolved by agreement, with the guiding hand of Judge Cote. Later on there were some complaints about missing documents and a missing custodian in the transcript here (dated February 14, 2014), but the issue appears to relate purely to the failure to include a particular custodian in the search. This had nothing to do with the success of the predictive coding process for the custodians who were included. Judge Cote noted that no review is perfect, and did not seem inclined to allow a redo of the predictive coding process, stating that

[p]arties in litigation are required to be diligent and to act in good faith in producing documents in discovery. The production of documents in litigation such as this is a herculean undertaking, requiring an army of personnel and the production of an extraordinary volume of documents. Clients pay counsel vast sums of money in the course of this undertaking, both to produce documents and to review documents received from others. Despite the commitment of these resources, no one could or should expect perfection from this process. All that can be legitimately expected is a good faith, diligent commitment to produce all responsive documents uncovered when following the protocols to which the parties have agreed, or which a court has ordered.

Indeed, at the earliest stages of this discovery process, JPMorgan Chase was permitted, over the objection of FHFA, to produce its documents through the use of predictive coding. The literature that the Court reviewed at that time indicated that predictive coding had a better track record in the production of responsive documents than human review, but that both processes fell well short of identifying for production all of the documents the parties in litigation might wish to see.

The two main orders in this mega-case pertaining to predictive coding are here (order dated August 6, 2012) and here (order dated August 10, 2012).