In a seminal opinion that will likely be cited by litigants for years, U.S. Magistrate Judge Andrew Peck of the Southern District of New York became the first federal judge to formally consider and approve the use of computer-assisted document review in electronic data discovery.

In Da Silva Moore v. Publicis Groupe, five plaintiffs allege that the defendants (one of the world’s largest advertising companies and its American public relations subsidiary) have a policy of employing women only in entry-level positions and engage in systematic, company-wide gender discrimination against female public relations employees. Given the nature of those allegations, discovery was predictably broad and voluminous.

To manage costs, the defendants proposed a predictive coding plan using a random sample from the entire e-mail collection. Predictive coding is the use of computer algorithms to identify sets of documents based on their potential responsiveness to discovery requests. It is not a fully automated process, as some believe; rather, it requires attorneys to provide search criteria, review results for accuracy, and refine the criteria over several iterations of searching.

The defendants proposed that a random sample of 2,399 documents would be initially reviewed to determine whether the documents were responsive. This “seed set” of documents would then be used to “train” the predictive coding software. The defendants also agreed to provide that set of 2,399 documents to the plaintiffs for their review. After the computer was trained, the defendants would then use seven iterative rounds of review, in which they would review 500 documents per round, to determine whether the computer was continuing to return new relevant documents. After the seventh round, the defendants would review another random sample of 2,399 documents that the computer had determined were not responsive to ensure that no highly relevant documents were given that designation.
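The figure of 2,399 is consistent with a standard statistical sample-size calculation. The sketch below is illustrative only: the 95% confidence level, the margin of error of plus or minus 2%, and the round population of 3,000,000 documents are assumptions made for this example rather than figures stated in the plan as described here, and the function name is hypothetical.

```python
def sample_size(population, z=1.96, margin=0.02, p=0.5):
    """Cochran's sample-size formula with a finite population correction.

    All defaults are assumptions for illustration: z=1.96 (roughly 95%
    confidence), margin=0.02 (plus or minus 2%), p=0.5 (worst-case
    proportion of responsive documents).
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Shrink slightly to account for the finite size of the collection.
    return n0 / (1 + (n0 - 1) / population)


n = sample_size(3_000_000)  # assumed collection size
print(f"{n:.1f}")
```

Under these assumptions the result falls just above 2,399, which suggests (but does not prove) that the defendants' sample was sized along these lines.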

The presiding district judge referred the case to Judge Peck, who in the past has written and spoken on eDiscovery issues and, in particular, computer-assisted review. After several discovery conferences with the parties in which the use of predictive coding was discussed, Judge Peck approved its use, but on a larger and much more thorough scale than what the defendants proposed.

In particular, Judge Peck stated that, if after seven iterations of coding the computer was not stable or needed more iterations to be successful, he would order the parties to “do another round or two or five or 500 or whatever it takes to stabilize the system.”
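One way to picture the instruction to “stabilize the system” is as a stopping rule checked after each review round. The sketch below is a hypothetical illustration, not the protocol the court approved: the function name, the 5% threshold, the two-round window, and the per-round counts are all invented for the example. Only the 500-document round size comes from the plan described above.

```python
def needs_more_rounds(relevant_per_round, round_size=500, threshold=0.05, window=2):
    """Return True if recent rounds still surface many relevant documents.

    relevant_per_round: newly found relevant documents in each round.
    threshold and window are hypothetical tuning values, not from the case.
    """
    if len(relevant_per_round) < window:
        return True  # too few rounds to judge stability
    recent = relevant_per_round[-window:]
    # Treat the system as stable only if every recent round is at or
    # below the threshold share of relevant documents.
    return any(count / round_size > threshold for count in recent)


# Made-up counts for seven rounds of 500 documents each.
counts = [120, 85, 60, 41, 30, 22, 18]
print(needs_more_rounds(counts))
```

With these made-up counts the last two rounds fall below the assumed threshold, so the rule would stop at seven rounds. A later spike in relevant documents would trigger further rounds, which is the open-ended commitment the judge described.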

The plaintiffs filed objections to the judge’s ruling, largely on the ground that the defendants’ predictive coding plan was not sufficiently transparent. The plaintiffs accused Judge Peck of “provid[ing] unlawful ‘cover’” by effectively excusing defense counsel from certifying under Rule 26(g) of the Federal Rules of Civil Procedure that their document production was complete and correct. They also accused him of accepting predictive coding in violation of Rule 702 of the Federal Rules of Evidence and the Supreme Court’s decision in Daubert v. Merrell Dow Pharmaceuticals, Inc.

Judge Peck made quick work of these arguments, noting their inapplicability to the case. Rule 26(g), he noted, requires counsel to certify only that an initial disclosure is complete and correct as of the time of signing; that certification does not extend to document productions. Rule 702 and Daubert, in turn, relate only to the court’s responsibility as gatekeeper to keep unreliable expert testimony from being submitted to the jury at trial.

The plaintiffs further argued that there would be no way to determine whether the defendants’ predictive coding plan was reliable, an objection Judge Peck dismissed as premature at best. The judge stated that he would closely supervise the coding process, and that the defendants’ plan required them to disclose their coding of every e-mail in the seed set to the plaintiffs. The defendants would also make available to the plaintiffs all documents that were reviewed in the coding process, even those deemed irrelevant. On those grounds, Judge Peck deemed the coding plan reasonable, and therefore acceptable.

Notably, Judge Peck was critical of the use of keyword searches in eDiscovery. He described document review based on keywords as “over-inclusive,” “quite costly” and “not very effective,” and opined that computer-assisted review appears to be better than the available alternatives and should be used in cases where it is appropriate. He then listed the factors that supported his conclusion that predictive coding was appropriate in this case: (1) the parties’ agreement, (2) the vast amount of ESI to be reviewed (more than 3 million documents), (3) the superiority of computer-assisted review to the available alternatives (i.e., linear manual review or keyword searches), (4) the need for cost effectiveness and proportionality under Rule 26(b)(2)(C), and (5) the transparent process proposed by the defendants.

Judge Peck concluded with several important lessons for the future:

First, a court will likely not be able to decide when review and production can end until the computer has been trained and the resulting searches have been verified through quality control. Second, splitting discovery into stages is an efficient way to control costs: the first phase covers the custodians most likely to have relevant information, and the requesting party may seek more custodians or documents in a second phase. Third, in cases where one party has specific knowledge about the other party’s records (such as in an employment context), both parties benefit from what the judge termed “strategic proactive disclosure of information.” Fourth, the judge encouraged parties to include their eDiscovery vendors in conferences and hearings, since the vendors’ ability to explain complex eDiscovery concepts may help less tech-savvy judges understand the real issues at play.

The Da Silva case is enormously significant in multiple respects. For years, litigants have been wary of predictive coding due to the absence of any judicial ruling approving its use. The Da Silva case now provides litigants with the support they need to justify the use of predictive coding, which, as Judge Peck noted, can significantly reduce the costs of eDiscovery. The opinion also provides litigants with a detailed blueprint for how predictive coding can be used in document reviews, consistent with the Federal Rules.

Litigants, academics, and commentators may look back on the Da Silva case years from now and declare it to be the turning point in the long battle against escalating discovery costs.