We had another jury trial and thus were offline for a few weeks, but it didn't take long to dig up something I hope you'll find of interest. I ran across it in a recent opinion by the district court in In re Fosamax Product Liability Litigation. What is it? It's a duty to actively mine the FDA's data for a signal, a hint, that your approved pharmaceutical product might be associated with an adverse event.

As an initial matter, I almost didn't get to the interesting part of In re Fosamax Product Liability Litigation thanks to this eye-roller: "In applying the nine Bradford Hill factors, he [Dr. Cornell] reviewed Plaintiff's medical records from 1996 to present, the office notes and depositions of her treating physicians, and 'past and current medical literature on the topics of osteopenia, osteoporosis and their prevention and treatment with bisphosphonate drugs including alendronate.'" That was followed by: "The methodology Dr. Cornell used is sufficiently reliable because the Bradford Hill criteria are 'broadly accepted' in the scientific community 'for evaluating causation' and 'are so well established in epidemiological research.'" This business of giving expert witnesses a pass for doing nothing more than invoking the great epidemiologist's name and saying that their method consisted of peering at the evidence through the supposed lens of Hill's "criteria" is utterly appalling, but it would take several paragraphs to explain and we've done it before, so I'll leave it at that.

The portion worth pondering is found further down, in the discussion of defendant's motion to exclude a different expert witness, Dr. Madigan. Madigan is a statistician from Columbia with an impressive resume. He was tasked with assessing "whether a signal of problematic oversuppression of bone turnover and associated [atypical femur fractures (AFF)] . . . existed for Fosamax, using industry standard pharmacovigilance techniques and data sources, and the adverse event terms selected by Merck to internally evaluate the same" and "the strength of that signal, if any, in comparison to the signal, if any, for such events in other products indicated for the prevention and treatment of osteoporosis". To identify and evaluate such a signal, Madigan took several medical terms considered by the defendant to be possible indicators of "oversuppression of bone turnover" and/or AFF and, using a program called Qscan, ran them through the FDA's Adverse Event Reporting System ("AERS") database looking for associations with Fosamax use.
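The opinion doesn't describe Qscan's internals (it's a proprietary package), but the workhorse of this kind of pharmacovigilance screening is disproportionality analysis: count how often a drug/event pair appears in the spontaneous report database and compare that to how often you'd expect it by chance. Here's a minimal sketch of one standard metric, the proportional reporting ratio (PRR); the counts are invented, and this is my illustration of the general technique, not Madigan's actual analysis:

```python
# A minimal disproportionality screen of the kind used in pharmacovigilance.
# All counts below are hypothetical; Qscan's actual algorithms are
# proprietary and not described in the opinion.

def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio for a 2x2 table of reports.

    a: reports mentioning the drug AND the event term
    b: reports mentioning the drug but NOT the event term
    c: reports mentioning the event term but NOT the drug
    d: reports mentioning neither
    """
    return (a / (a + b)) / (c / (c + d))

def is_signal(a: int, b: int, c: int, d: int) -> bool:
    """One commonly cited screening rule (Evans et al. 2001): PRR >= 2
    with at least 3 drug/event reports. The full rule also checks a
    chi-square statistic, omitted here for brevity."""
    return a >= 3 and prr(a, b, c, d) >= 2.0

# Hypothetical AERS-style counts for one drug/term pair:
a, b, c, d = 48, 9_952, 310, 489_690
print(f"PRR = {prr(a, b, c, d):.1f}; signal: {is_signal(a, b, c, d)}")
```

Run that and the hypothetical pair reports the event about seven and a half times as often as the background rate, which any screening rule would flag.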

The data revealed "the presence of a clear signal for oversuppression of bone turnover and associated atypical femur fracture events utilizing the terms selected by Merck for such analysis. By standard metrics of 'signal' detection, the signal is strong, consistent, and not ambiguous. Of perhaps greater concern, the signal was striking in comparison to that for other drugs indicated for the prevention and treatment of osteoporosis. As early as 2001-2002, the spontaneous report data for Fosamax provide signals for a number of indicators of suppression of bone turnover. For the comparator drugs, such signals either never appear or appear years later." Because data mining with Qscan (or similar software) is widely used, and often mandated, in the pharmaceutical industry, and because peer-reviewed papers arising out of data generated by such software have been published, the court concluded that "data mining in pharmacovigilance" is a reliable method. Fair enough.
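The "years earlier" point is easy to picture: run the same screen against cumulative report counts year by year and note when each drug first crosses the threshold. The numbers below are invented, and I've simplified by assuming a fixed background event rate instead of the full 2x2 table, but this is the shape of the comparison Madigan describes:

```python
# Invented year-by-year cumulative counts (event reports, total reports)
# for a drug of interest and a comparator. BACKGROUND stands in for the
# event's reporting rate across all other drugs; a real screen would use
# the full 2x2 table as in the sketch above.
BACKGROUND = 0.0005

reports = {
    "Drug A":     {1999: (1, 800), 2001: (9, 2600), 2003: (30, 5200)},
    "Comparator": {1999: (0, 700), 2001: (1, 2400), 2003: (3, 5100)},
}

for drug, by_year in reports.items():
    for year, (events, total) in sorted(by_year.items()):
        prr = (events / total) / BACKGROUND
        flagged = events >= 3 and prr >= 2.0
        print(f"{drug:<12} {year}: PRR = {prr:5.2f}"
              + ("  <-- signal" if flagged else ""))
```

With these made-up numbers "Drug A" signals by 2001 and only gets stronger, while the comparator never clears the threshold - precisely the pattern Madigan says the Fosamax data show.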

The excitement comes in the next paragraph, in which the court concludes that Madigan's testimony "fits", and so is relevant to, an issue in the case because it informs the question of whether the defendant should have warned of the later-perceived risk of AFF. The court thereby implicitly held, I think, that the defendant had a duty to mine the FDA's data as early as 2001-2002. In other words, the existence of powerful data mining tools capable of uncovering an early signal of a possible harm associated with defendant's product created a duty to use such tools. Ultimately, that's a duty to discover any statistical association between your product and some harm in the FDA's (admittedly accessible) data that might be causal, and thereafter to warn about it; and it's a duty to mine not only your own data but any data that might shed light on your product.

It's hard to know what to make of a duty to mine big data. Obviously it's a potential problem for defendants. "Should have known" is no longer what was reasonably knowable (by humans). Instead, it's what was knowable given a duty to use "powerful data mining and signal detection capabilities", including "a powerful query-by-example module that allows users to mine and visualize data through inquiries utilizing multiple case data elements." We're already in the middle of something similar: a case in which the plaintiff is demanding to see all of the death certificates collected by our client's benefits department, claiming that we had, or had assumed, a duty to look for mortality patterns among our workers that might suggest a work-related etiology - the idea being that the plaintiff's decedent succumbed to one such workplace illness and that we knew, or should have known from looking at the death certificates, that danger had been lurking.

And of course there's the problem of experts coming in after the fact and running term after term after modified term in varying combinations until an association emerges. Working backwards, they'll then construct a narrative about why the set of terms eventually found to produce the association would have been the obvious choice at the time the data mining ought to have been done; and the failure to look for such an obvious potential association will thereafter be cast as willful ignorance.
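If that sounds overwrought, a toy simulation (mine, not anything in the opinion) shows how cheap such associations are. Let every candidate term occur at the same background rate whether or not the patient took the drug - pure noise, no real association anywhere - and then screen a few hundred terms the way an expert shopping for a signal might:

```python
# Toy simulation of the multiple-comparisons problem: screen many event
# terms against pure noise and spurious "signals" emerge. All parameters
# are invented for illustration.
import math
import random

random.seed(0)

def poisson(lam: float) -> int:
    """Knuth's method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while p > threshold:
        p *= random.random()
        k += 1
    return k - 1

DRUG_REPORTS, OTHER_REPORTS = 5_000, 45_000  # hypothetical report counts
RATE = 0.0006    # every term occurs at this rate, drug or no drug
N_TERMS = 500    # candidate event terms the expert tries

spurious = 0
for _ in range(N_TERMS):
    a = poisson(DRUG_REPORTS * RATE)    # drug + term reports
    c = poisson(OTHER_REPORTS * RATE)   # other-drug + term reports
    if c == 0:
        continue
    prr = (a / DRUG_REPORTS) / (c / OTHER_REPORTS)
    if a >= 3 and prr >= 2.0:           # same screening rule as above
        spurious += 1

print(f"{spurious} of {N_TERMS} terms 'signal' with no real association")
```

With these invented numbers, dozens of the 500 perfectly innocent terms will typically clear both thresholds by chance alone. Query enough terms and something will always "signal".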

Then there's the bane of defendants in all latent disease cases - hindsight bias. It's the "knew it all along" fallacy that emerges when a jury is shown the picture on the box before it sees the puzzle pieces inside. Having seen the finished picture, jurors can't imagine that the few anecdotes and case reports - a mere handful of the pieces - could ever have looked like randomness. That means the ability to mine huge amounts of data will make signals easier to find while making it harder to mount a successful state-of-the-art defense any time Qscan can tease a signal out of the data.

Data mining has led, and will lead, to startling discoveries in the sciences. In the law it may well lead to startling liabilities - especially if defendants are made to pay for harms foreseeable only by the most powerful software available. Ponder that.