Classifying a substance as hazardous for regulatory purposes and setting a toxicity value for use in assessing the potential human health risk in various regulatory settings raise questions “on the frontiers of science” and rely on a delicate balancing of the science with “policy judgments”1 and “default options or generic approaches.”2 Such regulatory risk calculations can well “lead to risk estimates that, although plausible, are believed to be more likely to overestimate than to underestimate the risk to human health and the environment.”3 Increasingly, such regulatory risk decisions may involve unarticulated, nontransparent agency practices and preferences. Because all the draft bills reforming the Toxic Substances Control Act (TSCA) contemplate increasing the number of substances whose risk will be assessed or reassessed, and other Environmental Protection Agency (EPA) and state regulatory programs commonly utilize EPA reference doses (RfDs) and reference concentrations (RfCs), now is the time to reflect on what portion of these regulatory decisions should be science-based, what portion should be policy-based, and whether the nontransparent application of agency lore is appropriate at all.
This article uses EPA’s development of an RfC for the noncancer effects from exposure to trichloroethylene (TCE) in air as a case in point.
In 2011, EPA’s Toxicological Review of TCE concluded that “an overall review of the weight of evidence in humans and experimental animals is suggestive of the potential for developmental toxicity with TCE exposure”4 (the noncancer classification determination) and selected 2 µg/m3 as the residential inhalation RfC (the noncancer risk quantification decision). The RfC level selected was based primarily on a 2003 drinking water ingestion study allegedly showing a statistically significant association between exposure and heart malformations in rat fetuses (the Johnson study) and a supporting study of immunotoxicity in mice, rather than on human epidemiological data.5 Specifically, the Johnson study was used by EPA to quantify the RfC. The distinction between the classification and quantification determinations is important because there is little dispute that TCE exposures should be classified as having noncancer effects at some levels, but there is significant scientific debate concerning the concentration and the time period over which the concentration should be averaged for short-term exposure.
A reference dose can be derived for acute (≤24 hours), short-term (>24 hours up to 30 days), subchronic (>30 days up to 10 percent of lifetime), or chronic (>10 percent of lifetime) exposure. The TCE RfC in the EPA 2011 Toxicological Review was “derived for chronic exposure duration,” but “chronic” was not defined.6 However, EPA’s Regions 37 and 9,8 some states and, most recently, EPA’s Office of Toxic Substance9 use a “novel” 24-hour averaging period to determine if the average indoor concentration exceeds the 2 µg/m3 regulatory action level for residential buildings and the 8 µg/m3 level for commercial or industrial buildings and use those concentrations as the trigger to require remedial action. The 24-hour averaging approach is novel because such an approach has not previously been utilized in setting regulatory levels.
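The four duration categories described above can be written out as a simple classification rule. The following is only an illustrative sketch: the function name is hypothetical, and the 70-year lifetime is an assumed default chosen for illustration, not a figure taken from the Toxicological Review.

```python
# Illustrative sketch of the exposure-duration categories described in the text.
# ASSUMED_LIFETIME_YEARS is an assumption made for this example only.
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365
ASSUMED_LIFETIME_YEARS = 70


def classify_exposure_duration(hours, lifetime_years=ASSUMED_LIFETIME_YEARS):
    """Return the duration category for an exposure lasting `hours` hours."""
    days = hours / HOURS_PER_DAY
    lifetime_days = lifetime_years * DAYS_PER_YEAR
    if hours <= 24:
        return "acute"        # 24 hours or less
    if days <= 30:
        return "short-term"   # more than 24 hours, up to 30 days
    if days <= 0.10 * lifetime_days:
        return "subchronic"   # more than 30 days, up to 10 percent of lifetime
    return "chronic"          # more than 10 percent of lifetime


if __name__ == "__main__":
    for label, hours in [("one day", 24), ("three weeks", 21 * 24),
                         ("nine months", 270 * 24), ("ten years", 3650 * 24)]:
        print(label, "->", classify_exposure_duration(hours))
```

Under these assumptions, a 24-hour averaging period falls in the acute category, while a nine-month gestation period falls in the subchronic category. Neither matches the chronic duration for which the RfC was derived.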
The TCE RfC now drives both the indoor air screening levels and the remediation action levels for TCE vapor intrusion from contaminated groundwater.10 Although further examination of the necessity for remedial action based on these criteria may be warranted, that is not the primary focus of this paper. It is rather the unusual level of scientific uncertainty and debate surrounding the TCE RfC level and the averaging time for exposure to TCE in air that is the issue presented herein.
Reliance on the Johnson Study to Derive the RfC
The Johnson study is a relatively unusual study because two of the four exposure levels reported in the 2003 study (for concentrations of 1.5 and 1,100 parts per million [milligrams per liter]) are data from a previously published study (Dawson 1993 study) from the same laboratory,11 but at a different time. While “meta-analysis [i.e., combining data from different studies] can be a valuable method for summarizing evidence,” “it can also be subject to variable interpretations depending on how literature is selected and reviewed and data analyzed.”12 At least since Nathan Mantel’s seminal paper in 1959, it has been recognized that “[w]here two sets of controls lead to substantially different results, a cautious and conservative interpretation is indicated.”13 Similarly, a European Union agency recently concluded that a “retrospective meta-analysis of two studies originally intended to stand on their own is not expected to add any useful information.”14
In the case of the Johnson study, the original data in the Dawson 1993 study were published on their own and showed no statistically significant increase in heart defects in rat pups. When the Johnson study pooled the Dawson 1993 study data with other, later data from different studies, the conclusion was that there was a statistically significant increase in heart defects. In fact, the authors discussed the pooling of data only after comments were raised on the paper.15
Inconsistency with Other EPA Conclusions
EPA acknowledges a possible source of uncertainty is that “the research was conducted over a 6-year period [and] combined control data were used for comparison to treated groups.”16 However, that uncertainty is not quantified.
Other unresolved questions concerning the Johnson study remain, such as “the precise dates that each individual control animal was on study and the detailed results of analytical chemistry testing for dose concentration.”17 In fact, it is now known that the exposures “occurred in 1994, not 1995.”18 EPA also admits other “possible sources of uncertainty” for these studies included “possible imprecision of exposure characterization due to the use of tap water in the [Dawson 1993 study] and TCE intake values that were derived from water consumption measures of group housed animals.”19 Further, EPA’s 2014 proposed explanation of how TCE might cause fetal heart defects was not in the 2011 EPA Toxicological Review, has not been peer reviewed, and disagrees with some of the prior independent expert reviews of modes of action for TCE and similar chemicals (see discussion below).
The dose-response curve is atypical (i.e., there are effects at the lowest and highest exposure levels, but not the intermediate exposure levels).
On the face of EPA’s own 2014 review, there are six published studies reporting “the results of oral administration of TCE to rodents during fetal developmental,” but only the Johnson study reports a statistically significant increase in fetal malformations.20 The Dawson 1993 study (which includes the data later used in the Johnson study) found no statistically significant increase. The Fisher 2001 rat study (whose authors include the lead author of the Johnson study) failed to detect any effects, as did the two other rat studies and one mouse study.21 In addition, “none” of the five separate inhalation studies of TCE exposure to rats reported cardiac effects in fetuses (including the Carney study, a well-done study performed in 2006, but not reviewed in the 2006 National Academy of Sciences [NAS] report).22 No laboratory has replicated the Johnson study results.
Inconsistency with Other Independent Reviews
An NAS committee in 2009, citing an in-depth toxicological review and the Carney study, concluded that “there is no indication of a causal link between TCE and cardiovascular defects at environmentally relevant concentrations.”23 Additionally, California regulators in 2009 refused to use the data in the Johnson study to calculate a public-health protective concentration “since a meaningful or interpretable dose-response relationship was not observed,” the results were “not consistent with earlier developmental and reproductive toxicological studies done outside this lab in mice, rats, and rabbits,” and “other studies did not find adverse effects on fertility or embryonic development, aside from those associated with maternal toxicity.”24
Similarly, in 2013, the Agency for Toxic Substances and Disease Registry (ATSDR) compared the “estimated 24-hour average concentrations” of TCE in indoor air to the TCE inhalation risk-based “human equivalent concentration (HEC99) for inhalation” of 21 µg/m3 to “evaluate the potential for adverse health effects,” rather than 2 µg/m3.25 In 2014, the ATSDR draft review of TCE stated that “[p]regnant laboratory animals have been exposed to trichloroethylene vapors, but no conclusive studies have been encountered that clearly indicate teratogenic effects.”26
The 2006 NAS committee recommended that a laboratory replicate the Johnson study results, but, despite attempts to do so (including by one group that included the lead author of the Johnson study), the results have not been replicated. Despite this, EPA has rejected an offer by industry groups to perform a replication of the study that would presumably address the scientific concerns raised about this study.27
In short, a number of regulators and independent expert reviewers have addressed TCE exposure levels, but have not relied on the Johnson study, apparently because of doubts as to its reliability.
Reliance on the Johnson Study and EPA’s 1991 Guidance to Derive the Averaging Time
EPA (Region 9 and the TSCA Office) state that “exposures over a period as limited as 24 hours may be of concern for some developmental toxicants” based on a single sentence in a 1991 EPA guidance.28 Typically, 24-hour reference doses are not utilized as regulatory action levels.29 The use of a TCE concentration averaged over a lifetime, a decade, a year, nine months or 24 hours significantly impacts the stringency of the regulatory action level. Generally, given the typical daily and seasonal variation in indoor air concentrations, using a 24-hour average will trigger regulatory action much more frequently and is likely to be interpreted by residents as indicating that harm has occurred. However, the use of a 24-hour average indoor air concentration to trigger immediate regulatory action30 is unsupported by independent scientific analysis and even EPA’s own determinations.
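The effect of the averaging period on how often an action level is triggered can be illustrated with simple arithmetic. The sketch below uses entirely hypothetical daily indoor air concentrations (not measurements from any site) and a hypothetical function name. It compares the number of individual 24-hour averages above the 2 µg/m3 residential level with the result of averaging the same readings over the whole sampling period.

```python
# Illustrative sketch only: the daily values below are hypothetical and are not
# measurements from any actual site.
from statistics import mean

ACTION_LEVEL_UG_M3 = 2.0  # residential action level discussed in the text


def exceedance_summary(daily_means, action_level=ACTION_LEVEL_UG_M3):
    """Compare 24-hour exceedances with the average over the whole period."""
    days_over = sum(1 for c in daily_means if c > action_level)
    long_term_mean = mean(daily_means)
    return {
        "days_over": days_over,
        "long_term_mean": long_term_mean,
        "long_term_exceeds": long_term_mean > action_level,
    }


# Hypothetical 24-hour average concentrations (µg/m3) on ten sampling days.
hypothetical_daily_means = [0.5, 0.8, 3.1, 0.6, 2.4, 0.7, 0.9, 1.0, 0.4, 0.6]

if __name__ == "__main__":
    print(exceedance_summary(hypothetical_daily_means))
```

In this hypothetical series, two of the ten daily averages exceed the action level even though the average over all ten days is roughly 1.1 µg/m3, well below it. The same data would therefore trigger action under a 24-hour approach but not under a longer averaging period.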
The Johnson study and other studies were all long-term exposure studies, not 24-hour exposure studies. The EPA 2011 TCE Toxicological Review, the Science Advisory Board and the NAS 2006 review, and other independent scientific evaluations, do not discuss, select or provide explicit scientific support concerning the use of a 24-hour average TCE indoor air concentration to trigger regulatory action. As far as the author has been able to determine, neither EPA nor any other governmental authority has previously applied the 1991 EPA Developmental Guidance in this manner to other chemicals. This 1991 guideline is archived, i.e., not on an active website, and it contains one sentence about using a single exposure when no other data is available. In any case, nothing in that 1991 guidance defines the length of the single exposure, which therefore could be nine months or longer, not just 24 hours.
As noted by the ATSDR in assessing exposure to TCE in indoor air, “there are no suitable comparison values for TCE that represent the timeframe in which the . . . residents were exposed” (which was one year at this site) since “EPA’s reference dose and reference concentration are both intended for comparison to chronic or longer duration exposure scenarios.”31 Although EPA headquarters’ August 2014 guidance on TCE vapor intrusion states that “a single exposure of any of several developmental stages may be sufficient to produce an adverse developmental effect,”32 it too fails to explain the reason that the term “single exposure” must be 24 hours. In fact, EPA stated as recently as 2015 that “the [TCE] RfC for a single exposure has not been determined yet by EPA.”33 Neither EPA nor an independent expert review group has defined or proposed that a single exposure is 24 hours, 21 days (what some toxicologists have referred to as the “three critical weeks during gestation”) or the entire nine-month gestation period for use in determining when the TCE indoor air levels trigger action. Thus, the July 2014 EPA guidance does not support the EPA Region 9 guidance that adopts a 2 µg/m3 trigger based on a 24-hour average concentration. In fact, even the EPA Region 9 guidance that set the 24-hour averaging approach acknowledges that “[s]cientific information on the exact critical period of exposure for this health impact is not currently available.”34
Thus, no science seems to be cited to support the use of a 24-hour indoor air concentration for developmental effects.
In summary, scientific reviews by other agencies, independent experts, EPA’s weight-of-the-evidence conclusions from reviewing the same data in other toxicological reviews, and even EPA’s own update of the TCE toxicological review indicate that many scientists (apparently a majority) have recommended that regulators recognize the flaws in the Johnson study and causal analysis. All of the non-EPA independent reviews cited above concluded that the Johnson study and its data should not be used for regulatory determinations and have identified important and relevant unresolved scientific issues in the study preventing its use because of its unreliability.
The Gordian Knot: When Does the Scientific Uncertainty Exceed What Is Allowable?
The number of contradictions concerning the setting of the TCE RfC based on developmental toxicity, as documented by independent reliable sources, suggests at a minimum caution, if not skepticism. Thus, EPA’s decisions concerning TCE raise legal and policy issues that warrant careful consideration and debate. One issue is whether EPA’s TCE decision exhibits a degree of scientific uncertainty exceeding that which is allowable based on general principles of administrative law and historic EPA practice. Further, what tools can EPA (or, if necessary, a court) use to determine when the scientific uncertainty is “too great”?
Historically, courts have held that agencies “may ‘err’ on the side of overprotection”35 and are “not required to support . . . [their] findings with anything approaching scientific certainty.”36 However, judicial deference to EPA’s judgment has never been unbounded. The TCE RfC bears all of the indicia necessitating a judicial hard look into EPA’s decision-making.
Courts have refused to defer to an agency’s decision (a) “when the agency’s interpretation conflicts with a prior interpretation,” particularly where the “potential for unfair surprise is acute”; (b) when there is reason to suspect that the agency’s interpretation “does not reflect the agency’s fair and considered judgment on the matter in question”; or (c) when the agency’s interpretation is “plainly erroneous or inconsistent with the regulation.”37 Additionally, courts have refused to uphold agency decisions when they ignore the findings and recommendations of independent, expert bodies.38 Each of these factors applies in this case.
Here, EPA, in its 2011 TCE Toxicological Review, changed course concerning how much weight to give to the Johnson study and the Dawson 1993 study data contained in the Johnson study. Similarly, EPA has changed its view concerning the method by which TCE might cause fetal heart deformation (see 2002 1,1-DCE Toxicological Review). In fact, the nearly simultaneously released toxicology reviews of trichloroacetic acid and TCE reached opposite conclusions. In essence, it appears that there is a disagreement among EPA scientists on how to weigh the evidence (without explicit acknowledgement or explanation of the change or disagreement). Thus, these decisions should be subject to heightened judicial scrutiny.
Certainly, the findings and recommendations concerning the Johnson study by the 2009 NAS committee, the state of California, the 2006 NAS committee and the ATSDR impair the normal justification for deference to EPA.
The EPA decision-making in this matter should be viewed in light of the recent finding of yet another independent NAS committee (whose members collectively had conducted many EPA toxicological reviews), which concluded that there have been persistent “problems encountered with . . . [the EPA’s risk] assessments over the years” that have been “identified by multiple groups.”39 In fact, sometimes EPA “conclusions appear to be based on a subjective view of the overall data, and the absence of a causal framework.”40 This suggests that the EPA decision-making on TCE may not reflect a fair and considered scientific judgment of many EPA scientists.
EPA headquarters has not even issued a proposal concerning single exposure to TCE or the use of a 24-hour average. EPA Region 9’s choice of a 24-hour averaging time therefore presents the potential for just such unfair surprise. Furthermore, EPA cannot issue a “binding” directive (e.g., a requirement to use the TCE RfC and the 24-hour averaging period) when the agency has not sought public comment or explained how such new practice is consistent with existing law.41
Furthermore, courts have rejected EPA’s modeling and other calculations when they have not conformed to observed data.42 In the case of TCE’s RfC (as well as with other chemicals), EPA has not compared the projected rate of fetal malformations to the observed frequency of fetal deformations (from all causes).43 That is, EPA did not even attempt to verify the accuracy and precision of its toxicological predictions.
Even if EPA’s substantive TCE decision were not subject to probing judicial scrutiny, EPA may not ignore a study’s weaknesses, focus exclusively (or primarily) on positive findings, decide that a lack of mechanism or dose response can be ignored, or, in effect, adopt a zero-risk goal (a so-called “better to be safe than sorry” policy), particularly if the decision appears to be driven by an unofficial practice neither publicly vetted nor grounded in the statute being administered.44
The weight-of-the-evidence analysis for TCE is at best poorly and inconsistently articulated, focused only on statistical associations (not causation), and it fails to address the highly unusual and persuasive number of non-EPA scientific judgments to the contrary. An “association is not causation.”45 The extensive literature on causation requires “systematic identification of relevant evidence, criteria for evaluating the strength of evidence, and language for describing the strength of evidence of causation.”46 EPA official guidelines acknowledge that “[d]etermining whether an observed association (risk) is causal rather than spurious involves consideration of a number of factors” (citing strength, experiment, consistency, plausibility and coherence, often referred to as the “Hill criteria”).47 EPA’s toxicological review of TCE fails to articulate a clear rationale for its decisions in weighing the evidence. Further, the TCE toxicological review and its 2014 update seem to reflect a series of science policy decisions that are simply not explicitly explained in these reviews.
The purpose of this article is not to seek to delay regulatory action, but to communicate fairly and neutrally the degree of uncertainties and suggest methods of mitigating problems presented by these uncertainties. In the long term, such analysis will streamline risk assessments and ultimately the regulatory actions addressing those risks.
First, EPA should replicate the Johnson study with the modification that the replicate study should follow good laboratory practice guidelines. It is troubling that EPA has shied away from such a reasonable proposal.
Second, more generally, the key issues in the TCE noncancer risk assessment should be reviewed by an NAS committee.
Third, EPA should initiate a transparent process to develop science-based guidance concerning the period of time over which indoor air levels should be averaged to determine if the RfC is exceeded. The shortcomings individually and cumulatively suggest, at a minimum, the need for caution before relying on the Johnson study to make novel toxicological decisions because of this higher-than-normal degree of scientific uncertainty.
Fourth, more generally, there is a need for regulators to articulate which conclusions are based on science and which are based primarily on science policy, particularly when the scientific weight-of-the-evidence analysis may conclude that there is insufficient evidence of causation. There should be no place in regulatory risk decision-making for reliance on agency lore. As recommended by the 2011 NAS report on formaldehyde, the decision-making criteria and preferences used in judging the quality of studies and in weighing epidemiological data versus animal data versus mechanistic data must be clearer. In particular, there should be criteria for determining when suggestive data is simply not sufficient to rely upon in setting numerical limits for noncancer effects such as fetal deformation, i.e., when making regulatory decisions based on limited data of poor quality is more detrimental than assuming the absence of that data. Current EPA statements seem to concentrate more on when one or more of the generally accepted Hill criteria for determining causation (discussed above) can be ignored than when the data is simply not sufficient.
The starting points for this reform should be the recommendations from the 2011 formaldehyde NAS report and EPA’s response to it. But the process also needs reform. When quality or novel policy issues arise in the course of a review, EPA should not wait (often years) to seek expert input, particularly where, as here, other EPA personnel performing similar reviews and governmental bodies’ or independent experts’ reports raise quality or interpretative issues with the same or similar data. Science should not be a game of hide and seek. In the appropriate situations, expedited studies should be performed to promote scientific certainty on key issues, particularly where industry is willing to fund a joint government-industry study.
Fifth, when novel policy issues are identified, early public and expert review should be sought in parallel with the completion of the review. In the case of TCE, an independent group should be asked to advise EPA concerning health-protective, yet workable, policy options that may be utilized as the scientific certainty in risk assessment decision-making varies along the spectrum from no reasonable evidence to virtual certainty. It is simply misleading to group chemicals that may have numerous epidemiological studies demonstrating substantial increases in relative risks with chemicals (such as TCE) where the evidence is, by any measure, weak and uncertain. More importantly, grouping chemicals whose toxicity has significantly different degrees of scientific support is likely to misallocate finite regulatory and societal resources.
For example, the level of evidence might trigger more intensive research to fill data gaps on an expeditious timeframe (as recommended nine years ago by the 2006 NAS committee). Similarly, rather than propose a single concentration for the RfC, the extraordinary level of uncertainty may justify the use of a range of potential RfCs (as has been the practice in regulating carcinogens).
Sixth, the EPA peer-review process would benefit from a hard look designed to increase the selection of well-balanced peer reviewers who have actual experience in the key technical issues and the involvement of all stakeholders (governmental, environmental groups and industry) earlier in the process (including providing information and queries to EPA and peer reviewers).48
Each of these reforms could be developed after stakeholder involvement. However, to date, EPA has evinced little inclination to take the initiative, and thus it is incumbent on independent expert bodies to lead the way.