On November 26, 2012, the Department of Health and Human Services Office for Civil Rights (OCR) released long-overdue guidance on how covered entities subject to the Health Insurance Portability and Accountability Act (HIPAA) can de-identify protected health information (PHI) for research, comparative effectiveness studies, policy assessment, life sciences research, and other secondary uses. This guidance provides responses to industry questions on the methods through which the HIPAA de-identification standard should be implemented. While providing a more detailed discussion of methodological issues associated with de-identification, this guidance falls short of establishing an approved approach to de-identification. As a result, it raises additional questions about the potential risks to a covered entity associated with release of de-identified information.
The guidance on the de-identification requirements of the HIPAA Privacy Rule was mandated under the Health Information Technology for Economic and Clinical Health (HITECH) Act, given concerns that existing rules are no longer adequate as the health care industry moves rapidly toward digitized data. According to OCR, “the increasing adoption of health information technologies in the United States accelerates their potential to facilitate beneficial studies that combine large, complex data sets from multiple sources.”
The Privacy Rule provides the standard for de-identification of protected health information. Under this standard, health information is not individually identifiable if it does not identify an individual and if the covered entity has no reasonable basis to believe it can be used to identify an individual. The Privacy Rule also establishes two de-identification methods: (a) a formal determination by a qualified expert applying statistical or scientific principles that the risk of identification of an individual is very small (Expert Determination Method); or (b) the removal of 18 specified individual identifiers in combination with the absence of actual knowledge that the remaining information could be used alone or in combination with other information to identify the individual (Safe Harbor Method). Successful application of either method yields de-identified data, the use and disclosure of which is not restricted by the Privacy Rule because it is not considered PHI.
Guidance on Satisfying the Expert Determination Method
The guidance addresses the following key issues with respect to the Expert Determination Method.
Definition of an Expert
OCR did not specifically define an “expert,” but it noted that from an enforcement perspective, it would review the academic or other training of the expert (in the statistical, mathematical, or other scientific domains), as well as the relevant actual experience of the expert using health information de-identification methodologies. This suggests that covered entities need to consider the adequacy of the qualifications of an individual who certifies that a data file is de-identified. If a covered entity does not employ such an expert, it may need to obtain appropriate consultant services.
Acceptable Method for Determining Identification Risk
- While OCR did not establish any particular process for an expert to use to reach a determination that the risk of identification is very small, it described a workflow and the factors used by experts in evaluating identification risk with respect to the degree to which a data set can be “linked” to a data source that reveals the identity of the corresponding individuals. See Table 2. In particular, OCR notes that gender, ZIP code, and birth date are three data elements that significantly increase the risk of identification, due to the availability of such information in records in the public domain to which health information could be linked. Therefore, de-identifying a data set will necessarily involve removing this combination of identifiers.
- OCR highlighted the fact that because technology, social conditions, and the availability of information change over time, certain de-identification experts use the “limited certification” approach and set expiration dates, after which the methodology must be re-assessed for future releases. While recognizing that a de-identification method may become insufficient over time and that a new method may be required for future releases of de-identified PHI, OCR does not address the status of the data sets already released under a “limited certification.” If such a methodology is found to be insufficient to satisfy the de-identification standard upon re-evaluation, the data sets released previously may have a greater risk of identification than they did at the time of release. It is not clear what exposure this creates for a covered entity.
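The linkage concern described above, in which gender, ZIP code, and birth date together single out individuals, can be illustrated with a minimal Python sketch. The records, field names, and the `unique_fraction` helper are hypothetical and are not drawn from the OCR guidance; the sketch only counts how many records carry a combination of values that no other record shares. It is not a substitute for an expert's risk determination.

```python
from collections import Counter

# Hypothetical records; the field names are assumptions for illustration only.
records = [
    {"gender": "F", "zip": "02138", "birth_date": "1961-07-04"},
    {"gender": "F", "zip": "02138", "birth_date": "1961-07-04"},
    {"gender": "M", "zip": "02139", "birth_date": "1975-01-15"},
    {"gender": "F", "zip": "02139", "birth_date": "1980-11-30"},
]

QUASI_IDENTIFIERS = ("gender", "zip", "birth_date")


def group_sizes(rows, keys=QUASI_IDENTIFIERS):
    """Count how many rows share each combination of the given fields."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_fraction(rows, keys=QUASI_IDENTIFIERS):
    """Fraction of rows whose combination of the given fields appears once."""
    sizes = group_sizes(rows, keys)
    unique_rows = sum(
        1 for row in rows if sizes[tuple(row[k] for k in keys)] == 1
    )
    return unique_rows / len(rows)


# Two of the four rows are unique on all three fields together.
print(unique_fraction(records))
# Only one of the four rows is unique on gender alone.
print(unique_fraction(records, ("gender",)))
```

In this toy data, half of the records are unique on the three fields combined, while only a quarter are unique on gender alone, which is the kind of difference that makes the combination attractive for linking to public records.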
Approaches to Mitigate the Risk of Identification of an Individual
- The guidance also acknowledges that otherwise identifiable data sets can be de-identified through alteration of the data, such as by the suppression of certain values within a record or an entire feature, generalization of certain features, and other risk-mitigation techniques. This is important for secondary uses of data because techniques such as suppression or generalization enable the release of more detailed data files.
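As a rough illustration of the two techniques named above, the following Python sketch generalizes two fields and suppresses rare values of a third. The field names, the sample rows, and the choice of birth year and three-digit ZIP prefix are assumptions made for the example; whether any particular transformation brings identification risk down to a very small level is a question for the expert's analysis, not something this sketch establishes.

```python
from collections import Counter


def generalize(row):
    """Return a copy with coarser values: birth year only, 3-digit ZIP prefix."""
    out = dict(row)
    out["birth_date"] = row["birth_date"][:4]  # "1961-07-04" -> "1961"
    out["zip"] = row["zip"][:3]                # "02138" -> "021"
    return out


def suppress_rare(rows, field, min_count):
    """Replace values of `field` seen fewer than `min_count` times with None."""
    counts = Counter(r[field] for r in rows)
    return [
        {**r, field: None} if counts[r[field]] < min_count else dict(r)
        for r in rows
    ]


# Hypothetical rows for illustration only.
rows = [
    {"zip": "02138", "birth_date": "1961-07-04", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1961-12-01", "diagnosis": "asthma"},
    {"zip": "02140", "birth_date": "1975-01-15", "diagnosis": "rare condition X"},
]

released = suppress_rare([generalize(r) for r in rows], "diagnosis", min_count=2)
for r in released:
    print(r)
```

Generalization keeps every record but makes each value less specific, while suppression removes only the values that would stand out; the original rows are left unmodified so the covered entity retains the detailed data.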
Use of a Data-Sharing Agreement for De-identified Data
- In this context, OCR also raises the possibility that a covered entity may require the recipient of de-identified information to enter into a data-use agreement to access files with known disclosure risk. This statement may suggest that entities give further consideration to the long-term risks associated with release of de-identified information and the need for data-use agreements or for indemnification for inadequate de-identification if a business associate agreement is used.
Guidance on Satisfying the Safe Harbor Method
The Safe Harbor Method provides that de-identification may be achieved by removing from the health information 18 specific identifiers of the individual to whom the PHI relates and of the individual’s relatives, employers, or household members, in combination with the absence of actual knowledge by the covered entity that the remaining information could be used alone or in combination with other information to identify the individual. OCR explains the use of ZIP codes in de-identified information, the use of derivatives of listed identifiers, and the permitted use of dates and names. In particular, the guidance’s explanations of what constitutes “any other unique identifying number, characteristic or code” and “actual knowledge” under the Safe Harbor Method may help covered entities interpret those provisions of the Privacy Rule with greater confidence.
What Constitutes “Any Other Unique Identifying Number, Characteristic or Code”
The OCR guidance offered an explanation of what is included in the catch-all category of “any other unique identifying number, characteristic or code.” This category refers to any unique features that are not explicitly enumerated in the Safe Harbor list of specified identifiers but that could be used to identify a particular individual. Examples of such identifiers provided by OCR include clinical trial record numbers (identifying number) and the occupation of a patient, if it was listed as “current President of State University” or a similarly unique description (identifying characteristic). Similarly, a code derived from a secure hash function without a secret key, and a unique barcode embedded into patient records or their medications (identifying code), would be considered identifying elements. However, codes or other means of record identification assigned by the covered entity are not considered direct identifiers that must be removed under the Safe Harbor Method if: (a) the code or other means of record identification is not derived from or related to information about the individual and is not otherwise capable of being translated so as to identify the individual; and (b) the covered entity does not use or disclose the code or other means of record identification for any other purpose, and does not disclose the mechanism or secret key that would permit re-identification.
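A short Python sketch may help show why a hash computed without a secret key can still function as an identifying code. It assumes a hypothetical record-number format (`MRN-` followed by six digits) and hypothetical helper names; none of these come from the guidance. A recipient who can guess the format can hash every candidate and match the released code. By contrast, a randomly assigned code is not derived from information about the individual, which is the situation described in condition (a) above.

```python
import hashlib
import secrets


def unkeyed_code(record_number: str) -> str:
    """Derive a code from the record number with SHA-256 and no secret key."""
    return hashlib.sha256(record_number.encode("utf-8")).hexdigest()


def recover_by_guessing(code, candidates):
    """Hash each candidate record number and return the one that matches."""
    for candidate in candidates:
        if unkeyed_code(candidate) == code:
            return candidate
    return None


def assign_random_codes(record_numbers):
    """Map each record number to a random code unrelated to its value."""
    return {rn: secrets.token_hex(8) for rn in record_numbers}


# A recipient who knows the (assumed) record-number format can reverse the
# unkeyed code simply by trying every possible value.
released = unkeyed_code("MRN-000123")
guesses = (f"MRN-{n:06d}" for n in range(1000))
print(recover_by_guessing(released, guesses))

# The crosswalk from record number to random code would stay with the
# covered entity and would not be disclosed to the recipient.
crosswalk = assign_random_codes(["MRN-000123", "MRN-000124"])
print(crosswalk)
```

The random codes can be mapped back to patients only through the crosswalk, so the re-identification mechanism remains with the covered entity, consistent with condition (b). Whether a given coding scheme satisfies the Safe Harbor Method remains a legal determination that the sketch does not address.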
The guidance also emphasizes that to satisfy the Safe Harbor, a covered entity must not have “actual knowledge,” i.e., clear and direct knowledge, that the remaining information could be used either alone or in combination with other information to identify an individual who is the subject of the information.
To illustrate this principle, OCR includes an extensive explanation and provides four examples of when a covered entity would fail to meet the “actual knowledge” provision despite removing the enumerated identifiers, because the risk of identification is of a nature and degree that the covered entity must have concluded that the information could be used to identify individual patients.
- First, if a covered entity was aware that the occupation of a patient was listed with sufficient specificity so that, in combination with additional identifying data like age or state of residence, it could be used to identify the patient, the Safe Harbor de-identification standard would not be met. An example of such an occupational listing would be “former President of the State University.”
- Second, if a covered entity was aware that the anticipated recipient of the de-identified information had a family member in the data, and the covered entity was aware that the data would provide sufficient context for the recipient to recognize the relative, the data would not satisfy the Safe Harbor de-identification standard. A covered entity might be aware that the data would provide sufficient context if it detailed a complicated set of procedures that might permit the recipient to understand that the data pertained to his or her relative’s case.
- Third, publicized clinical events, such as a patient who gave birth to an unusually large number of children at the same time, may facilitate identification in a clear and direct manner.
- Fourth, direct knowledge that an anticipated recipient has a readily available table, algorithm, or other mechanism that can be used to re-identify the information or determine a patient’s identity would constitute actual knowledge.
The guidance provides that a covered entity’s mere knowledge that methods exist, such as statistical methods for re-identifying the remaining information or for using de-identified information alone or in combination with other information to identify an individual, does not by itself mean that the covered entity has “actual knowledge” that the methods would be used with the data it is disclosing. Covered entities are not expected to presume that all potential recipients of de-identified data have the capacity to use such methods.
The OCR guidance offers important direction to the industry but allows considerable flexibility in the ways that covered entities may de-identify PHI. This puts the onus on covered entities to carefully consider the risks associated with various methods of de-identification of PHI. In consideration of these issues, covered entities may want to include provisions in their agreements with business associates that address the responsibility for adequately de-identifying data. Further, covered entities may want to evaluate the potential use of data-use agreements as a means to provide some control over the secondary uses of de-identified data.