The California Consumer Privacy Act (“CCPA”) was enacted in early 2018 as a political compromise to stave off a poorly drafted, plaintiff-friendly ballot initiative. Although the CCPA is scheduled to go into force in early 2020, there is a great deal of confusion regarding its requirements, including the degree to which it aligns with other privacy regulations such as the European General Data Protection Regulation (“GDPR”).
To help address that confusion, BCLP published the California Consumer Privacy Act Practical Guide, and is publishing a multi-part series that discusses the questions most frequently asked by clients concerning the CCPA.
Q. Does the CCPA apply to information that has been de-identified?
The CCPA governs the collection, use, and disclosure of the “personal information” of California residents. The term “personal information” is defined broadly to include any information that “identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household.”1 The Act defines the term “de-identified” as a near inverse to personal information – i.e., information that “cannot reasonably identify, relate to, describe, be capable of being associated with, or be linked, directly or indirectly, to a particular consumer.”2 The one disparity between the definition of “personal information” and the definition of “de-identified” is that the former purports to apply to information relating to a “household” whereas the latter refers only to “consumers.” Given the history of the CCPA, this likely was a drafting oversight, although it remains to be seen whether courts will attempt to ascribe meaning to it.3
While de-identified information is, by definition, not “personal information” and, therefore, not subject to the CCPA, there is a great deal of uncertainty as to what level of obfuscation is required in order for information to not “reasonably” identify an individual. That confusion is due, in part, to the fact that de-identification is not a single technique, but rather a collection of approaches, tools, and algorithms that can be applied to different kinds of data with differing levels of effectiveness. In 2010, the National Institute of Standards and Technology (NIST) published the Guide to Protecting the Confidentiality of Personally Identifiable Information (PII) that provides a set of instructions and de-identification techniques for federal agencies, which can also be used by non-governmental organizations on a voluntary basis. The guide defines “de-identified information” as “records that have had enough PII removed or obscured, also referred to as masked or obfuscated, such that the remaining information does not identify an individual and there is no reasonable basis to believe that the information can be used to identify an individual.”4 NIST identified the following five techniques that can be used to de-identify records of information:
- Suppression: The personal identifiers can be suppressed, removed, or replaced with completely random values.
- Averaging: The personal identifiers of a selected field of data can be replaced with the average value for the entire group of data.
- Generalization: The personal identifiers can be reported as being within a given range or as a member of a set (e.g., names can be replaced with “PERSON NAME”).
- Perturbation: The personal identifiers can be exchanged with other information within a defined level of variation (e.g., a date of birth may be randomly adjusted by up to five years in either direction).
- Swapping: The personal identifiers can be exchanged between records (e.g., swapping the ZIP codes of two unrelated records).
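To make the five techniques more concrete, the following is a minimal Python sketch applied to a toy data set. The records, field names, and function names are hypothetical and chosen only for illustration; they are not drawn from the NIST guide, and applying these transformations does not by itself establish that data is “de-identified” under the CCPA.

```python
import random
from statistics import mean

# Hypothetical toy records; all names and fields are illustrative assumptions.
RECORDS = [
    {"name": "Alice Smith", "age": 34, "zip": "94105", "birth_year": 1985, "salary": 90000},
    {"name": "Bob Jones", "age": 51, "zip": "90210", "birth_year": 1968, "salary": 70000},
    {"name": "Carol Lee", "age": 27, "zip": "95814", "birth_year": 1992, "salary": 80000},
]

def suppress(records, field):
    """Suppression: remove the identifying field from every record."""
    return [{k: v for k, v in r.items() if k != field} for r in records]

def average(records, field):
    """Averaging: replace each value with the average for the whole group."""
    avg = mean(r[field] for r in records)
    return [{**r, field: avg} for r in records]

def generalize_age(records, width=10):
    """Generalization: report age as a range (34 becomes '30-39')."""
    out = []
    for r in records:
        low = (r["age"] // width) * width
        out.append({**r, "age": f"{low}-{low + width - 1}"})
    return out

def perturb(records, field, max_shift, rng):
    """Perturbation: shift each value by a random amount within +/- max_shift."""
    return [{**r, field: r[field] + rng.randint(-max_shift, max_shift)} for r in records]

def swap(records, field, rng):
    """Swapping: shuffle one field's values across the records."""
    values = [r[field] for r in records]
    rng.shuffle(values)
    return [{**r, field: v} for r, v in zip(records, values)]
```

Each function returns new records and leaves the input untouched, so the techniques can be layered, for example by calling `generalize_age(suppress(RECORDS, "name"))`. Whether a given combination is sufficient remains a judgment about re-identification risk rather than a property of the code.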
The European Union’s Article 29 Working Party identified the following additional de-identification techniques:5
- Noise Addition: The personal identifiers are expressed imprecisely (e.g., weight is reported with an error of up to 10 lb in either direction).
- Differential Privacy: The personal identifiers of one data set are compared against an anonymized data set held by a third party with instructions of the noise function and acceptable amount of data leakage.
- L-Diversity: The personal identifiers are first generalized, then each attribute within an equivalence class is made to occur at least “l” times (i.e., properties are assigned to personal identifiers, and each property is made to occur within a dataset, or partition, a minimum number of times).
- Pseudonymization – Hash Functions: The personal identifiers of any size are replaced with artificial codes of a fixed size (e.g., Paris is replaced with “01,” London is replaced with “02,” and Rome is replaced with “03”).
- Pseudonymization – Tokenization: The personal identifiers are replaced with a non-sensitive identifier that traces back to the original data, but is not mathematically derived from the original data (e.g., a credit card number is exchanged in a token vault with a randomly generated token “958392038”).
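Several of these techniques can also be sketched in a few lines of Python. The example below is illustrative only: the function and class names are hypothetical, the hash example substitutes a keyed hash (HMAC-SHA256) for the simple code table described above, and the l-diversity function only checks whether a data set satisfies the property rather than transforming the data. Differential privacy is omitted because it does not reduce to a short example.

```python
import hashlib
import hmac
import random
import secrets
from collections import defaultdict

def add_noise(value, spread, rng):
    """Noise addition: report the value plus a random offset within +/- spread."""
    return value + rng.uniform(-spread, spread)

def hash_pseudonym(identifier, key):
    """Hash pseudonymization: map input of any size to a fixed-size code.

    Assumption: a keyed hash (HMAC-SHA256) is used so that the codes
    cannot be recomputed by someone who does not hold the key.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

class TokenVault:
    """Tokenization: random tokens that map back to data only via the vault."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value):
        # Reuse the existing token so the same value always maps to one token.
        if value not in self._token_by_value:
            token = secrets.token_hex(8)  # random, not derived from the value
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token):
        return self._value_by_token[token]

def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """Check that each equivalence class has at least l distinct sensitive values."""
    classes = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        classes[key].add(r[sensitive])
    return all(len(values) >= l for values in classes.values())
```

Both forms of pseudonymization leave a path back to the individual, through the key in one case and the vault in the other, which is why whoever controls the key or vault is central to assessing whether the remaining data can “reasonably” be linked to a consumer.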
The uncertainty as to what counts as “de-identified” data is further complicated by the fact that different regulatory agencies and legal systems have historically applied different standards when assessing whether information is capable of being re-associated with an individual. For example, the Federal Trade Commission indicated in its 2012 report Protecting Consumer Privacy in an Era of Rapid Change: Recommendations for Businesses and Policymakers that the FTC’s privacy framework only applies to data that is “reasonably linkable” to a consumer.6 The report explains that “data is not ‘reasonably linkable’ to the extent that a company: (1) takes reasonable measures to ensure that the data is de-identified; (2) publicly commits not to try to re-identify the data; and (3) contractually prohibits downstream recipients from trying to re-identify the data.”7 With respect to the first prong of the test, the FTC clarified that this “means that a company must achieve a reasonable level of justified confidence that the data cannot reasonably be used to infer information about, or otherwise be linked to, a particular consumer, computer, or other device.”8 Thus, the FTC recognizes that while it may not be possible to remove the disclosure risk completely, de-identification is considered successful when there is a reasonable basis to believe that the remaining information in a particular record cannot be used to identify an individual. The FCC adopted the FTC’s three-part de-identification test in its Broadband Privacy Order.9
The CCPA does not directly adopt the FTC’s recommended framework of requiring a public commitment not to re-identify data, nor does it explicitly mandate that contracts prohibit re-identification attempts. It does, however, require that a company that believes data is de-identified take the following four steps to proactively prevent re-identification:
- Implement technical safeguards that prohibit re-identification,
- Implement business processes that specifically prohibit re-identification,
- Implement business processes that prevent inadvertent release of de-identified information, and
- Make no attempt to re-identify the information.10