This installment of The eData Guide to GDPR analyzes what “personal data” means under the General Data Protection Regulation.

The protection of personal data is the foundational rationale for the General Data Protection Regulation (GDPR). Thus, the first step in complying with the regulation is to understand what is meant by the term “personal data.” According to the definitions in Article 4, “‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’).”[1] An identifiable natural person is one who can be identified, directly or indirectly, “in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.”[2] A natural person becomes identifiable when disparate data sources can be cross-referenced or pieced together to reveal the person’s identity.[3] For example, if information identifying a specific individual is redacted or erased from a data set, but other data could still be used to reveal who that person is, the entire data set is considered personal data and is subject to the regulation.[4] In a clinical study, for instance, if redacted patient records can be linked to a separate database that identifies individuals by location, age, or occupation, the patients’ ailments can be attributed to identifiable people, and all of that data, taken as a whole, can be considered personal data.
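
To make the cross-referencing point concrete, the following Python sketch shows how a “redacted” data set can be re-identified by joining it to a separate, named data set on shared attributes such as postcode, age, and occupation (often called quasi-identifiers). All records, names, and function names here are hypothetical, and an exact-match join is only the simplest form of such linkage; real-world re-identification may draw on fuzzier matching and many more sources.

```python
from collections import defaultdict

# Hypothetical "redacted" clinical records: names removed, but location,
# age and occupation remain alongside the diagnosis.
clinical_records = [
    {"postcode": "10115", "age": 47, "occupation": "pilot", "diagnosis": "epilepsy"},
    {"postcode": "10115", "age": 33, "occupation": "teacher", "diagnosis": "asthma"},
    {"postcode": "80331", "age": 33, "occupation": "teacher", "diagnosis": "diabetes"},
]

# Hypothetical separate database that does name individuals.
public_directory = [
    {"name": "A. Example", "postcode": "10115", "age": 47, "occupation": "pilot"},
    {"name": "B. Example", "postcode": "10115", "age": 33, "occupation": "teacher"},
    {"name": "C. Example", "postcode": "10115", "age": 33, "occupation": "teacher"},
    {"name": "D. Example", "postcode": "80331", "age": 33, "occupation": "teacher"},
]

QUASI_IDENTIFIERS = ("postcode", "age", "occupation")


def quasi_key(record):
    """Build a lookup key from the attributes both data sets share."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(redacted, directory):
    """Return (name, diagnosis) pairs where a redacted record matches exactly one named person."""
    names_by_key = defaultdict(list)
    for person in directory:
        names_by_key[quasi_key(person)].append(person["name"])

    reidentified = []
    for record in redacted:
        candidates = names_by_key.get(quasi_key(record), [])
        if len(candidates) == 1:  # a unique match ties the "anonymous" record to one person
            reidentified.append((candidates[0], record["diagnosis"]))
    return reidentified


for name, diagnosis in link(clinical_records, public_directory):
    print(f"{name}: {diagnosis}")
```

In this toy example, two of the three “anonymous” records match exactly one named person, so their diagnoses are no longer anonymous. The remaining record narrows the candidates to two people, which may still be enough to identify the patient once further information is added.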

Some applications of the regulation are obvious. Payroll and other HR systems, as well as customer databases that include the names and addresses of natural persons, contain personal data. However, information need not include commonly known identifiers to be considered personal data. Information is personal data if it could be used to identify a data subject “directly or indirectly.”[5] This means that information may be personal data if it could be combined with other information to identify an individual natural person. For example, a list of transactions conducted at a given place or time would be personal data if, with the help of other information, it could be linked to an actual living person. IP addresses and other “online identifiers” are another example. Recital 30 specifically contemplates that such identifiers “may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.”[6] IP addresses and other online identifiers can therefore be personal data even if they are the only pieces of information being processed.

Another part of the definition that complicates the analysis is that the information must “relate to” an identifiable data subject. A natural person may be identifiable from a data set even though the data does not relate to that person. In a 2007 opinion,[7] the Article 29 Working Party used the example of meeting minutes: the presence and statements of attendee A recorded in the minutes are not personal data of attendee B, so attendee B has no rights over attendee A’s information in that document, because it does not “relate to” attendee B. A data controller or processor must consider the circumstances under which the data is processed, including the purpose of the processing or data collection, to determine whether information relates to an identifiable data subject. Because this analysis is tricky, and because information about one person may be combined with other information to infer facts about someone else, data controllers and processors should err on the side of caution and treat most natural persons who can be identified from the data as data subjects.

Data that has been completely anonymized is not personal data. Because data may be used “indirectly” to identify a data subject, however, data controllers should be cautious about relying on pseudonymization and should consider whether other data sources could be used to reveal the identity of data subjects. Although the regulation strongly encourages pseudonymization as a risk mitigation technique,[8] pseudonymized data remains personal data, because a “key” or other cross-reference tool can be used to identify the data subject indirectly.[9] Substituting identifiers does not by itself prevent re-identification: a knowledgeable user with access to a key file, or to enough other information, will often be able to re-identify the natural persons concerned.
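
The following Python sketch illustrates why pseudonymized data is still personal data. It uses one common pseudonymization approach, replacing each name with a random token and recording the mapping in a lookup table that plays the role of the “key” described above. The function names and sample records are hypothetical, and this is not a technique prescribed by the regulation.

```python
import secrets


def pseudonymize(records, field="name"):
    """Replace `field` in each record with a random token.

    Returns the pseudonymized records plus a key table mapping each token
    back to the original value (the "additional information").
    """
    key_table = {}  # token -> original value
    token_for = {}  # original value -> token, so one person always gets the same token
    output = []
    for record in records:
        value = record[field]
        if value not in token_for:
            token = secrets.token_hex(8)
            while token in key_table:  # guard against the very unlikely repeat of a token
                token = secrets.token_hex(8)
            token_for[value] = token
            key_table[token] = value
        pseudonymized = dict(record)  # copy so the original records are left unchanged
        pseudonymized[field] = token_for[value]
        output.append(pseudonymized)
    return output, key_table


def reidentify(records, key_table, field="name"):
    """Reverse the substitution using the key table."""
    return [{**record, field: key_table[record[field]]} for record in records]


# Hypothetical records for illustration only.
patients = [
    {"name": "Jane Doe", "visit": "2018-05-02", "diagnosis": "asthma"},
    {"name": "Jane Doe", "visit": "2018-06-14", "diagnosis": "asthma"},
    {"name": "John Roe", "visit": "2018-05-09", "diagnosis": "diabetes"},
]

shared, key_table = pseudonymize(patients)
for record in shared:
    print(record)  # the name field now holds a random token

# Anyone holding the key table can restore the original identities.
assert reidentify(shared, key_table) == patients
```

Whoever holds the key table can reverse the substitution, which is why the regulation expects such additional information to be kept separately and protected. Even without the key, records belonging to the same person share a token, so they can still be linked to each other and, potentially, to outside information.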

The GDPR separately defines genetic data and biometric data, both of which capture uniquely identifying information about a data subject. Genetic data is “personal data relating to the inherited or acquired genetic characteristics of a natural person which give unique information about the physiology or the health of that natural person and which result, in particular, from an analysis of a biological sample from the natural person in question.”[10] Biometric data is defined as “personal data resulting from specific technical processing relating to the physical, physiological or behavioral characteristics of a natural person, which allow or confirm the unique identification of that natural person, such as facial images or dactyloscopic data.”[11] This information is considered sensitive personal data requiring special care, along with personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and data concerning health, sex life, or sexual orientation.[12] Sensitive personal data will be the subject of a future paper.

Biometric data is collected and processed for the express purpose of identifying a natural person; as such, it is personal data by default. The classic and most common examples of this type of data, both mentioned in the definition, are facial images and fingerprints, but it also includes less well-known techniques such as iris scans. Note, however, that the processing of photographs is not automatically considered the processing of biometric data. Photographs are covered by the definition of biometric data only “when processed through a specific technical means allowing the unique identification or authentication of a natural person,”[13] such as a security system. Less well defined is the phrase “behavioral characteristics.” Behavioral biometrics may include voice and signature recognition, gait analysis, or even the measurement and characterization of the way someone uses a computer or other device. It is unclear at this time how regulatory authorities will interpret this part of the definition and what it will include. If your company uses such a system to help identify people, for example to detect fraud, you should assume that the system is processing biometric data under the GDPR.

Genetic data, while not necessarily collected for the express purpose of identifying a natural person, is also personal data by default because of its very nature. A person’s genetic information can be used to uniquely identify that person; the use of DNA to identify or exonerate those accused of crimes is well known. Moreover, genetic information may be used to derive additional data about a uniquely identifiable natural person, all of which may be considered sensitive personal data. A person’s genetic information can be used to determine ethnicity or race, current or future health, and potentially sexual orientation.

In summary, information is personal data if it

  • relates to (meaning the data is about) a natural person; and
  • either identifies that natural person or makes the natural person identifiable with the use of additional information.

Genetic and biometric data meet this test by definition.

Since information may be personal data even if the person can be identified only indirectly, data controllers and processors should carefully consider what other sources of information a third party may have access to (and what reasonable inferences may be drawn from the data) in order to determine whether the data should be considered personal data. If there is doubt, the information should be treated as personal data. Genetic and biometric data are distinct types of personal data that should be treated with the utmost care, since they are considered sensitive personal data by default.