In this second article of our "Big Data & Issues & Opportunities" series (see our first article here), we focus on some of the privacy and data protection aspects in a big data context. Where relevant, illustrations from the transport sector will be provided.
The analysis of privacy and data protection aspects in a big data context can be relatively complex from a legal perspective. Indeed, certain principles and requirements can be difficult to fit with some of the main characteristics of big data analytics, as will be demonstrated in this article. In this respect, it is important to note that “the process of aggregation implies that data is often combined from many different sources and that it is used and/or shared by many actors and for a wide range of purposes.” This multitude of sources, actors and purposes cannot always be reconciled with the legal requirements related to data protection and security. Despite the intricacies of the legal analysis, it is still important to carefully examine how the legal requirements can be implemented in practice.
The legal assessment requires taking into consideration the newly adopted EU legal framework, and notably the new General Data Protection Regulation (hereinafter the "GDPR"), which became applicable on 25 May 2018, introducing a raft of changes to the existing data protection regime in the EU. While some of the data protection principles, obligations and rights pre-existed, some of them have been enhanced and others newly created by the GDPR.
In the remainder of this article, we will not delve into all rights and obligations included in the GDPR. We will however examine some of the core principles and concepts put forward by the GDPR that many actors active in the field of big data analytics at European level will be confronted with, and how these may be difficult to reconcile with disruptive technologies.
Privacy and Data Protection in a Big Data Context: Challenges & Opportunities
This section, dedicated to some of the relevant challenges and opportunities related to privacy and data protection, aims to show some of the intricacies that certain concepts, principles and obligations may cause in relation to a disruptive technology such as big data.
The main findings, categorised by different topics, may be summarised as follows:
The concepts of "personal data" and "processing"
The GDPR applies to the "processing" of "personal data". As these definitions and the interpretation thereof are very broad, numerous obligations under the GDPR will apply in many circumstances when performing big data analytics.
Moreover, in the context of big data, it cannot be excluded that the data analysis concerns "sensitive data" (the processing of which is restricted and, in most cases, prohibited) or that it will have a “transformational impact” on data. For instance, the processing of non-sensitive personal data could, through data mining, lead to the generation of data that reveals sensitive information about an individual.
The broad scope of application of the GDPR and the possible processing of sensitive data may require limiting certain processing activities or technical developments in order to comply with the stringent rules of the GDPR.
|Illustration in the transport sector: The Article 29 Working Party observed in its Opinion 3/2017 on Cooperative Intelligent Transport Systems (hereinafter "C-ITS") that personal data processed through such systems may also include data relating to criminal convictions and offences within the meaning of Article 10 of the GDPR. More specifically, it finds that such data, for instance speeding data or signal violations, may be collected and broadcast to other vehicles. It notably concludes that "as a consequence [such C-ITS] applications should be modified to prevent collection and broadcast of any information that might fall under Article 10".|
Various actors, roles and responsibilities
Where personal data is processed (as is typically the case in data analytics), it is important to examine the concrete situation in order to determine precisely the role played by each of the actors involved in the processing. The concepts enshrined in EU data protection law, and in particular the distinction between “data controller” and “data processor” and the interaction between them, are of paramount importance in determining responsibilities. These concepts are also essential for determining the territorial application of data protection law and the competence of the supervisory authorities.
The qualification of actors and the distinction between “controller” and “processor” can quickly become complex in a big data context, especially when additional data protection roles such as joint controllership, controllers in common and sub-processors are taken into account. This is mainly because many actors may be involved in the data value chain, and mapping that chain can be rather burdensome.
Hence, additional guidance and template agreements that comply with the strict requirements of the GDPR would be very welcome to clarify the relationships in the big data value chain.
Data protection principles
Article 5 of the GDPR sets out the principles that must be complied with when processing personal data. Most of those discussed below are challenged by some of the key features of big data.
- The principle of "lawfulness" implies that each processing operation must be based on a legal ground (see next section).
- The principle of “fairness and transparency” means that the controller must provide information to individuals about its processing of their data, unless the individual already has this information. The transparency principle in a big data context – where the complexity of the analytics renders the processing opaque – can become particularly challenging and implies that “individuals must be given clear information on what data is processed, including data observed or inferred about them; better informed on how and for what purposes their information is used, including the logic used in algorithms to determine assumptions and predictions about them.”
|Illustration in the transport sector: In its guidelines on automated individual decision-making and profiling adopted on 3 October 2017, the Article 29 Working Party takes the example of car insurance to illustrate the possible issues of fair, lawful and transparent processing of personal data in the transport sector. It indicates that some insurers offer insurance rates and services based on an individual’s driving behaviour. The data collected is then used for profiling to identify bad driving behaviour (such as fast acceleration, sudden braking and speeding). The Article 29 Working Party concludes that in such cases, controllers must ensure that they have a lawful basis for this type of processing. They must also provide the data subject with information about the collected data, the existence of automated decision-making, the logic involved, and the significance and envisaged consequences of such processing.|
- The principle of "purpose limitation" requires personal data to be collected and processed for specified, explicit and legitimate purposes. First and foremost, this means that any processing of personal data must have a clearly defined purpose in order to be permitted. This may be particularly difficult in a big data context because “at the time personal data is collected, it may still be unclear for what purpose it will later be used. However, the blunt statement that the data is collected for (any possible) big data analytics is not a sufficiently specified purpose.”
- The principle of “data minimisation” provides that personal data must be adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed. At first sight, “data minimisation” and big data appear to be contradictory concepts. Indeed, “the perceived opportunities in big data provide incentives to collect as much data as possible and to retain this data as long as possible for yet unidentified future purposes.”
- Furthermore, personal data must be "accurate" and, where necessary, kept up to date. Like the other principles, the accuracy principle is challenged by some of the key features of big data. Indeed, “big data applications typically tend to collect data from diverse sources, and without careful verification of the relevance or accuracy of the data thus collected.”
- The principle of "storage limitation" requires personal data to be kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed. The GDPR does not specify exact data retention periods, given that these are necessarily context-specific. Big data analytics illustrates both the incentive to process personal data for longer periods and the difficulties that may arise in relation to the storage limitation principle. For instance, the principle may undermine the ability to make predictions, which is one of the opportunities offered by big data analytics. Indeed, if big data analytics enables prediction, it is precisely because algorithms can compare current data with stored past data to determine what is likely to happen in the future.
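By way of technical illustration, the data minimisation and storage limitation principles described above can be enforced at the level of a processing pipeline. The following Python code is a minimal sketch under stated assumptions: the purpose name `route_optimisation`, its field list and the 90-day retention period are all hypothetical and are not derived from the GDPR, which leaves retention periods context-specific.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical mapping of each declared purpose to the fields it strictly needs.
PURPOSE_FIELDS = {
    "route_optimisation": {"trip_id", "start_zone", "end_zone", "timestamp"},
}

# Hypothetical retention period per purpose (the GDPR sets no fixed periods).
RETENTION = {
    "route_optimisation": timedelta(days=90),
}


def minimise(record: dict, purpose: str) -> dict:
    """Data minimisation: keep only the fields declared for the given purpose."""
    allowed = PURPOSE_FIELDS[purpose]
    return {key: value for key, value in record.items() if key in allowed}


def purge_expired(records: list, purpose: str, now: datetime) -> list:
    """Storage limitation: drop records older than the purpose's retention period."""
    cutoff = now - RETENTION[purpose]
    return [r for r in records if r["timestamp"] >= cutoff]


# Example usage with a hypothetical trip record
now = datetime(2018, 12, 1, tzinfo=timezone.utc)
raw = {
    "trip_id": 1,
    "driver_name": "Jane Doe",  # not needed for route optimisation, so dropped
    "start_zone": "A",
    "end_zone": "B",
    "timestamp": now - timedelta(days=10),
}
kept = minimise(raw, "route_optimisation")
print(sorted(kept))  # ['end_zone', 'start_zone', 'timestamp', 'trip_id']
```

The `purge_expired` step is where the tension with predictive analytics discussed above materialises: once past records are deleted, they can no longer be compared with current data.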
It follows from the above that the core data protection principles are, for the most part, at odds with some of the key features of big data analytics, and thus difficult to reconcile with them. Nevertheless, rethinking certain processing activities as well as IT developments may help to comply with these principles, notably by maintaining well-managed, up-to-date and relevant data. Ultimately, this may also improve data quality and thus benefit the analytics.
Legal grounds to process personal data
Where the GDPR applies, any processing of personal data must be based on one of the grounds listed in Article 6(1) of the GDPR. In other words, for a processing activity to be lawful, it must, from the outset and throughout the activity, be based on one of the six grounds exhaustively listed in the GDPR. However, only four of them appear to be applicable in a big data context.
- Consent: While "consent" is the first ground listed as permitting the processing of personal data, it can quickly become a difficult concept to comply with in light of its definition and the many conditions that must be met. More precisely, consent under the GDPR must be freely given, specific, informed and unambiguous. Furthermore, the controller must be able to demonstrate that the data subject has consented to the processing operation, and the data subject must be able to withdraw his or her consent at any time. These conditions are stringent and may be particularly difficult to meet. Therefore, relying on consent may prove impractical or even impossible in a big data context, especially in its more complex applications.
- Performance of or entering into a contract: The processing ground provided under Article 6(1)(b) GDPR can be relied upon by the controller when it needs to process personal data in order to perform a contract to which the data subject is party, or in order to take steps at the request of the data subject prior to entering into a contract (e.g., the purchase and delivery of a product or service). This ground will generally be difficult to apply in a big data context, because it is unlikely that the processing of personal data for specific big data analytics purposes is “necessary” for the performance of a contract with the individual. Indeed, although big data analytics involves a complex chain of actors and multiple contracts, there is usually little direct interaction with the data subjects themselves.
- Legal obligation: Under Article 6(1)(c), the GDPR provides a legal ground in situations where “processing is necessary for compliance with a legal obligation to which the controller is subject”. Generally, it is unlikely that personal data processing in a big data analytics context can be based on a “legal obligation”. This being said, according to the Article 29 Working Party, such legal ground should not automatically be set aside in a technology context.
|Illustration in the transport sector: In its Opinion on C-ITS, the Article 29 Working Party concludes that the long-term legal basis for this type of processing is the enactment of an EU-wide legal instrument. Indeed, the Article 29 Working Party considers it likely, given the projected prevalence of (semi-)autonomous cars, that the inclusion of C-ITS in vehicles will become mandatory at some point in time, comparable to the legal obligation on car manufacturers to include eCall functionalities in all new vehicles.|
- Legitimate interests: The protection of privacy and personal data is not absolute and often requires a balancing of interests. Given the difficulty of relying on the abovementioned processing grounds in a big data context, the legitimate interests of an organisation may provide a suitable alternative. Article 6(1)(f) of the GDPR permits the processing of personal data where it is necessary “for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject which require protection of personal data.” However, in an Opinion on recent developments on the Internet of Things (hereinafter "IoT"), the Article 29 Working Party warns that processing will not always be justified merely by the economic interests of the IoT stakeholder, given the potential severity of the interference with the privacy of the data subject. Similar reasoning could be transposed to a big data context. Therefore, when relying on legitimate interests, a careful balancing test between the interests of the big data stakeholder and those of the data subject remains of the utmost importance.
Finding the most adequate legal ground for processing personal data in the context of big data analytics may prove difficult. Indeed, the conditions associated with the grounds exhaustively listed in the GDPR are stringent and may limit or prohibit certain processing activities. Nonetheless, thorough assessments, such as a legitimate interests assessment, are likely to help identify the most appropriate processing ground, while at the same time providing evidence of the underlying reasoning, in accordance with the accountability principle.
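Of the conditions attached to consent discussed above, two translate directly into system requirements: the controller must be able to demonstrate that consent was given, and the data subject must be able to withdraw it at any time. The following Python code is a minimal sketch of a consent register meeting these two requirements. The `ConsentRegister` class, its methods and the `notice_version` field are hypothetical, and the in-memory storage is for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ConsentRecord:
    """Evidence of one consent, kept so that it can be demonstrated later."""
    subject_id: str
    purpose: str          # consent must be specific, so one record per purpose
    notice_version: str   # which information notice the subject was shown
    given_at: datetime
    withdrawn_at: Optional[datetime] = None


class ConsentRegister:
    """Hypothetical in-memory register; a real system would need durable storage."""

    def __init__(self) -> None:
        self._records: List[ConsentRecord] = []

    def give(self, subject_id: str, purpose: str, notice_version: str) -> ConsentRecord:
        record = ConsentRecord(subject_id, purpose, notice_version,
                               given_at=datetime.now(timezone.utc))
        self._records.append(record)
        return record

    def withdraw(self, subject_id: str, purpose: str) -> None:
        # Withdrawal is time-stamped rather than deleting the record, so the
        # history of consent can still be demonstrated (a design choice).
        for record in self._records:
            if (record.subject_id == subject_id and record.purpose == purpose
                    and record.withdrawn_at is None):
                record.withdrawn_at = datetime.now(timezone.utc)

    def has_valid_consent(self, subject_id: str, purpose: str) -> bool:
        return any(
            r.subject_id == subject_id and r.purpose == purpose and r.withdrawn_at is None
            for r in self._records
        )
```

Because consent is recorded per purpose, a consent given for one analytics purpose does not cover another, which is one reason consent scales poorly to open-ended big data processing.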
Core obligations under the GDPR
Some of the core obligations that the GDPR imposes on controllers (and processors) may be particularly relevant in the context of big data. This is notably the case for the requirements to conduct data protection impact assessments (hereinafter "DPIAs") and to implement privacy by design and privacy by default measures.
DPIAs are only required in certain cases, i.e. when processing is “likely to result in a high risk” to the rights and freedoms of natural persons, taking into account the nature, scope, context and purposes of the processing. Article 35(1) GDPR expressly draws attention to processing “using new technologies”, and Article 35(3) and Recital 91 of the GDPR provide a non-exhaustive list of cases in which DPIAs are required. For other processing activities, the organisation must determine whether the processing poses a high risk to individuals; Recital 75 of the GDPR provides relevant elements that may help in determining whether a (high) risk exists. In addition, Article 35(4) of the GDPR requires national supervisory authorities to establish a list of processing operations that are necessarily subject to a DPIA ("black list"), whereas Article 35(5) allows them to establish a list of processing operations for which no DPIA is required ("white list").
An analysis of the various lists and guidance published by the different authorities easily leads to the conclusion that new technologies, and in particular big data analytics, will almost systematically require carrying out a DPIA. Indeed, some of the key characteristics of big data appear to be targeted, such as “large scale processing”, “systematic monitoring”, “automated decision-making with legal or similar significant effect”, and “matching or combining datasets”. Similarly, the use of data to analyse or predict situations, preferences or behaviours, or the systematic exchange of data between multiple actors, or the use of devices to collect data (and in particular relying on IoT) should lead to the requirement to carry out a DPIA.
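The screening exercise described above can be sketched as a simple checklist. In the following Python example, the criteria mirror those listed in this section. The default threshold of two criteria reflects the Article 29 Working Party's DPIA guidelines, which consider that processing meeting two criteria will in most cases require a DPIA, while the `on_national_blacklist` flag stands in for an operation appearing on a supervisory authority's Article 35(4) list. The names are hypothetical, and such a checklist can only support, not replace, the legal assessment.

```python
from enum import Enum
from typing import List, Set, Tuple


class Criterion(Enum):
    """Risk indicators mentioned in supervisory authority lists and guidance."""
    LARGE_SCALE = "large scale processing"
    SYSTEMATIC_MONITORING = "systematic monitoring"
    AUTOMATED_DECISIONS = "automated decision-making with legal or similar significant effect"
    COMBINING_DATASETS = "matching or combining datasets"
    PREDICTION = "analysing or predicting situations, preferences or behaviours"
    MULTI_ACTOR_EXCHANGE = "systematic exchange of data between multiple actors"
    DEVICE_COLLECTION = "collection of data through devices, e.g. IoT"


def dpia_screening(met: Set[Criterion], threshold: int = 2,
                   on_national_blacklist: bool = False) -> Tuple[bool, List[str]]:
    """Return whether a DPIA is indicated, with the criteria that were met.

    An operation on a supervisory authority's Article 35(4) list always
    requires a DPIA; otherwise the (assumed) threshold of criteria applies.
    """
    reasons = sorted(c.value for c in met)
    required = on_national_blacklist or len(met) >= threshold
    return required, reasons


# Example: a hypothetical connected-car analytics project
required, reasons = dpia_screening({Criterion.LARGE_SCALE,
                                    Criterion.COMBINING_DATASETS,
                                    Criterion.DEVICE_COLLECTION})
print(required)  # True
```

As the example suggests, a typical big data project ticks several boxes at once, which is consistent with the conclusion that a DPIA will almost systematically be required.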
Furthermore, the requirement to adopt “privacy by design” measures entails that the controller must implement appropriate technical and organisational measures (e.g. pseudonymisation techniques) designed to implement the data protection principles (e.g. data minimisation). As for compliance with the “privacy by default” requirement, the controller must implement appropriate technical and organisational measures to ensure that, by default, only personal data necessary for each specific purpose of the processing are processed. This applies to the amount of data collected as well as to the extent of processing, period of storage and accessibility of the data. The measures adopted by the controller must guarantee that, by default, personal data are not made accessible to an indefinite number of individuals without the data subject’s intervention.
These requirements to implement dedicated "by design" and "by default" measures are particularly relevant in IT environments, and thus also to big data. In practice, they require organisations to consider privacy and data protection issues at the design phase and throughout the lifecycle of any system, service, product or process. The requirements can therefore be far-reaching: they apply to all IT systems, services, products and processes involving personal data processing, but also require looking into organisational policies, processes, business practices and/or strategies that have privacy implications, and rethinking the physical design of certain products and services as well as data sharing initiatives. Moreover, organisations must take technical measures that meet individuals' expectations, in particular to delimit what data will be processed for which purpose, to process only the data strictly necessary for the purpose for which they are collected, to inform individuals appropriately and provide them with sufficient controls to exercise their rights, and to prevent personal data from being made public by default.
|Illustration in the transport sector: The past decade has seen the rise of new transportation modes such as ridesharing. Ridesharing services allow car owners to fill the empty seats in their cars with other travellers. However, these services come with certain privacy and data protection implications for their users. Indeed, users of a ridesharing service need to share their location data with the ridesharing operator in order to determine a point where drivers and riders can meet. Taking the privacy-by-design principle into account, Aïvodji et al. have developed a privacy-preserving approach to computing meeting points in ridesharing, which integrates existing privacy-enhancing technologies and multimodal routing algorithms to compute meeting points that are advantageous for both drivers and riders.|
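Pseudonymisation, cited above as an example of a privacy by design measure, can be illustrated with keyed hashing. The Python sketch below replaces a licence plate with an HMAC-SHA256 pseudonym and, in the spirit of privacy by default, passes on only the fields needed downstream. Keyed hashing is only one of several pseudonymisation techniques, and the field names and the `prepare_for_analytics` function are hypothetical. Since the GDPR requires the additional information needed for re-identification to be kept separately, the key would in practice be stored apart from the dataset; pseudonymised data also remain personal data.

```python
import hashlib
import hmac
import secrets

# Generated inline for illustration only; in practice the key would be held
# separately from the pseudonymised dataset (e.g. in a key management system).
PSEUDONYMISATION_KEY = secrets.token_bytes(32)


def pseudonymise(identifier: str, key: bytes = PSEUDONYMISATION_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same identifier always maps to the same pseudonym under a given key,
    so records can still be linked for analytics, but linking a pseudonym
    back to the identifier requires access to the key.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_for_analytics(trip: dict) -> dict:
    """Pseudonymise the hypothetical licence plate field and drop all fields
    that the downstream analysis does not need."""
    return {
        "driver": pseudonymise(trip["licence_plate"]),
        "harsh_braking_events": trip["harsh_braking_events"],
    }
```

Using a keyed hash rather than a plain hash matters here: licence plates form a small, guessable space, so an unkeyed hash could be reversed simply by hashing every possible plate.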
Rights of individuals
The GDPR aims to protect natural persons in relation to the processing of their personal data and therefore grants several rights to such persons. In addition to these rights, the GDPR provides for strict procedures for responding to data subject requests, notably regulating the timing and format of the response and the fees that may be charged. It also regulates the right of individuals to lodge a complaint with a supervisory authority, the right to an effective judicial remedy against a supervisory authority, a controller or a processor, and the possibility for data subjects to mandate a not-for-profit body, organisation or association to lodge a complaint on their behalf.
The numerous rights granted by the GDPR to individuals can be particularly challenging to honour in relation to complex processing activities. Indeed, generally speaking, such rights can be far-reaching and thus difficult to integrate in the context of big data analytics. It is nonetheless important to consider the various rights carefully and to anticipate their concrete application. That said, technology can also provide individuals with more innovative means of exercising their rights, such as privacy-enhancing technologies.
|Illustration in the transport sector: In its guidelines on the right to data portability of 5 April 2017, the Article 29 Working Party notably advocates a broad interpretation, whereby “raw data processed by a smart meter or other connected objects, activity logs, history of website usage or search activities” fall within the scope of the portability right. Therefore, in a big data analytics context, exercising the right to portability of data collected through intelligent cars (e.g., by various sensors, smart meters or other connected objects) or related to C-ITS might turn out to be almost impossible, notably from an engineering perspective, particularly in view of the Article 29 Working Party's far-reaching interpretation of this right.|
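Article 20 of the GDPR requires ported data to be provided in a structured, commonly used and machine-readable format. The Python sketch below shows the comparatively easy part of such an export: serialising the records of one data subject to JSON. It assumes, hypothetically, that every record already carries a `subject_id` field. The engineering difficulty discussed above lies precisely in what this assumption hides, namely attributing raw sensor data from an intelligent car to a specific individual (for instance where several people drive the same vehicle) and handling the volume of such data.

```python
import json
from typing import Iterable


def export_portable_data(subject_id: str, records: Iterable[dict]) -> str:
    """Serialise the records concerning one data subject to JSON.

    Only records whose hypothetical 'subject_id' field matches are included;
    values are assumed to be JSON-serialisable (e.g. ISO 8601 timestamp strings).
    """
    own = [r for r in records if r.get("subject_id") == subject_id]
    payload = {
        "subject_id": subject_id,
        "format_version": "1.0",  # hypothetical schema version
        "records": own,
    }
    return json.dumps(payload, indent=2, sort_keys=True)
```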
International data transfers
The GDPR maintains the general principle that personal data may only be transferred freely to countries outside the European Economic Area (hereinafter the "EEA") that ensure an adequate level of protection. Transfers of personal data to other “third countries” (i.e. countries outside the EEA that do not ensure an adequate level of protection) are restricted. In such cases, the transfer must be based on an appropriate instrument, such as Standard/Model Contractual Clauses (SCCs), Binding Corporate Rules (BCRs), codes of conduct and certifications, or on a derogation.
The provision of big data analytics services may entail the transfer of the personal data collected and processed outside the EEA, particularly when cloud computing services are relied upon. The GDPR requirements on transfers of personal data must therefore be taken into account in order to determine the most appropriate mechanism to permit such international flows.
All data flows should therefore be carefully assessed and mapped, notably as part of the mapping of the different actors, in order to determine where the data is located and to put in place the appropriate (contractual) instruments.
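The mapping exercise described above can be supported by a simple inventory of data flows. The Python sketch below records each flow between actors and flags those leaving the EEA for a country without an adequacy decision where no transfer instrument has been recorded. The `DataFlow` structure and the `flows_needing_attention` function are hypothetical. The sets of EEA countries and of countries benefiting from an adequacy decision are supplied by the caller, since the latter changes over time and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class DataFlow:
    source_actor: str
    destination_actor: str
    destination_country: str                   # ISO 3166-1 alpha-2 code
    transfer_instrument: Optional[str] = None  # e.g. "SCCs", "BCRs" or a derogation


def flows_needing_attention(flows: Iterable[DataFlow], eea: Set[str],
                            adequate: Set[str]) -> List[DataFlow]:
    """Flag flows leaving the EEA to a country without an adequacy decision
    for which no transfer instrument (or derogation) has been recorded."""
    return [
        f for f in flows
        if f.destination_country not in eea
        and f.destination_country not in adequate
        and f.transfer_instrument is None
    ]
```

Keeping the actor names in each record ties the transfer analysis to the mapping of controllers and processors discussed earlier, so that the party responsible for putting an instrument in place can be identified.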
This article only examines, and provides illustrations of, the most topical issues, and does not claim to be exhaustive. It does, however, demonstrate that finding a balance between the various interests at stake is of paramount importance. It is therefore essential to keep in mind Recital 4 of the GDPR, which states that the right to the protection of personal data is not an absolute right, that it must be considered in relation to its function in society and be balanced against other fundamental rights, and that this must be done in accordance with the principle of proportionality.
Accordingly, any guidance or administrative/judicial decision should carefully take into account all interests at stake. Failing to do so would inevitably impede the development of disruptive technologies and prevent the emergence of a true data economy.