We have only just started to reap the benefits of “Big Data”, from predicting deadly infections to detecting fraud.  It is a key source of value for many industry sectors, with uses including profiling, spotting market trends, analysing product performance and forecasting future outcomes.

Collating and analysing large data sets to discern patterns and make better decisions is a journey many companies are only just starting to explore.  There is, however, a potentially darker side to the perceived benefits of Big Data: the effect on personal privacy.  In this regard, is the GDPR a welcome guiding light to the benefits of Big Data, or will it, in attempting to protect our privacy, strike a fatal blow to its utility?

What is the GDPR?

The General Data Protection Regulation (GDPR) will apply from 25 May 2018 and will replace the current data protection legal framework based on the EU Data Protection Directive (95/46/EC).  It will apply to any organisation worldwide that collects and processes the personal data of individuals located in the EU (“data subjects”).  It enables individual Data Protection Authorities to impose significant fines for breaches, in some cases up to the higher of 4% of annual worldwide turnover and EUR 20 million, a prospect that has certainly made organisations take note of the need to comply.  The key changes brought by the GDPR include:

  • Data processors now have direct data protection obligations
  • Consent must be freely given, specific, informed and unambiguous
  • Fair processing notices must now be clearer and more detailed
  • Data subject rights have been significantly increased, with the addition of:
    • A right for the data subject to access the information being processed about himself or herself
    • A right to correct incorrect personal data about oneself held by an organisation
    • A right to restrict certain processing of one’s personal data
  • The addition of a prohibition on automated processing where such processing has legal effects on a data subject.
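
To make the fine ceiling mentioned above concrete, the following minimal Python sketch computes “the higher of 4% of annual worldwide turnover and EUR 20 million”.  The function and constant names are illustrative assumptions, and the result is only the upper limit of the top tier of fines, not a prediction of what an authority would actually impose.

```python
from decimal import Decimal

# Figures quoted in the text above; the names are illustrative assumptions.
FIXED_CAP_EUR = Decimal("20000000")   # EUR 20 million
TURNOVER_RATE = Decimal("0.04")       # 4% of annual worldwide turnover


def max_fine_eur(annual_worldwide_turnover_eur: Decimal) -> Decimal:
    """Return the higher of 4% of turnover and EUR 20 million."""
    return max(TURNOVER_RATE * annual_worldwide_turnover_eur, FIXED_CAP_EUR)


# Two hypothetical organisations: turnover of EUR 100 million and EUR 2 billion.
print(max_fine_eur(Decimal("100000000")))
print(max_fine_eur(Decimal("2000000000")))
```

For the smaller organisation, 4% of turnover is only EUR 4 million, so the EUR 20 million figure sets the ceiling; for the larger one, the 4% figure (EUR 80 million) applies.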

What is Big Data?

“Big Data” is a blanket term for collections of data sets so large and complex that processing them by traditional data management means, such as relational database management systems, is problematic.  Big Data is generally regarded as having the following characteristics (often called the “Four V’s”):

1. Sheer Volume of data;

2. A large Variety of data (in terms of types and structure);

3. Veracity of data, in that the data is, on the whole, representatively accurate and trustworthy (as opposed to exactly so); and

4. The data needs to be analysed at a high Velocity in order to derive value from it.

So why does Big Data cause problems in the context of the GDPR?

Big Data sets will often include personal data, and in many cases it is not possible to separate the personal data from the non-personal data.  The aim of Big Data is to uncover relationships within and across the information, through analytics and processing.  Given that the accuracy and trustworthiness of any particular data set may not be exact, but rather directionally representative, the starting point of Big Data itself runs contrary to a fundamental principle of the GDPR: that the personal data an organisation holds about a particular data subject must be kept accurate and protected.

Furthermore, Article 22 of the GDPR prohibits decisions based solely on automated processing, including profiling, where such a decision has a legal effect on a data subject or similarly significantly affects the data subject.  In this regard, profiling is defined in Article 4(4) as: “any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person, in particular to analyse or predict aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements”.

Some of the privacy risks particularly pronounced in the context of Big Data profiling therefore include:

1. Processing of personal data outside of the purpose for which it was collected;

2. Use of incorrect and/or outdated information;

3. Discrimination or bias against certain individuals or groups resulting from the application of certain profiling algorithms; and

4. Processing of personal data in excess of what is needed for the purpose of the processing.

Because automated processing involves such high risks to privacy, it is prohibited in principle under the GDPR, except where:

  • It is performed based on explicit consent; or
  • It is required to enter into or perform a contract, provided the data subjects concerned can contest an automated decision and obtain human intervention.

Furthermore, the GDPR provides that sensitive personal data may only be subject to automated processing on the basis of explicit consent, irrespective of the effect of such processing, and that data subjects must be informed of the use of automated processing and given information on the logic used, as well as the potential consequences.
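
The conditions described above can be read as a simple checklist.  The following Python sketch is one hedged way of encoding them for an internal review; it follows this article's summary rather than the full text of the GDPR, and the class, field and function names are hypothetical.  As a conservative assumption, it asks for a route to human intervention whenever a decision has legal or similarly significant effects, whichever exception is relied upon.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlannedProcessing:
    """Hypothetical description of one automated decision-making use case."""
    has_legal_or_similar_effect: bool
    uses_sensitive_data: bool
    explicit_consent: bool
    necessary_for_contract: bool
    human_review_available: bool
    logic_and_consequences_disclosed: bool


def open_issues(p: PlannedProcessing) -> List[str]:
    """List the conditions summarised in the text that are not yet met."""
    issues = []
    if p.has_legal_or_similar_effect:
        # Prohibited in principle unless one of the exceptions applies.
        if not (p.explicit_consent or p.necessary_for_contract):
            issues.append("no exception applies (explicit consent or contract)")
        # Assumption: require a human-review route for every such decision.
        if not p.human_review_available:
            issues.append("no route to contest the decision and obtain human intervention")
    if p.uses_sensitive_data and not p.explicit_consent:
        issues.append("sensitive personal data without explicit consent")
    if not p.logic_and_consequences_disclosed:
        issues.append("data subjects not informed of the logic used and its consequences")
    return issues


# Hypothetical example: credit scoring needed for a contract, with no human review.
credit_scoring = PlannedProcessing(
    has_legal_or_similar_effect=True,
    uses_sensitive_data=False,
    explicit_consent=False,
    necessary_for_contract=True,
    human_review_available=False,
    logic_and_consequences_disclosed=True,
)
for issue in open_issues(credit_scoring):
    print(issue)
```

In this example the contract exception is available, so the only open issue reported is the missing route to human intervention.  A checklist of this kind is an aid to review, not a substitute for legal analysis of the specific processing.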

Note that organisations have already accumulated large amounts of data, and the GDPR applies not just to data sets created going forward but also to those already in existence today, insofar as such pre-existing data sets are processed after the GDPR comes into force.  It will undoubtedly prove problematic in practice to obtain the required explicit consent for specific uses of a data set that already exists (and is, in fact, already in use).

So how can Big Data be used in practice under the GDPR?

It is imperative that businesses review their current profiling and automated processing practices, to:

  • Identify where they are using these processing approaches and whether there is personal data involved;
  • Understand the work required in respect of “cleaning” data sets, to permit their use in a GDPR-compliant way, including possible use of pseudonymisation of the personal data;
  • Understand the effects of using profiling and automated processing, in particular where they affect “legal, or significant other interests”;
  • Ensure that a valid legal basis is applicable to their use of profiling (i.e. explicit and valid consent of the data subject, or falling within the “contract exemption”);
  • Be able to provide the data subject with information on the algorithmic logic used and the consequences of that algorithmic determination; and
  • Ensure there is a process in place for data subjects to obtain human intervention in respect of a decision reached on the basis of automated processing.
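
As an illustration of the pseudonymisation mentioned in the list above, the following minimal Python sketch replaces direct identifiers with keyed hashes (HMAC-SHA256 from the standard library).  This is one possible technique among several, not a method prescribed by the GDPR; the field names, the sample record and the key are hypothetical.  Note that pseudonymised data generally remains personal data under the GDPR, because anyone holding the key can re-link the records, so the key must be stored separately and protected.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256, hex encoded)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_record(record: dict, id_fields: set, secret_key: bytes) -> dict:
    """Return a copy of the record with the named identifier fields pseudonymised."""
    return {
        field: pseudonymise(str(value), secret_key) if field in id_fields else value
        for field, value in record.items()
    }


# Hypothetical key and record, for illustration only. In practice the key would
# be generated securely and held apart from the analytics data set.
key = b"example-key-kept-separately"
customer = {"email": "jane@example.com", "postcode_area": "EC1", "basket_value": 42.5}

cleaned = pseudonymise_record(customer, {"email"}, key)
print(cleaned)
```

Because the same identifier always maps to the same token under a given key, records relating to one individual can still be linked for analytics without exposing the underlying identifier to the analysts.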

Is the GDPR the death knell of Big Data?

There are clearly some specific challenges in reconciling the data protection principles set out in the GDPR with the characteristics of Big Data analytics.  However, these are not insurmountable, nor incongruous with the aims of the GDPR.  Organisations should nonetheless think through the why and the how of Big Data profiling, and ensure that transparency and privacy by design are at the heart of their “Big Data journey”.  With the EU’s 2015 Digital Single Market Strategy targeting Big Data as a “catalyst for economic growth, innovation and digitisation across all economic sectors […] and for society as a whole,” it is imperative that Big Data is seen as an opportunity to be actively nurtured and better understood, including through the prism of privacy compliance, so that its potential may be fully realised.