The use of “Big Data” is becoming more widespread, with real potential to positively transform the way the world uses and shares data to make predictions, including in the fields of health, scientific research and the handling of highly dangerous epidemics. However, the use of such vast amounts of potentially personal information needs to be approached carefully against the backdrop of UK privacy law, which highly values individuals’ rights to privacy and has data minimisation as one of its core principles.
“Big Data” is an evolving term coined to mean a voluminous amount of structured, semi-structured and unstructured data; a data set so large and complex that it cannot be processed using conventional database management techniques, but requires more sophisticated processing tools. In technical terms, this refers to petabytes or exabytes of data, potentially holding billions to trillions of records on millions of people. It is often characterised by the “three V’s”: volume, variety and velocity of data.
The idea of Big Data analytics is to mine this data, using tools developed to manipulate and analyse the information in order to make quicker, novel and/or more intelligent decisions and findings. Organisations (in both the private and public sector) are naturally keen to use these techniques, as they can provide valuable insight into the market, customers’/clients’ preferences and future trends. However, the potential benefits of Big Data to individuals (both individually and collectively) across the globe are also being recognised, and it is being used more frequently not necessarily for competitive gain, but for the “greater good”, such as in the fight against deadly diseases.
The sources of “Big Data” are wide ranging, from internet search data to social media postings to data collected by cameras, mobile phones, radio-frequency identification and monitoring devices worn by patients in clinical trials. Much of the information collated may therefore contain personal information, ie any information which (alone or combined with other information) allows a living individual to be identified. This could include names, dates of birth, medical data or individuals’ opinions/comments on specific topics. In addition, new personal information could be created by Big Data analysis, eg by combining the test results of a clinical trial with information posted on social media about a patient’s lifestyle to work out whether they are likely to develop any medical conditions.
Using personal information
In the UK, the use of personal information is governed by the Data Protection Act 1998 (“DPA”). Any organisation collating personal information (whether from scratch, or from its existing records) for its own purposes in relation to Big Data therefore needs to be aware of its obligations under the DPA. The Information Commissioner’s Office (the regulator in this area) strongly recommends that, before organisations take any steps to collate/create Big Data, they should carry out a privacy impact assessment (“PIA”). The aim of a PIA is to assess whether the organisation’s proposals would be appropriate and what the risk is in terms of compliance with the law.
When collating personal information for the purposes of Big Data, organisations need to consider whether this would be expected by the relevant individuals (“data subjects”).
- What were they told would be done with their information?
- Would they reasonably expect their personal information to be analysed/shared with another company for analysis in this way?
- Would the use for Big Data be incompatible with the reasons it was collected in the first place (subject to the research exemption, see below)?
Transparency is key. In addition, the organisation needs to be confident that the information being used is adequate, relevant and not excessive for the proposed purposes: eg, does the name/address etc of the data subject really need to be included to analyse the success of a new heart monitor being trialled? Can the data being used be minimised? With more and more stories in the news about data security breaches, organisations need to think carefully about the adequacy of the security steps they are taking. This is particularly relevant where the data being analysed is stored in the cloud, potentially hosted by a third party and potentially located outside the UK (which raises concerns due to the laws of the host country), or where it is being shared with another organisation (even one within the same corporate group) to carry out the analysis.
In some cases, in particular where sensitive personal data (eg health data, religious or other beliefs, race/ethnicity etc) is being used, the data subjects’ consent may need to be obtained. In others, the organisation may be able to demonstrate that the use for Big Data is necessary in the legitimate interests of that company (provided this causes no unwarranted prejudice to the individual). For the latter condition, if the purpose of the Big Data analysis is likely to be in the public interest, eg the trial of a new drug to fight cancer, or analysis designed to stop the spread of a deadly disease, this is likely to provide a stronger ground in favour of the processing.
There is an exemption under the DPA which is potentially useful for Big Data analysis. This exemption would allow the use of personal information for Big Data analysis where the information was not collected with the aim of carrying out this analysis (so the Big Data analysis is potentially incompatible with the initial purpose and in breach of the DPA), but the organisation now intends to use it for research purposes. The DPA does not define “research”, but it is likely to include research for scientific or historical purposes, as well as commercial purposes (eg market research). If the research is used to make a decision affecting an individual, or is likely to cause substantial damage or distress to an individual, then the exemption would not apply. An example might be where sensitive personal data (ie, not anonymised data) obtained through a drug trial is shared by the organisation with insurance companies (without the volunteers’ knowledge), and those companies use the data to alter the volunteers’ insurance premiums and to send them tailored marketing for particular medical insurance products. Please note that, even if this exemption applies, it does not provide carte blanche with regard to the use of the data: many of the requirements under the DPA still stand. Proceed with caution, therefore, when intending to rely on this exemption.
To avoid the numerous hoops that must be jumped through to ensure compliance with the DPA, many organisations choose to anonymise the data they are preparing for analysis. True anonymisation of data can be difficult to achieve, however. Essentially, the original identifying information should be totally irretrievable, with the “key” to unlock it thrown away. With the vast quantity of data now available on individuals, it can be extremely difficult to say with certainty that there is no other data in the world that, when combined with the result data, could identify the individual. The answer, therefore, is to carry out a thorough PIA, and not to proceed with the analysis unless re-identification is extremely unlikely. Transparency with regard to this process will help to satisfy data subjects’ concerns about the onward use of their data and helps to instil trust in the organisation. Once data is truly anonymised, the DPA no longer applies, as anonymised information is no longer personal data.
For organisations carrying out clinical trials, this is obviously the preferred method. The results of any tests would contain sensitive personal information and it is unlikely that patients would volunteer if there was a chance that they could be identified from the published findings of the trial. A recent example of the use of anonymised data on a wide scale is in relation to the Ebola virus. Anonymised text and voice data from mobile phones was collated by a telecommunications company located in one of the African countries hit by the virus to allow a non-profit organisation to produce maps of the populated areas in the region, allowing authorities to then work out where to locate treatment centres and how they may be able to contain the disease by restricting travel. In addition, mobile phone mast activity information from mobile operators is being collected by the US Centers for Disease Control and Prevention to work out where most calls to helplines are being made from and thus where resources should be allocated.
In addition, the use of Big Data for medical purposes is being addressed in relation to the proposed implementation of the NHS care.data scheme, which is designed to allow analysis of health information collected by GPs on an unprecedented scale. The intention is for any such studies to be carried out on an anonymous basis, but widespread concerns have been raised regarding the true anonymity of the data extracted from these records.
When the DPA was drafted (roughly 20 years ago), the possibility of data being used on such a large scale was very unlikely to have been considered. Arguably, its fundamental principles are designed to deal with personal data no matter in what format it is processed. However, should the proposed EU General Data Protection Regulation be adopted, it would seek to clarify the grey areas that have developed, in particular in relation to the anonymisation of data. In addition, it would introduce a specific requirement for organisations to carry out privacy impact assessments where particularly “risky” use or other processing of personal information is envisaged, and it aims to provide greater rights for individuals.
The willingness of organisations to assist in Big Data analysis where there is a chance of helping in the fight against deadly diseases/illnesses is, of course, highly commendable, but organisations must not forget their obligations under the DPA. The consequences of getting it wrong and being found in breach of the DPA are significant, with potential fines of up to £500,000 at present (which may increase to 5% of global turnover or €100 million, whichever is the greater, if the proposed EU General Data Protection Regulation comes into force). As such, appropriate safeguards, whether carrying out an in-depth PIA, consulting with individuals, or anonymising the data, should be carefully considered.