Artificial intelligence (AI) technologies are providing increasing benefits and potential areas of application in the healthcare sector. This needs to be accompanied by a consideration of legal implications, in particular in relation to data.

The advantages that AI technologies can bring to healthcare are clear. For instance, 99% of IBM’s Watson cancer treatment recommendations were consistent with physician decisions, highlighting its accuracy and ability to create efficiencies. Yet AI systems need to be ‘trained’ by inputting data. Machine learning and deep learning tools analyse structured data such as patients’ age and disease history, while natural language processing extracts data from unstructured sources such as clinical records. These techniques ensure that complex data can be collated and analysed to produce meaningful outcomes.

The processing of data about identifiable patients is currently regulated in the UK by the Data Protection Act 2018 (DPA 2018) and the General Data Protection Regulation (GDPR), which also forms part of UK law. Organisations must comply with this data protection regime when creating and implementing AI systems. In this article we consider the processing and security issues facing data controllers and processors in a scenario where an entity uses an AI analysis tool to carry out research to produce an effective AI system for treatment recommendations.

Sources of data to train AI technologies

The first consideration is where the data sets used to train the AI tool come from and whether they contain “personal data” within the meaning of relevant data protection laws. In the healthcare context, personal data is also likely to be health data and to qualify as “special category data” under the GDPR and DPA 2018.

Anonymisation / Pseudonymisation

If the data sets are “raw” data which clearly identify patients by name, then this is straightforwardly personal data. If the data is provided in a way which doesn’t identify patient names (for example, because patient ID numbers have been allocated instead), it may be assumed that the information concerned is just “data” and not “personal data”. However, this needs to be considered carefully, as the data could in fact merely be “pseudonymised”. The ICO issued a code of practice on anonymisation under previous data protection laws which still contains some relevant points. One of these is that, in order to assess whether data is truly anonymised, the risk of re-identification of individuals from the data, combined with other data that could potentially be available, needs to be considered.
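The distinction between pseudonymised and anonymised data can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the sample records, the key and the helper functions `pseudonymise_id` and `smallest_group_size` are hypothetical and are not taken from the ICO code or any particular system. The sketch replaces patient IDs with a keyed hash and then checks how small the smallest group of records sharing the same indirect identifiers is. A group of one suggests that an individual could still be singled out.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret, assumed to be held separately from the shared data set.
# Anyone holding it can re-create the mapping, so the output is pseudonymised,
# not anonymised.
SECRET_KEY = b"example-key-kept-by-the-data-controller"


def pseudonymise_id(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace a patient ID with a truncated keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group sharing the same indirect identifiers."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


# Illustrative records only; not real patient data.
records = [
    {"patient_id": "P001", "age_band": "40-49", "postcode_area": "CB1", "diagnosis": "C50"},
    {"patient_id": "P002", "age_band": "40-49", "postcode_area": "CB1", "diagnosis": "C34"},
    {"patient_id": "P003", "age_band": "80-89", "postcode_area": "CB2", "diagnosis": "C61"},
]

# Drop the direct identifier and attach a pseudonym in its place.
pseudonymised = [
    {
        **{k: v for k, v in r.items() if k != "patient_id"},
        "pseudonym": pseudonymise_id(r["patient_id"]),
    }
    for r in records
]

# One record is unique on age band and postcode area, so k is 1 here.
k = smallest_group_size(pseudonymised, ["age_band", "postcode_area"])
print("Smallest group size:", k)
```

A check of this kind is only one input into a re-identification risk assessment. It does not account for other data that a recipient might hold, which is the point the ICO code emphasises.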

Assuming that the data is personal data, compliance with the regulatory regime for personal data will need to be addressed. The next step would therefore be to consider what lawful basis for using that personal data could apply to enable the use of the personal data for the purpose of AI training.

Consent or other lawful basis?

Under the GDPR, health and genetic data about identifiable patients may only be processed if permitted by law, which could include where there is consent of the data subject. The DPA 2018 offers some exemptions for public health or healthcare research purposes, but unless the purpose for the processing clearly falls into one of these limited exemptions, explicit patient consent is likely to be needed for processing of health data.

Obtaining individual patient explicit consent can be logistically cumbersome and costly but is an important step to get right. The consent request needs to be presented in a clearly distinguishable way in an intelligible and easily accessible form, using clear and plain language. The data subject must have genuine choice whether or not to give consent and such consent must be informed. This means that the data subject must understand not only how their data is to be processed, but also who will receive their data, for instance through the AI system. This may present a challenge in the context of AI, since the purpose of the processing may not always be fully fixed at the outset.

An AI tool could initially provide support for treatment decisions for a particular individual but it could also be valuable to include that individual’s data within further research or drug discovery. Therefore, having clarity at the outset on how that individual’s data will be used and thinking about how the data might be used in the future (and transparently communicating this) is essential. In addition, the requirement that a data subject must be able to withdraw consent at any time without detriment may make consent unsuitable as a lawful basis in certain situations - for example where data is used for medical research. In such a situation, one of the exemptions for research purposes set out in the DPA 2018 may apply.

Staying within purpose / managing consents

The GDPR principle of purpose limitation requires that personal data is collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes. This means that when starting an AI project the personal data collected for analysis of treatment decisions can only be used for that purpose. This can be problematic for the training of AI systems, where it is not always initially clear how the data will ultimately be used or what data will be relevant. When considering how to document consent, organisations should put systems in place so that, if they later seek to use personal data for a purpose different from that originally intended, the consent of the data subject for the new purpose can be sought where needed.
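One way to picture such a system is a register that records consent per purpose and is consulted before any processing takes place. The following Python sketch is illustrative only: the class names, the purpose labels and the subject reference are hypothetical, and the sketch covers consent-based processing alone, not processing that relies on another lawful basis or a research exemption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass
class ConsentRecord:
    purpose: str
    given_at: datetime
    withdrawn_at: Optional[datetime] = None


@dataclass
class ConsentRegister:
    # Keyed by (subject reference, purpose) so each purpose needs its own consent.
    records: Dict[Tuple[str, str], ConsentRecord] = field(default_factory=dict)

    def record_consent(self, subject_ref: str, purpose: str) -> None:
        self.records[(subject_ref, purpose)] = ConsentRecord(
            purpose, datetime.now(timezone.utc)
        )

    def withdraw(self, subject_ref: str, purpose: str) -> None:
        record = self.records.get((subject_ref, purpose))
        if record is not None and record.withdrawn_at is None:
            record.withdrawn_at = datetime.now(timezone.utc)

    def is_permitted(self, subject_ref: str, purpose: str) -> bool:
        """True only if consent for this exact purpose exists and is not withdrawn."""
        record = self.records.get((subject_ref, purpose))
        return record is not None and record.withdrawn_at is None


register = ConsentRegister()
register.record_consent("subject-42", "treatment_recommendation")

ok_original = register.is_permitted("subject-42", "treatment_recommendation")
ok_new_purpose = register.is_permitted("subject-42", "drug_discovery")

register.withdraw("subject-42", "treatment_recommendation")
ok_after_withdrawal = register.is_permitted("subject-42", "treatment_recommendation")
```

In this sketch a new purpose such as drug discovery is refused until a separate consent is recorded, and a withdrawal takes effect on the next check. The timestamps are kept so that the record can also serve as evidence of when consent was given or withdrawn.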

Controllers and processors

Collaborations regarding AI systems often involve the sharing or hosting of various data sets or databases. Under the GDPR, certain conditions must be met before personal data is shared with a third party or before an external data processor is appointed to process personal data on an organisation’s behalf. In particular, the transfer of personal data must meet the data protection principles and the transfer must have a lawful basis.

The GDPR differentiates between the entity determining the purpose of the processing (Controller) and the one processing the data on the Controller’s behalf (Processor). If a Processor is processing personal data on a Controller’s behalf, there must be a written contract in place that meets the requirements of the GDPR. Both Controller and Processor are responsible for implementing appropriate measures to ensure compliance with data protection principles and documenting their compliance. In relation to this obligation, Processors involved in training the AI system are held to the same level of scrutiny as the Controller that instructs them, for example a large pharmaceutical company. Yet the onus is on the Controller to demonstrate compliance for itself and its Processors and to communicate transparently with data subjects regarding its processing activities. The GDPR imposes direct obligations on both Controllers and Processors, so both should design AI systems to meet the requirements of the data protection regime.

Under the GDPR, data subjects have enhanced rights including the right to access personal data held about them and the right to have their data erased in certain circumstances. Organisations will need to have systems in place to deal with such requests promptly. Data subjects’ enhanced rights also mean they can bring a claim directly against a Controller or a Processor in the event of a breach.

In addition, the principle of data minimisation is important: for example, is it necessary to share data in personally identifiable form, or can steps such as pseudonymisation of the data be implemented?

Security compliance

The GDPR requires that ‘appropriate technical and organisational measures’ are in place to process personal data securely, but neither the GDPR nor the DPA 2018 sets out exactly what is meant by technical measures in terms of security features or levels of encryption. Controllers and Processors must take appropriate organisational and technical measures to ensure a level of security protection proportionate to the efforts and risk involved. This means that organisations are expected to keep their data security under review as threats change over time and to have controls in place that are commensurate with the risk. Given the sensitive nature of healthcare data and the high risk posed to data subjects by its release, Controllers and Processors will be expected to put in place robust, up-to-date security controls. It is advisable for organisations to maintain and update internal policies regarding their decision-making, as the Information Commissioner’s Office (ICO) is likely to want to see written evidence that these issues were considered in the event of any investigation.

In September 2018 the UK Government published a code of conduct on AI and data-driven technologies in healthcare setting out what it expects from collaboration between industry, the NHS and the healthcare system. The code, which is currently in consultation form, outlines 10 key principles for safe and effective digital innovations including transparency of the learning methodology and the algorithm to satisfy data subjects’ ‘right to explanation’ and building public confidence in AI outputs. In practice this will be difficult to implement as keeping these details confidential is essential to commercialising these assets. The code is supplementary to the Data Ethics Framework and reflects principles set out in the GDPR.


While AI has the potential to revolutionise healthcare and improve efficiency, it carries a risk of data breach and cyber-attack. Both Controllers and Processors need to have robust security systems and processes to maintain patient trust and to meet GDPR requirements. Whilst there are numerous AI uses within the healthcare system, organisations must focus at the start on what they are trying to achieve and ensure that, where processing is based on consent, appropriate explicit consent from patients is obtained for their specific project. Technical measures, such as encryption, audit logs and access controls, should be put in place alongside organisational measures to ensure that all those working within the organisation adopt security-conscious practices. Data breach and incident response plans should be implemented to deal with data breaches or cyber-attacks when they arise. Record keeping to demonstrate compliance, transparent communication to data subjects and training of individuals who handle personal information are key.
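As a rough illustration of how two of the technical measures mentioned above, access controls and audit logs, can work together, the Python sketch below checks each request against a role-based permission table and records every attempt, whether allowed or refused. The roles, actions, user names and the `request_access` function are hypothetical. Neither the GDPR nor the DPA 2018 prescribes this design, and a real deployment would also need encryption and tamper-resistant log storage, which are not shown.

```python
from datetime import datetime, timezone

# Hypothetical role-to-permission mapping, for illustration only.
ROLE_PERMISSIONS = {
    "clinician": {"read_identifiable_record"},
    "data_scientist": {"read_pseudonymised_record"},
}

# In-memory audit log; a real system would write to protected, append-only storage.
audit_log = []


def request_access(user: str, role: str, action: str, resource: str) -> bool:
    """Decide whether the role permits the action and log the attempt either way."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    audit_log.append(
        {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "action": action,
            "resource": resource,
            "allowed": allowed,
        }
    )
    return allowed


clinician_ok = request_access("alice", "clinician", "read_identifiable_record", "record-001")
scientist_ok = request_access("bob", "data_scientist", "read_identifiable_record", "record-001")

for entry in audit_log:
    print(entry["user"], entry["action"], "allowed" if entry["allowed"] else "refused")
```

Logging refused attempts as well as successful ones gives the organisation a written record that can support breach investigation and help demonstrate compliance.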