In our latest article, we summarized the recent events involving OpenAI, ChatGPT and the Italian Data Protection Authority (Garante).
As we noted, generative AI’s quest for full regulatory compliance may only be beginning: privacy investigations into ChatGPT are still ongoing; further privacy requirements may soon apply (especially to, but not only to, foundation models), including under the AI Act; and, broadly speaking, privacy compliance is no easy task, especially for AI systems trained on large amounts of data scraped from the Internet.
Nevertheless, the measures implemented by OpenAI and welcomed by the Garante can shed light on some of the main issues to be solved when applying privacy laws to AI. Indeed, the same measures can be used as guidelines for developing generative AI in a way suitable to “reconcile technological advancements with respect for the rights of individuals.”
Below is our five-point checklist of the main measures to implement for a privacy-mindful generative AI system. It is based on the Garante’s recent orders addressed to OpenAI and on the measures OpenAI implemented to have the temporary limitation on the processing of Italian data subjects’ personal data lifted.
More guidelines may soon be inferred from other Supervisory Authorities’ (or the EDPB’s) investigations into OpenAI, so stay tuned and don’t forget to follow us!
1. A privacy policy shall be provided both to the users of the AI systems and to the data subjects whose personal data may have been used to train the AI system.
Transparency is key for a privacy-mindful AI system. This is clear from the recent orders from the Garante. Nothing appears to exempt the provider of an AI system, acting as data controller, from informing data subjects about the features of the processing of their personal data (including its purposes and means) and from making that information easily accessible.
Providing a privacy policy may be fairly easy where the users of the AI system are concerned. But what about the data subjects whose personal data are collected and processed for the purpose of training the AI system?
In such a case, the provider of the AI system, as data controller, should consider launching a “nonmarketing informational campaign”; at least, this is what the Garante required OpenAI to do.
By May 15, 2023, OpenAI must launch a public awareness campaign to inform data subjects through the main Italian media outlets that ChatGPT may have used their personal data for AI training purposes, that they can find a detailed information notice on OpenAI’s website, and that they can request and obtain the deletion of their personal data using the tool on the same website.
OpenAI will have to agree with the Garante on the contents of the above nonmarketing campaign. It will therefore be important to check how the required information is drafted and disseminated to data subjects, to gather further guidelines on how to meet the GDPR transparency requirements when AI systems are involved.
2. The processing of personal data, including for the purpose of training the AI systems, shall be grounded on a suitable legal basis.
Especially after the EDPB’s binding decisions dated December 5, 2022 and the Irish Data Protection Commission’s decisions dated December 31, 2022, the use of a contractual legal basis must be carefully evaluated and, insofar as personal data are processed to train AI systems, this legal basis must be avoided.
Indeed, with regard to OpenAI’s processing activities for training ChatGPT, OpenAI was required to change “the legal basis of the processing of users’ personal data for the purpose of algorithmic training, by removing any reference to contract and relying on consent or legitimate interest as legal bases by having regard to the assessment the Company is required to make from an accountability perspective.”
The Garante, while substantially excluding the use of the contractual legal basis, provided an important confirmation: the processing of personal data for the purpose of training an AI system may be based on either the data subject’s consent or the legitimate interest of the data controller. The choice between the two must be made by the data controller, on a case-by-case basis, following an assessment for which the data controller is fully accountable.
In doing so, the Garante substantially excluded any “absolute need” to rely on the data subjects’ consent, only and in all cases, for the lawful processing of personal data to train AI systems, thus leaving the door open to reliance on legitimate interest. Of course, whether legitimate interest can concretely and lawfully be relied on must be confirmed through a legitimate interest assessment, carried out (and duly documented) by the data controller before processing starts.
3. Measures and tools to allow the exercise of privacy rights shall be provided to all data subjects (regardless of whether they are users of the AI system), who shall have and maintain control over all their data (including the data used to train the AI systems).
Irrespective of how the personal data are collected and processed, all data subjects, whether or not they are registered users of the AI system, must always be in a position to have and maintain control over their data. To that end, they must be able to exercise their privacy rights easily.
In particular and with no exclusions (and also depending on the applicable legal basis), data subjects must be granted the right to object to the processing of their data and/or to have their data erased, including when the data are processed for the purpose of training the AI system, as well as the right to obtain rectification of personal data that might be processed incorrectly to generate content or outputs.
Should the rectification of the personal data not be possible based on the current state of the technology, the data subjects must be entitled to request erasure of the incorrect personal data.
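By way of illustration only, the routing of data subject requests described above can be sketched as follows. This is a minimal sketch under our own assumptions, not a description of OpenAI's tool or of anything the Garante prescribed: the names `RequestType`, `Action`, `RightsRequest` and `resolve_request` are hypothetical, and the sketch deliberately ignores the legal-basis analysis that determines which rights apply in a given case.

```python
from dataclasses import dataclass
from enum import Enum


class RequestType(Enum):
    OBJECTION = "objection"
    ERASURE = "erasure"
    RECTIFICATION = "rectification"


class Action(Enum):
    STOP_PROCESSING = "stop_processing"
    ERASE = "erase"
    RECTIFY = "rectify"
    # Rectification is not technically possible: tell the data subject
    # that they may request erasure of the incorrect data instead.
    OFFER_ERASURE = "offer_erasure"


@dataclass(frozen=True)
class RightsRequest:
    request_type: RequestType
    # Recorded for information only: it is never used to refuse a request,
    # because non-users must be able to exercise their rights too.
    is_registered_user: bool


def resolve_request(request: RightsRequest, rectification_feasible: bool) -> Action:
    """Map a data subject request to the action the controller should take.

    Hypothetical simplification: which rights actually apply also depends
    on the legal basis of the processing, which is not modeled here.
    """
    if request.request_type is RequestType.OBJECTION:
        return Action.STOP_PROCESSING
    if request.request_type is RequestType.ERASURE:
        return Action.ERASE
    # Rectification request: fall back to offering erasure when the
    # current state of the technology does not allow a correction.
    return Action.RECTIFY if rectification_feasible else Action.OFFER_ERASURE


# Example: a non-user asks for rectification that cannot be performed.
request = RightsRequest(RequestType.RECTIFICATION, is_registered_user=False)
print(resolve_request(request, rectification_feasible=False).value)
```

The point of the design is that `is_registered_user` never changes the outcome: the same path is open to users and non-users alike.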
4. Children shall be protected.
In light of the possible risks posed by AI systems (including the risk of providing inaccurate information and of exposing children to outputs and content that may be inappropriate for their degree of development and self-awareness), an age gate must be implemented together with measures suitable to verify the users’ age and—when necessary—block their access or collect their parents’ authorization to use the AI system.
The age gate already implemented by OpenAI will have to be supplemented by an age verification mechanism, which must be presented to the Garante by May 31, 2023 and be fully operational by September 30, 2023.
As it will be presented to the Garante before implementation, the mechanism that OpenAI will use will also serve as an example for other providers of AI systems or online services that may involve minors, as it will show how to verify users’ age effectively and safely.
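Pending that example, the decision logic behind an age gate (allow, ask for parental authorization, or block) can be sketched as below. This is only a sketch under stated assumptions: the thresholds in `AgePolicy` are hypothetical defaults chosen for illustration, all names are ours, and the code takes the birth date as given. Reliably verifying that date is precisely the harder problem the age verification mechanism has to solve, and it is not addressed here.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class AccessDecision(Enum):
    ALLOW = "allow"
    REQUIRE_PARENTAL_AUTHORIZATION = "require_parental_authorization"
    BLOCK = "block"


@dataclass(frozen=True)
class AgePolicy:
    # Hypothetical thresholds for illustration; set them according to
    # the rules that actually apply to the service.
    minimum_age: int = 13
    adult_age: int = 18


def age_on(birth_date: date, today: date) -> int:
    """Return the age in completed years on the given day."""
    years = today.year - birth_date.year
    # Subtract one year if this year's birthday has not yet occurred.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def decide_access(
    birth_date: date,
    today: date,
    has_parental_authorization: bool,
    policy: AgePolicy = AgePolicy(),
) -> AccessDecision:
    """Apply the age gate: block, ask for parental authorization, or allow."""
    age = age_on(birth_date, today)
    if age < policy.minimum_age:
        return AccessDecision.BLOCK
    if age < policy.adult_age and not has_parental_authorization:
        return AccessDecision.REQUIRE_PARENTAL_AUTHORIZATION
    return AccessDecision.ALLOW


# Example: a 15-year-old without parental authorization.
print(decide_access(date(2008, 1, 1), date(2023, 5, 2), False).value)
```

Keeping the thresholds in a separate policy object makes it possible to adapt them to different jurisdictions without touching the decision logic.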
5. The providers of AI systems, as data controllers, shall be accountable for the choices made on data protection and privacy compliance, especially where it is necessary to find a balance between technology and fundamental rights.
All the measures implemented by OpenAI were welcomed by the Garante, which recognized the efforts made to “reconcile technological advancements with respect for the rights of individuals.”
It has already been noted that applying the GDPR to AI is no easy task. By way of example, in the study “The impact of the General Data Protection Regulation (GDPR) on artificial intelligence” prepared for the European Parliament, it was pointed out that “AI is not explicitly mentioned in the GDPR, but many provisions in the GDPR are relevant to AI, and some are indeed challenged by the new ways of processing personal data that are enabled by AI. There is indeed a tension between the traditional data protection principles – purpose limitation, data minimisation, the special treatment of 'sensitive data', the limitation on automated decisions—and the full deployment of the power of AI and big data.”
Such tension will be overcome through an interpretation of the data protection principles that is consistent with the beneficial uses of AI. The data controller is in charge of such interpretation and is made accountable for it.
In OpenAI’s case, the accountability of the data controller was mentioned specifically with regard to the identification of the legal basis for the processing of personal data to train ChatGPT, but it is clearly not limited to that.
A data controller must be able to demonstrate compliance with all the principles relating to the processing of personal data: lawfulness, fairness and transparency; purpose limitation; data minimization; accuracy; storage limitation; integrity and confidentiality. In the AI field, these principles cannot be disregarded simply because they are not easily applicable. As recommended in the study mentioned above, they must be “interpreted and applied in such a way that it does not substantially hinder the application of AI to personal data, and that it does not place EU companies at a disadvantage by comparison with non-European competitors” and, in doing so, data controllers should “be provided with guidance on how AI can be applied to personal data consistently with the GDPR, and on the available technologies for doing so.”
With this in mind, should one conclude that the Garante was the first European Supervisory Authority that lent a hand to the providers of AI systems to support them in determining how to develop AI systems in a privacy-mindful way?
The input of the EDPB’s task force on ChatGPT will likely help provide an answer.
This article is updated as of May 2, 2023.