Matt Hervey, IP partner and Head of Artificial Intelligence Law, was interviewed by Inumidun Bayeroju of LexisNexis on the implications for IP of the new draft of the EU AI Act. They discussed the new obligations for generative AI, including the need to disclose the use of copyright works used in training.
What is the background to the amendments to the AI Act for generative AI?
The current AI Act compromise text seeks to shine a light on generative AI and copyright infringement, requiring generative AI providers to publicly disclose details of copyright works used in training.
Generative AI can now create useful text, images, code and other works and is attracting massive investment. OpenAI’s ChatGPT, a text generator, signed up over 100 million users in record time; Microsoft has invested $10bn and is incorporating the technology into its Office suite and search engine. Enterprises across all sectors are exploring the potential for cost savings and revenue growth.
These systems are trained on vast amounts of copyright works, sometimes scraped from the internet, and rightsholders are alleging infringement in a number of high-profile cases. Commentators have speculated that the current defendants were chosen because, unlike other high-profile developers, they disclosed details of the works used for training. The new disclosure requirement in the draft AI Act would put a target on the back of anyone training a generative AI on copyright works for use in the EU.
On 14 June 2023, the European Parliament adopted its negotiating position on the EU AI Act, including amendments to the proposal adopted by the Council of the EU on 6 December 2022. Alongside other amendments, the compromise text includes additional obligations on providers of foundation models and generative AI systems.
Negotiations with the Council of the EU on the final form of the law will now begin.
What is meant by ‘foundation models’ and ‘generative AI’?
The draft defines ‘foundation model’ as ‘an AI model that is trained on broad data at scale, is designed for generality of output, and can be adapted to a wide range of distinctive tasks’ (Article 3(1c)).
Additional obligations are set out for providers of foundation models used in or specialised into generative AI. ‘Generative AI’ is defined as ‘AI systems specifically intended to generate, with varying levels of autonomy, content such as complex text, images, audio, or video’ (Article 28b(4)).
What express IP obligations are included for generative AI?
Under Article 28b(4)(c), the ‘provider’ of a foundation model used in or specialised into a generative AI system must ‘document and make publicly available a sufficiently detailed summary of the use of training data protected under copyright law’. This must be done before the model is placed on the market or put into service in the EU (Article 28b(1)).
The amendments raise a number of uncertainties. Generative AI is defined by reference to ‘AI systems’ and, under the amended compromise text, an AI system is defined as one whose outputs ‘influence physical or virtual environments’ (Article 3(1)). The scope of this requirement is unclear, as is whether it would exclude some generative AI systems.
The EU AI Act applies to providers placing on the market or ‘putting into service’ AI systems. ‘Putting into service’ includes supplying a system ‘for own use’ (Article 3(1)(11)). However, a ‘provider’ means a natural or legal person that places an AI system on the market or puts it into service ‘under its own name or trademark’ (Article 3(1)(2)). The purpose of this requirement is not clear: should the obligations depend on the carrying out of certain acts (such as providing an AI system) or apply only where those acts are performed ‘under [the person’s] own name or trademark’? For example, would a company’s internal use of a proprietary generative AI that it develops in-house be use of the AI ‘under its own name or trademark’?
The requirement under the compromise text is to document and make publicly available a sufficiently detailed summary of the use of training data protected under copyright law. It is not clear what would amount to ‘sufficiently detailed’ documentation of copyright works. In the case of large language models, literally billions of works may be used to train the model. The practicalities of compliance may prove to be too much for some developers: OpenAI’s CEO briefly suggested that his company could cease operating in the EU if it is unable to comply with the provisions of the new legislation. Developers will want further guidance, such as how granular the documentation needs to be, whether they must document copyright works used to train a third-party pre-trained model they have refined, and the extent to which they can protect any trade secrets in their selection of training materials.
What obligations are included for IP more generally?
The draft EU AI Act does not otherwise expressly deal with third-party intellectual property. Certain provisions relating to governance, transparency and technical documentation could be argued to apply to intellectual property, but these do not appear to be intended to benefit third-party rightholders. Rightholders do not appear entitled to bring an action under the proposed Act in respect of any of these more general obligations. Instead, they are obligations on the ‘provider’, including to assist the relevant competent authority.
There are possible arguments to be made around implied obligations to mitigate the risk of IP infringement. National authorities and the Court of Justice may need to decide whether such arguments have merit.
Combining text from the recitals allows for an argument that AI deployers should mitigate the risks of third-party IP infringement under general requirements for ‘governance’. Recital 58a links ‘governance’ to ‘mitigating risks to fundamental rights’ and such rights include ‘intellectual property rights’ (Recital 28a). However, no such obligation is set out expressly in the draft articles.
Under Article 28b(2)(b), providers of foundation models are required to incorporate only datasets subject to ‘appropriate data governance measures…in particular measures to examine the suitability of the data sources and possible biases and appropriate mitigation’. Data governance is discussed in Recitals 44 and 60g and Article 10(2) without any express reference to intellectual property. Article 10 considers data and data governance, in particular in relation to the training of the AI model, and the collection, validation and testing of data sets. Article 10(2)(b) requires consideration of ‘data collection processes’ as part of such data governance. While this could be interpreted as covering, eg, scraping of copyright works, the other elements set out in Article 10(2) with regard to data governance are concerned with the accuracy and completeness of training data, the removal of bias, and the protection of personal data, as opposed to the protection of third-party intellectual property rights.
Article 16(ac) requires providers of high-risk AI systems to ‘provide specifications for the input data, or any other relevant information in terms of the datasets used, including their limitation and assumptions, taking into account the intended purpose and the foreseeable and reasonably foreseeable misuses of the AI system’. This echoes the language of data governance rather than IP considerations, but the point may be debated.
Where the draft text imposes transparency requirements on the provider, it is likely that these will be interpreted in line with the transparency obligations for high-risk AI under Article 4a(1)(d). These obligations relate to ‘traceability and explainability, while making humans aware that they communicate or interact with an AI system as well as duly informing users of the capabilities and limitations of that AI system and affected persons about their rights’. Although wording about the ‘rights’ of ‘affected persons’ could be interpreted as covering the infringement of IP, that does not appear to be the intention. Indeed, for generative AI there are separate obligations of transparency and to document the use of training data protected under copyright law: the documentation obligation is at Article 28b(4)(c), while the ‘transparency’ obligation for foundation models for generative AI requires that natural persons are aware they are interacting with AI.
Article 28b(4)(b) requires providers of foundation models used in or specialised into a generative AI to ‘train, and where applicable, design and develop the foundation model in such a way as to ensure adequate safeguards against the generation of content in breach of Union law in line with the generally acknowledged state of the art, and without prejudice to fundamental rights, including the freedom of expression’. This appears to focus on the outputs of generative AI and seeks to avoid unlawful content (such as, perhaps, discriminatory text), using state of the art techniques (presumably such as moderation technology and human oversight) to prevent or filter out such content while balancing this with competing concerns such as freedom of speech. However, again, it might be argued that this obligation includes avoiding infringements of third-party intellectual property rights protected under Union law (although, contrary to this, Recital 60h distinguishes questions relating to ‘content in breach of Union law’ and ‘copyright rules’). With a more strained interpretation, it could be argued that the obligation under Article 28b(4)(b) relates to the training and use of the AI in general, not merely the outputted content, and that it also requires protection of third-party intellectual property rights as ‘fundamental rights’.
In contrast to the above speculative protections for third-party rights, the Act gives repeated, express attention to protecting the intellectual property, including trade secrets, relating to AI systems. For example, disclosure of IP is a last resort and must be protected by downstream entities (Recital 60 and Article 28(2b)) and regulators (Recitals 79 and 83, Articles 64 and 70, and Annex VII, point 4.5). Entities providing and deploying AI will need to take care to distinguish between provisions requiring public disclosure (such as Article 28b(4)(c)), provisions requiring disclosure to specific entities subject to confidentiality (such as Article 28(2) and (2b)), and provisions requiring record keeping in case disclosure is later required (such as technical documentation under Articles 11(1), 16(c) and 50).
Could more IP provisions be added to the AI Act?
New recitals highlight concerns for third-party intellectual property. According to Recital 28a, an AI system could be classed high-risk because of its impact on intellectual property rights (and the Commission is required to assess potential additions to the list of high-risk systems under Article 84(1)). Recital 60h notes that generative AI systems ‘raise significant questions related to the generation of content in breach of Union law, copyright rules, and potential misuse’ and calls for periodic monitoring by the Commission and the AI Office.
Is there an overlap with the EU TDM exception?
The draft AI Act does not expressly deal with text and data mining exceptions. The Regulation is expressly without prejudice to the EU DSM Copyright Directive (see Recital 60h) and Article 28b(4)(c) is expressly ‘without prejudice to national or Union legislation on copyright’. Since Article 28b(4)(c) relates to training data ‘protected under copyright law’ it would appear to require details of works in which copyright subsists, irrespective of whether an exception applies to the use made by the AI provider. The EU DSM Copyright Directive will continue to determine the extent of the EU’s harmonised text and data mining exception, including the effect of opt-outs, unaffected by the AI Act.
Will the UK’s proposed Code of Practice cover similar ground?
Due to Brexit, the UK never implemented the EU text and data mining exception and, currently, the equivalent exception for text and data analysis is limited to research for non-commercial purposes (section 29A of the Copyright, Designs and Patents Act 1988). The UK Government, having initially declared in 2022 its intention to introduce an exception for text and data mining for any purpose and without a right of opt out, is continuing to consult with stakeholders. It has not, as yet, proposed any obligation on providers of AI to give details of copyright works used in training. The government has indicated that the IPO ‘will produce a code of practice by the summer which will provide guidance to support AI firms to access copyrighted work as an input to their models’. No doubt the IPO will consider the developments in the draft EU AI Act. (There are rumours that the code of practice will not be ready before ‘late’ summer.)
How will the EU and UK approaches affect the development and implementation of foundation models and generative AI?
The draft AI Act and text and data mining exceptions are just two examples of discrepancies between the UK and EU for generative AI. The UK is also one of very few jurisdictions to provide for copyright protection of computer-generated works without a human author. The UK Government also hopes to take a lighter-touch approach to regulation, minimising ‘cross-cutting’ regulation of AI and wishing to achieve changes in privacy law to promote innovation.
There can be no doubt that the UK’s narrow exception for text and data analysis is less ‘pro-innovation’ than the equivalents in key jurisdictions, including the US, EU and Japan. (Whether or not the UK should be ‘pro-innovation’ if that negatively impacts rightholders is a separate question, which requires stakeholder engagement and careful consideration.) It is too early to assess whether the UK’s approach to text and data analysis and other divergences are affecting investment in foundation models and generative AI as between the EU and UK. Legal and regulatory frameworks, legal certainty, access to talent and funding will all play their part. At the time of Brexit, the UK was widely considered to enjoy the third highest investment in AI globally (after the US and China) and it still hosts centres of excellence, such as the Turing Institute, and outstanding AI companies, such as DeepMind and Stability AI.
Fundamentally, it is also unclear how much scope there is for the UK to grow (or maintain) national AI development and implementation through regulatory divergence. In a globalised economy, the EU’s approach to regulating AI may become a de facto international standard (as, arguably, did the EU’s approach to privacy). Moreover, there are ongoing attempts to achieve wide international approaches (such as via the Global Partnership on Artificial Intelligence) and emerging initiatives, such as the promise for a US-EU voluntary AI code of conduct to be drafted ‘within weeks’.
What practical steps should IP lawyers advise their clients to take?
Generative AI is potentially applicable to any business in any sector. Many believe it has significant scope to cut costs, improve internal and external communications and open up new revenue sources. It has added to the many IP considerations required for AI, including how best to protect and enforce data, AI technology and useful outputs of AI. Unlike ‘discriminative’ AI systems, which are typically trained on a specific proprietary or licensed dataset, foundation models and generative AI often need to be trained on vast datasets of copyright works. Therefore, IP lawyers should work closely with the technical specialists developing generative AI:
- to identify which works need to be used, what copyright (and other) restrictions apply and the possibility of obtaining licences;
- to determine whether aspects of the works protected by copyright will be extracted to train the generative AI and, in turn, whether the extracted data and the trained model itself constitute copies of the works used;
- to investigate whether it is practical to mitigate risks of copyright infringement by extracting data, training the model and hosting the model in specific jurisdictions (ideally seeking local advice);
- to explore guardrails, such as filters on inputs and outputs and human oversight, to minimise the risk of infringing outputs; and
- to document this work in case it needs to be relied on to show compliance with the AI Act or other legislation or regulation.
To the extent an IP lawyer is dealing with a third-party generative AI, they should explore the availability and value of warranties and indemnities to cover the concerns above.
And, of course, IP lawyers should stay up to date with the ongoing developments in legislation, regulation, consultations and litigation in this area.