What are the risks of launching an AI system where the training data was tainted with third party rights? Often this question needs an answer fast and on a global basis. But what do US tech lawyers need to know about the position in Europe?

To harness the power of AI, many organisations are racing to amass as much data as possible. The insatiable appetite for high volumes of diverse data has sent developers searching for pre-existing data that they can use to train AI systems. However, data comes in many forms and is often subject to a patchwork of third party rights which could restrict the way in which such data can be used.

Due to the lack of visibility over the legal rights in data, the fact that data used in training originated from a third party may only emerge late in the day, when the product is ready for launch. In these scenarios, the million dollar question is how much legal risk there is in launching an AI product which could be "tainted" by third party data rights.

It’s a tech lawyer’s job to answer this question, often with very little notice and on a global basis. While much will depend on the specific facts, from a European perspective, the following framework will help to assess the legal and practical risks.

Which rights apply?

Discussing data as a form of property which can be "owned" has become so pervasive that it is entirely reasonable to assume that data is just as much a type of property as the computer hardware on which it’s stored. However, from a European perspective, the legal rights which can be used to protect tangible property from unlawful taking or interference do not apply to data itself. Instead, data is considered to have the same legal classification as "information", which has traditionally fallen outside the scope of the rights which apply to tangible property.

While data itself may not be considered a form of property, there are multiple overlapping legal rights which may affect the acquisition, use and disclosure of data. Aside from data protection law (which places limits on the use of personal data/PII for training AI systems), the main rights in data from a European perspective are (1) contractual rights; (2) confidentiality/trade secrets; (3) copyright (either in the data itself or in an original database); and (4) EU database right.

Different rights, different risks

The nature of the rights which apply to the data matters because the right defines the framework in which a European court will assess any sanctions for using the data to train an AI system, if done without authorisation. In addition to the nature of the right involved, the risk profile for launching an AI trained using “tainted data” will also depend on whether any of the rights are engaged by the ongoing use of the AI system. For example, an “ongoing breach” scenario potentially arises where some of the training data in which third party rights subsist ends up forming part of the trained system. Whether this happens will depend on the technical implementation of the AI system. A database of the weightings of a trained neural network is unlikely to contain any training data; however, other elements of an AI system could retain elements of the original data. While the risk profile for AI systems which do not involve an ongoing use of third party data is generally lower, this can depend on the nature of the rights protecting the third party data.

IP Rights – copyright and rights in databases

Where data is protected by copyright or database right, a European court will have a wide range of remedies available to penalise infringement of those rights. Breach of both copyright and database right engage the protections offered by the IP Enforcement Directive (2004/48/EC). While the specific implementation of the Directive varies from country to country, in general terms the Directive requires that European courts allow IP rights holders to seek corrective measures (e.g. recall of infringing software), preliminary and permanent injunctions, damages (based on harm suffered, unlawful profits or a reasonable royalty), legal costs and publication of judgments.

Under the IP Enforcement Directive, corrective measures are granted in relation to goods that have been “found to be infringing an intellectual property right” and injunctions are “aimed at prohibiting the continuation of the infringement”. A trained AI system which doesn’t contain any unauthorised third party data is therefore unlikely to be subject to these IP-related measures, as its future use will not result in an ongoing infringement of IP rights (in contrast, see 'Confidentiality/trade secrets' section below if the AI system was trained through misuse of confidential information). Instead the primary form of relief awarded to the rights holder would be financial compensation relating to any infringement which took place during the training process.

Contracts

Breaching contractual rights first and foremost gives rise to a claim in damages and is in principle less likely to result in an injunction preventing commercialisation of an AI system than in the context of breach of IP rights. Injunctions for breach of contract are more likely where IP rights underpin the contract, as could be the case with datasets used for AI training. However, assuming there is no ongoing IP infringement, then the situation becomes similar to that discussed under 'IP Rights', above, even if there is a contractual overlay.

Confidentiality/trade secrets

Unlawfully using confidential data to train an AI system potentially engages the protections offered by the EU’s recent Trade Secrets Directive (2016/943/EU). These are similar in scope to the protections offered to IP rights (e.g. damages and interim/permanent injunctions prohibiting the unlawful use or disclosure of the confidential data).

Where the third party data used to train an AI system is confidential, the potential for the courts in Europe to “reach through” and prohibit the use of a trained AI system is higher. A number of European jurisdictions are willing to grant injunctions to prevent commercialisation of products which have been developed through misuse of a trade secret. The EU’s Trade Secrets Directive now also requires all member states to provide remedies relating to “infringing goods” (“goods, the design, characteristics, functioning, production process or marketing of which significantly benefits from trade secrets unlawfully acquired, used or disclosed”). These include recall and destruction, injunctions prohibiting their sale and damages. Injunctions can be either temporary or permanent. Where they are temporary, the duration should be “sufficient to eliminate any commercial or economic advantage that the infringer could have derived from the unlawful acquisition, use or disclosure of the trade secret.”

The courts in Europe haven’t yet considered the application of these infringing goods provisions to products such as AI systems. Where the AI is embodied in physical products or distributed on physical media the provisions are likely to apply. Less clear at present is the application to AI systems which are provided as a service, e.g. on a SaaS basis. Subject to any future guidance from the CJEU, the willingness of European courts to “reach through” and prevent commercialisation of an "AI system as a service" based on the unlawful use of a trade secret during the training process will depend on the relevant national implementation of the Trade Secrets Directive. This may vary between European jurisdictions, although a survey of our colleagues in Germany, France, Italy, Spain and Finland suggests that courts in their jurisdictions are more likely than not to hold that the relevant national implementation would cover an AI system provided as a service.

It is also important to appreciate that the Trade Secrets Directive only provides a required minimum standard of protection for trade secrets (i.e. member states are free to provide broader protection). In the UK, for example, the courts have flexible and wide-ranging powers to impose sanctions for misuse of confidential information and would be unlikely to find any conceptual difficulty in imposing an injunction in relation to an AI system which has been trained through a misuse of confidential information. The UK courts would however consider carefully the degree of confidentiality of the training data and the extent of use of such data when assessing whether to grant an injunction and, if so, the duration of that injunction. Where the data is used for training but does not form part of the final AI system, the court would potentially grant a “springboard injunction”, limited to the time it would take someone starting from public domain sources to reverse engineer or compile the information.

What's the harm?

While the potential for an injunction preventing the commercialisation of an AI system will be at the forefront of most tech lawyers' minds, the potential for a third party to obtain significant financial compensation can also be a key issue. In Europe, while the IP Enforcement Directive and Trade Secrets Directive allow courts to award damages for elements other than economic factors (e.g. moral prejudice caused to the rightholder by the infringement), this form of enhanced compensation tends to arise far less frequently than appears to be the case with exemplary or enhanced damages under certain US statutes (e.g. the Defend Trade Secrets Act). However, that’s not to say that European courts won’t award substantial compensation under certain circumstances.

The starting point for assessing damages in both IP infringement and trade secrets cases is the negative economic consequences, including lost profits, which the injured party has suffered or any unfair profits made by the infringer. Alternatively, the courts are also permitted to set the damages as a lump sum on the basis of elements such as the amount of royalties or fees which would have been due if the infringer had requested authorisation to use the intellectual property rights or trade secret in question. In each case, the court should focus on the actual prejudice suffered as a result of the infringement.

While these principles provide a common starting point for infringement of IP and trade secrets based rights in training data, their application will vary between countries, which can give rise to significant practical differences in the level of awards. The position in relation to the damages awarded for breach of contractual rights relating to training data can also vary significantly between European jurisdictions.

Practical risk factors

In addition to the legal issues discussed above, from a practical perspective the following issues should also feed into the risk assessment:

  • Rights holder audits: if the data has been acquired from the rights holder under a data licence, does the licence provide the right holder with audit rights? Data audits are a common way for right holders to squeeze additional revenues by identifying use of licensed data outside the scope of the licence and threatening litigation if further payments are not made.
  • Possible data sources: are the right holders the only source of the relevant category of data? Where data is commercially available from a range of sources, the risk of detection will be lower than where the data has come from a sole provider in the marketplace.
  • Identity of the rights holder: commercial data suppliers will be keenly aware of the value of their data and are likely to be more aggressive in pursuing infringements than organisations less focused on commercialising data.

Conclusion

Assessing the risk of launching an AI system in Europe which has been trained using “tainted data” can be challenging as it involves balancing a number of legal and practical factors. However, identifying the relevant rights in the specific data concerned, understanding whether the breach is historic or ongoing and reviewing the possible sanctions in the relevant European jurisdictions in question can put you well on the way to understanding the risk profile.