Legal industry media would have us believe that lawyers are lamenting the rise of artificial intelligence and how it will upend the legal industry. But with artificial intelligence implementation comes artificial intelligence contracts. At least for the moment, these contracts are being drafted by humans.

Much customer-facing artificial intelligence deployment, whether a legal research tool or a telecommunications customer experience assistant, is chat-based. The information exchanged by chat between an AI and a prospective or existing customer is steered either by a cognitive virtual assistant or by a human assisted virtual assistant, using software that combines chat automation with machine learning. Over time, the experience can become very sophisticated, offering solutions to complex questions by way of web-based chat conversations.

Chat can be deployed in cute and disarming ways, which keeps customers happy. But chat is also ideal for machine learning because it is rapid, discrete, unstructured and in text. These qualities mean chat can be assimilated by an AI and otherwise data-mined. The owners/licensors of this sort of AI capability want the raw, unstructured data to help the AI learn. Further, licensors want the express right to use that raw, unstructured data in any and all future deployments of the AI.

1. De-identification

If that raw data relates to clothing-purchase preferences among teenagers in Fremantle, then retention of the unstructured chat data is not especially contentious. A bigger problem arises if, for example, the raw chat data contains personal medical information or becomes a medical record. What if a person asks a cognitive virtual assistant by chat about upping their olanzapine dosage (olanzapine is used in the treatment of schizophrenia and bipolar disorders)? The Australian Privacy Principles contained in Schedule 1 of the Privacy Act 1988 immediately become a live issue, and the parties need the sorts of clauses we see in privacy policies to address this. Another scenario involves questions and answers put to a legal research cognitive virtual assistant which are the subject of legal professional privilege. Is privilege waived by way of raw data retention? Another issue is in respect of defence secrets, the unauthorised disclosure of which can carry custodial sentences. There are many regulatory environments around data which can present problems for both licensor (in the retention) and licensee (in the transmission).

To deal with those sorts of concerns, raw data derived from AI chat conversations should be, and ordinarily is, thoroughly de-identified – by a back-end script, not a human. Contracts involving raw data for consumption by an AI should address these issues and expressly require de-identification of unstructured data, for the benefit of both the licensor and the licensee.
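The kind of back-end de-identification script the parties might require can be sketched as follows. This is a minimal illustration only – the pattern names, regular expressions and placeholder format are assumptions for the example, and a production pipeline would cover many more identifier types (names, addresses, medication references and so on):

```python
import re

# Hypothetical patterns for the example only -- a production de-identification
# pipeline would cover many more identifier types and edge cases.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    # Rough sketch of Australian mobile/landline formats.
    "PHONE": re.compile(r"\b(?:\+?61|0)(?:[ -]?\d){8,10}\b"),
}

def deidentify(chat_line: str) -> str:
    """Replace identifying tokens in a raw chat line with typed placeholders."""
    for label, pattern in PATTERNS.items():
        chat_line = pattern.sub(f"[{label}]", chat_line)
    return chat_line

print(deidentify("Call me on 0412 345 678 or email jo@example.com"))
```

A script of this shape would run over each chat line before the raw data is retained or transmitted, so that neither licensor nor licensee ever stores the identifying detail.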

2. Ownership of the Database

The next issue lies in ownership of that de-identified raw data. It was established in IceTV Pty Limited v Nine Network Australia Pty Limited (2009) 239 CLR 458 and in Telstra Corporation Limited v Phone Directories Company Pty Ltd [2010] FCAFC 149 that copyright does not subsist in a computer-aggregated database. Such a database is not an original literary work, because to attract copyright a database must be the result of human authorship, not computer generation.

This presents obvious problems. If an agreement provides that the licensor of the AI owns the aggregated raw data, then that aggregated raw data is a database generated by a computer and almost certainly does not attract copyright. What is there for the licensor to own? One solution is to have the agreement governed by the laws of a European Union member state, because EU Directive 96/9/EC provides for the legal protection of databases. Another is to try to contract the parties into a de facto recognition of sui generis property rights in the database. Both options come with problems.

3. “Garbage in, garbage out”

The final issue (at least, the final issue tackled in this first part of the article) is in respect of policing data input and output. It is worth noting that the AI cannot usually unlearn. As Derek Partridge notes in his book, A New Guide to Artificial Intelligence, “unlearning is such a distasteful idea for AI systems builders that it is hardly thought of at all.”

This applies to the AI’s cognitive paths shaped through the machine learning exercise, but it does not apply to data. Data can be removed, and there are good reasons to do so. AIs learn from having access to data describing human interaction. No one wants the AI to pick up bad habits, including the adoption of bad information provided by a human who did not have coffee the morning he or she engaged in an inaccurate chat conversation with an existing or prospective customer. Practically speaking, this is policed by a human: if a (variable) degree of uncertainty attaches to a question or to a draft response, the AI will flip so as to become a human assisted virtual assistant. (Typically, the more exposure the AI has to a data set – “This is a duck. This is also a duck. This is not a duck.” – the lower the degree of uncertainty.) But in any event, the agreement should provide something along these lines:

The information provided to the AI, including by way of human-to-human interaction:

(i) is complete, accurate and reliable; and

(ii) is not, in any way, incomplete, false, misleading, deceptive or fraudulent.
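The human-assisted fallback described earlier – flipping to a human agent when uncertainty attaches to a draft response – can be sketched as a simple confidence-threshold check. The class, threshold value and labels here are illustrative assumptions, not any vendor's actual API:

```python
from dataclasses import dataclass

# The threshold and labels are illustrative assumptions, tuned per deployment.
CONFIDENCE_THRESHOLD = 0.75

@dataclass
class DraftResponse:
    text: str
    confidence: float  # model's certainty in its draft answer, 0.0 to 1.0

def route(draft: DraftResponse) -> str:
    """Escalate low-confidence drafts to a human assisted virtual assistant."""
    if draft.confidence < CONFIDENCE_THRESHOLD:
        return "HUMAN_AGENT"  # a human reviews before anything is sent
    return "AUTO_REPLY"       # cognitive virtual assistant answers directly

print(route(DraftResponse("Your plan renews on the 1st.", 0.92)))  # AUTO_REPLY
print(route(DraftResponse("Should I up my dosage?", 0.31)))        # HUMAN_AGENT
```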

Also, no one wants an AI which has learned to be mouthy.

The Users, in using Chat, will not use language which is:

(i) defamatory;

(ii) offensive;

(iii) reasonably construed as harassment;

(iv) scandalous;

(v) threatening; and

(vi) otherwise contrary to law.
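A licensee wanting to police a clause like the one above before chat lines ever reach the AI might run a screening step of this general shape. The categories and word lists below are hypothetical placeholders – real moderation tooling uses far richer classifiers than keyword matching:

```python
# Hypothetical category word lists for the example; real moderation tooling
# is considerably more sophisticated than keyword matching.
PROHIBITED = {
    "offensive": {"idiot", "moron"},
    "threatening": {"hurt", "destroy"},
}

def screen(chat_line: str) -> list[str]:
    """Return the prohibited-language categories detected in a chat line."""
    words = {w.strip(".,!?").lower() for w in chat_line.split()}
    return sorted(cat for cat, banned in PROHIBITED.items() if words & banned)

print(screen("You idiot, I will hurt you!"))
print(screen("My bill seems too high this month"))
```

Lines flagged by a screen of this kind could be quarantined from the training data, which ties the contractual warranty to something operationally enforceable.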

There is the related issue that the AI might have brought to the task a bad habit learned from a previous deployment. In addition to learning profanities, what if the AI has picked up information from a previous deployment which constitutes, for example, a misleading or deceptive representation? The contract’s drafter should try to accommodate this possibility and mitigate the risk attached to it “to the maximum extent permitted by law”.