This is the second part of a series of two on Generative AI (GenAI), where the opportunities and limitations of applying GenAI in the legal world are discussed. The first part of this series can be found here.

Opportunities and Limitations

Thanks to the enormous amount of data Large Language Models (LLMs) are pre-trained on, decoder-only language models, which belong to the class of GenAI, offer unprecedented opportunities to digest large documents and extract relevant information from them, either as summaries or as answers to questions. Moreover, interacting with an AI chatbot lets users refine its responses and thus zoom in on specific topics covered in the documents interactively, without having to read them in full. Their creativity and ability to produce coherent text from just a few hints are also without parallel.

However, this creativity, combined with polished, sophisticated writing, is also a serious drawback, as two New York lawyers who used an AI chatbot to write a motion learned the hard way. The chatbot included fictional case law, and after the court found out, it fined the lawyers for acting in bad faith.

This tendency to hallucinate is an intrinsic property of decoder-only language models, which are trained precisely to be creative and not to repeat themselves (repetition was a serious problem with earlier generative AI models, which frequently got stuck in sentence-level loops). Their creativity also means that similar questions can lead to vastly different answers. Even slight updates of a model, or different query histories (i.e., chat histories), can lead to different answers to the same question.

An additional limitation of GenAI models stems from their size and corresponding hunger for data. They have grown so big that they can only be deployed on supercomputer infrastructure, provided by just a few global players. This compute power comes with a significant price tag for businesses and users of these GenAI models, which may influence whether to deploy a model for a given task, and which one. The data hunger also means that everything available on the Internet is considered fit for training, including the queries entered by users and channelled to the central servers. For this reason, several corporations banned or limited the use of a well-known chatbot, according to a recent article in The Economist. Meanwhile, global cloud providers offer enterprise subscriptions that contractually guarantee that any data processed will not be used to train the generative AI models.

What does this mean for the legal world?

In the legal world there is a need for both convergent and divergent thinking and analysis, depending on the context and goals. A task that requires divergent thinking, such as contract drafting, is best started using an LLM from the decoder-only (GenAI) branch:

Example 1: Please draft me a contract for purchasing shares… Response: <SPA draft>

Additional example questions that may require divergent thinking are:

Summarise the document in 200 words for me.

What is the main point of the document attached?

Could you draft an issues list with 4 columns […]?

A task that requires convergent thinking, such as extracting specific information from an agreement, is best carried out using an LLM from the encoder-only branch (say, using extractive Q&A):

Example 2: What is the monthly net rent in the attached lease agreement? Response: 20,000 USD.

Additional example questions that may require convergent thinking are:

Does the attached document contain personal information?

How many shares have been issued under the agreement and at what price?

What is the claimant’s name?

What is the address of the lease property?

If the task of Example 2 is carried out by a decoder-only model instead, answers may vary from one run to the next (even with the same input data and prompts, due to the generative character of these models). Answers also tend to come with a flood of additional information, as GenAI is conversational by design. To mitigate errors and ensure consistent quality of outcomes, appropriate safeguards and constraints have to be implemented through prompt engineering. Here are a few best practices provided by Databricks, a software company, to accomplish this:

  • Use clear, concise prompts, which may include an instruction, context (if needed), a user query or input, and a description of the desired output type or format.
  • Provide examples in your prompt (few-shot learning) to help the LLM to understand what you want.
  • Tell the model how to behave, such as telling it to admit if it cannot answer a question.
  • Tell the model to think step-by-step or explain its reasoning.
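To make these practices concrete, here is a minimal sketch of how such a prompt could be assembled in Python. It is an illustration only, not a Databricks API: the names `PromptSpec` and `build_prompt`, the wording of the behavioural instructions, and the sample question and answer are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PromptSpec:
    """Hypothetical container for the parts of a prompt listed above."""
    instruction: str                      # what the model should do
    query: str                            # the user's question or input
    context: str = ""                     # optional background text
    output_format: str = ""               # desired output type or format
    examples: List[Tuple[str, str]] = field(default_factory=list)  # few-shot pairs


def build_prompt(spec: PromptSpec) -> str:
    """Assemble a single prompt string from the individual parts."""
    parts = [spec.instruction.strip()]
    # Tell the model how to behave, including admitting when it cannot answer.
    parts.append(
        "If the answer is not contained in the context, reply exactly: "
        "I cannot answer this."
    )
    # Ask for step-by-step reasoning.
    parts.append("Think step by step and explain your reasoning.")
    if spec.context:
        parts.append("Context:\n" + spec.context.strip())
    # Few-shot examples show the model the kind of answer that is expected.
    for question, answer in spec.examples:
        parts.append(f"Example question: {question}\nExample answer: {answer}")
    if spec.output_format:
        parts.append("Output format: " + spec.output_format.strip())
    parts.append("Question: " + spec.query.strip())
    return "\n\n".join(parts)


if __name__ == "__main__":
    spec = PromptSpec(
        instruction="Answer questions about the lease agreement below.",
        query="What is the monthly net rent?",
        context="The monthly net rent is 20,000 USD.",  # made-up sample text
        output_format="Amount and currency only.",
        examples=[("What is the claimant's name?", "Jane Doe")],  # made-up pair
    )
    print(build_prompt(spec))
```

The resulting string would then be sent to whichever model is in use. Keeping the parts separate makes it easier to apply the same safeguards consistently across many queries.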

Besides prompt engineering, another means to improve and customize a model’s performance is to incorporate your own data through Retrieval Augmented Generation (RAG). In this approach, the most relevant pieces of information for the task at hand are first retrieved from a private document repository, using a search engine, and then injected into the prompt. This domain-specific context gives the GenAI model additional information that it can use to produce a more accurate and contextually relevant response, which in turn can give you a strategic edge.
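The retrieve-then-inject idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the document repository is an in-memory list of made-up passages, the search engine is replaced by a simple word-overlap (cosine) ranking, and the function names `retrieve` and `build_rag_prompt` are hypothetical. A production setup would typically use a proper search or vector index instead.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: List[str], k: int = 2) -> List[str]:
    """Return the k passages most similar to the query (stand-in for a search engine)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True
    )
    return ranked[:k]


def build_rag_prompt(query: str, passages: List[str], k: int = 2) -> str:
    """Inject the retrieved passages into the prompt as context."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    # Made-up passages standing in for a private document repository.
    repository = [
        "The monthly net rent is 20,000 USD, payable in advance.",
        "The lease property is located at 1 Example Street.",
        "Either party may terminate with six months notice.",
    ]
    print(build_rag_prompt("What is the monthly net rent?", repository, k=1))
```

Only the retrieved passages reach the model, so the quality of the answer depends heavily on the quality of the retrieval step.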

GenAI will certainly be an important and very helpful addition to the automation toolbox of the legal profession. Yet, it will only be one instrument among many. As with any automation, GenAI will need to be integrated into the workflows to unlock its full potential.