Our recent post “Tracking AI Prosecution Trends at the U.S. Patent Office” presented USPTO data suggesting that future prosecution of AI inventions may be less focused on patent eligibility under 35 U.S.C. §101 and more focused on the traditional requirements of §§ 102, 103, and 112. This post is the first of a two-part series looking into the challenges that AI inventions present to one of these traditional requirements: patent disclosure under 35 U.S.C. §112(a). In this Part I, we identify the unique disclosure issues with AI inventions. In Part II, we provide practice tips for describing and enabling AI inventions.
A fundamental premise of most patent systems is the quid pro quo by which an inventor discloses his or her invention to the public in return for exclusive rights to use such invention for a limited time. Recent advances in artificial intelligence (AI) have sparked debate as to whether current patent disclosure requirements can enrich the public with AI inventions such that the granting of the exclusive right is justified. This debate inevitably centers on the “black box” nature of a particular type of AI: machine learning. Machine learning is the dominant AI technique disclosed in patents. As such, understanding the patent disclosure issues presented by AI inventions requires an understanding of the basics of machine learning.
Basics of Machine Learning
It is well known that machine learning requires first “training” a system using a training data set, and then deploying the trained system to “infer” predictions from new data not previously seen by the system. The training process attempts to correlate the input data to some desired output prediction or classification. The distinction between different types of training such as “supervised,” “unsupervised,” and “reinforcement” learning generally relates to the level of human intervention in the training process, and more specifically to the meaning that humans give to the training data prior to training of the machine learning system. In supervised learning, the training data set is carefully analyzed by domain experts to identify features that are relevant to the desired output of the system, and to label each sample of the training data set as one of the possible target predictions of the system. In unsupervised learning, by contrast, features and labels are not provided for the training data in advance; rather, the machine learning algorithm itself identifies features of the input data that permit segregating the input samples into unlabeled output groups. Unsupervised learning techniques are less developed than supervised learning techniques, which are required to some degree in nearly all current-day implementations of machine learning. For example, the output of a purely unsupervised learning algorithm may be used to define the feature set for a supervised learning system in which experts provide target labels that render the output groupings meaningful to humans.
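To make the supervised/unsupervised distinction concrete, the following sketch shows a toy unsupervised algorithm (k-means clustering, one of many such techniques; the data points are invented for illustration) segregating unlabeled samples into groups with no human-provided features or labels:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Toy k-means: group unlabeled 2-D points into k clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest current center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda i: (p[0] - centers[i][0]) ** 2
                                + (p[1] - centers[i][1]) ** 2)
            clusters[i].append(p)
        # Move each center to the mean of its assigned points.
        for i, c in enumerate(clusters):
            if c:
                centers[i] = (sum(p[0] for p in c) / len(c),
                              sum(p[1] for p in c) / len(c))
    return centers, clusters

# Two unlabeled "blobs" of samples; the algorithm separates them
# without being told what the groups mean.
data = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0),
        (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, clusters = kmeans(data, k=2)
```

As the post notes, the resulting groups are unlabeled: a human expert would still need to supply the meaning (e.g., "apples" vs. "oranges") to make the output useful in a supervised system.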
The machine learning training phase begins with a collection of training data from a particular domain in which the machine learning system will be applied to solve a real-world problem. A common example would be a training data set including many images of different types of fruit (apples, oranges, bananas, etc.). The training data typically includes an identified set of features shared by all input samples and relevant to the output, as well as the possible labeled output predictions to be made from values of the feature set in a particular sample. In the fruit example, the features may be shape, color, size, etc., and the possible labeled outputs may be apples, oranges, bananas, etc.
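In code, such a labeled training set is often just a list of feature/label pairs. This is a minimal sketch of the fruit example above; the feature values and structure are illustrative, not drawn from any particular system:

```python
# Hypothetical labeled training set for the fruit example:
# each sample is a set of feature values plus a human-supplied label.
training_data = [
    {"features": {"shape": "round",  "color": "red",    "size": "medium"},
     "label": "apple"},
    {"features": {"shape": "round",  "color": "orange", "size": "medium"},
     "label": "orange"},
    {"features": {"shape": "curved", "color": "yellow", "size": "medium"},
     "label": "banana"},
]

# The shared feature set and the possible target predictions.
feature_names = sorted(training_data[0]["features"])
labels = sorted({s["label"] for s in training_data})
```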
A machine learning algorithm can be thought of as a very generic mathematical function that roughly correlates any input data with possible output predictions. When the algorithm is run on training data, it uses the value of each predefined feature in a particular sample to associate that sample with one of the predefined possible predictions that can be made about the sample. The algorithm then compares its selected prediction to a human-labeled correct prediction for the sample to determine the accuracy of the algorithm’s prediction. As may be expected from the generic nature of the mathematical functions that underlie machine learning algorithms, the algorithm’s output on initial samples of the training data is more akin to a guess than a prediction. However, all machine learning algorithms have a set of basic parameters that can be adjusted to improve accuracy in selecting the correct output prediction. With each erroneous prediction, the algorithm adjusts one or more of these standard parameters to lower its error rate.
The above process of guessing, determining error, and adjusting standard parameters to reduce the error rate is repeated with many more samples of the training data until the machine learning algorithm converges on parameters that capture the complex relationship linking the features of the input data to the correct output prediction. This optimization performed during the training process transforms the machine learning algorithm into a machine learning model. The model is the thing that is saved after the training process and deployed in a system to perform a real-world task. Returning to the fruit example, the trained model can be deployed in a system that receives new unlabeled images of any type of fruit and accurately predicts whether the image is an apple, orange, banana, etc. The model does this by applying the complex patterns “learned” during the training phase to the new image data.
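The guess-measure-adjust loop described above can be sketched with one of the simplest machine learning algorithms, a single-neuron perceptron. The feature names and data points below are invented for illustration; the point is that the trained “model” that emerges is nothing more than the learned numerical parameters:

```python
# Toy "guess, measure error, adjust parameters" training loop.
# Two made-up numeric features (e.g., elongation and redness) are used
# to separate two fruit classes; data values are purely illustrative.
def train(samples, epochs=20, lr=0.1):
    w = [0.0, 0.0]   # adjustable parameters ("weights")
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:            # target: 1 = banana, 0 = apple
            guess = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            error = target - guess           # compare guess to human label
            # Adjust each parameter in proportion to the error.
            w[0] += lr * error * x[0]
            w[1] += lr * error * x[1]
            b += lr * error
    return w, b      # the trained "model" is just these learned numbers

samples = [((0.9, 0.1), 1), ((0.8, 0.2), 1),   # elongated -> banana
           ((0.1, 0.9), 0), ((0.2, 0.8), 0)]   # round, red -> apple
w, b = train(samples)

def model(x):
    """Deploy the learned parameters on new, unlabeled data."""
    return "banana" if w[0] * x[0] + w[1] * x[1] + b > 0 else "apple"
```

Note that `train` (the algorithm) is human-written and easy to describe, while its output (`w`, `b`) is produced automatically; in real systems those learned parameters number in the millions, which is the disclosure problem discussed below.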
Some understanding of the operation of the trained machine learning model can be gleaned from the familiar visual of an artificial neural network (ANN), which is a particular type of machine learning model. ANNs include a complex web of interconnected computing units called neurons that are grouped into discrete layers. The connections between neurons of the network are essentially statistical weighting values that represent the importance of each of the input features to predicting the correct output. In complex “deep learning” neural networks, the feature weighting values are buried within many neuron layers hidden between input and output layers of the model. Each sample of input data to the ANN traverses the neural network based on the feature values of the particular sample and the weighted connections between the neurons. This traversal yields a prediction associating the particular sample with one of the target outputs of the system.
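The traversal described above can be sketched as a simple “forward pass” through stacked layers of weights. The network shape and the numerical weights below are hand-picked for illustration only; in a trained deep learning model these numbers are produced automatically and number in the millions:

```python
import math

def sigmoid(z):
    # Squashes a neuron's weighted sum into a value between 0 and 1.
    return 1 / (1 + math.exp(-z))

def forward(x, layers):
    """Pass an input feature vector through successive weighted layers."""
    for weights, biases in layers:
        x = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
             for w, b in zip(weights, biases)]
    return x

# Illustrative hand-picked weights: 2 input features -> 2 hidden
# neurons -> 1 output neuron. In a real model these values come from
# training, not human design.
hidden = ([[2.0, -1.0], [-1.5, 2.5]], [0.1, -0.2])
output = ([[1.0, 1.0]], [-0.5])
score = forward([0.7, 0.3], [hidden, output])[0]   # value between 0 and 1
```

Even in this tiny sketch, the individual weight values carry no human-readable meaning; only their collective effect on the output does, which previews the “black box” problem discussed next.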
Unique Challenges with Disclosing Machine Learning Inventions
It is often said that machine learning is “automated programming.” That is, the machine learning algorithm (a human-written program) inputs data from a particular problem domain and outputs a model (a computer-written program) that can solve problems in that domain. But if machine learning merely automates the human activity of programming, why would it be more challenging to disclose machine learning inventions? After all, manually written software programs have been described in patents for many decades. A better understanding of these disclosure challenges can be gained by considering the distinction between the algorithm and the model of machine learning systems. At this point, it should be noted that the terms “algorithm” and “model” are often used interchangeably throughout the literature on machine learning systems. For example, the result of the training process is frequently described as simply a “trained algorithm.” To facilitate understanding of the unique disclosure challenges, we adopt the description of machine learning systems set forth above in which the machine learning algorithm is transformed into a trained model during the training process. This distinction in terminology will also be useful in Part II of this series because the term “algorithm” has developed some legal meaning in U.S. patent law, whereas the term “model” generally has not.
Machine learning algorithms have been the subject of academic research for many decades. Although the underlying mathematical functions of machine learning algorithms are quite complex, the training process is fairly well understood. This understanding has developed mainly through refinement of the training process into the granular steps involved in building relatively simple models having few input and output variables. Several descriptive tools have evolved to explain these training steps. For example, academic research papers often use flow charts, mathematical formulas, and pseudocode to describe the training steps. Indeed, our understanding has come to the point where software developers can implement machine learning algorithms in the source code of many modern programming languages. Thus, the machine learning algorithm itself is akin to run-of-the-mill software programs outside of the AI context.
By contrast, the machine learning model is the “black box” aspect of AI systems. As stated by IBM, the largest patent filer for AI inventions in the U.S., “AI inventions can be difficult to fully disclose because even though the input and output may be known by the inventor, the logic in between is in some respects unknown.” One reason that the model is considered a black box is the enormous complexity of present-day models such as deep learning neural networks. As noted, the model is essentially a large set of numerical statistical weighting values that represent the complex interrelatedness of many input features of the training data that determine the output prediction. However, these numerical weighting values (which may be produced in a data table, for example) have little meaning even to experts because the magnitude and sign of the numerical weightings are produced automatically during the training process rather than by human design.
In addition, complex models are a relatively recent artifact of machine learning research and have not themselves undergone much study. Although the training algorithms that produced relatively simple ANNs a decade ago were theoretically capable of building more complex models, it is only through recent advances in computing power and the availability of large datasets that complex ANNs with hundreds of hidden layers have become a reality. Tools are still being developed for understanding the decision making process that occurs within such deep learning models. Thus, while machine learning algorithms can be described with source code precision, the complex models built by these algorithms are presently described by vague analogy to other little-understood systems like biological neural networks.
Finally, some of the precise inner workings of the machine learning model may simply be beyond the limits of the human brain to comprehend. These models represent the patterns discovered by automated iterations through massive amounts of information. As the number of input features of these models increases, the feature interrelatedness that provides the path to an accurate prediction may be undetectable or imperceptible to humans. Evidence of this can be seen in the area of computer vision where it is well publicized that machine learning models have become more accurate than humans in recognizing and sorting images of some objects.
Possible Enhancements to Disclosure Requirements for Obtaining AI Patents
The patent policy debate noted at the outset of this post more precisely centers on enhancing patent disclosure requirements to mitigate the black box nature of machine learning models. For example, some scholars have called for long-term legislative changes to establish a data deposit requirement for training data and/or the machine learning model itself, akin to the sequence listings or biological material deposit requirements found in life sciences patents. Others suggest that the USPTO can provide these and other disclosure enhancements through patent examination rules and policies. However, a recent USPTO report on AI and patent policy indicates the Office’s view that no adjustments are needed to current disclosure laws or examination policies. Nevertheless, even without legislative or regulatory changes, courts may interpret existing patent law doctrines as requiring a greater level of disclosure for AI inventions. This raises the more immediate practical question of how patent practitioners can meet current U.S. disclosure requirements in view of the challenges with describing AI inventions. In Part II, we will discuss techniques for drafting AI patents in compliance with the written description and enablement requirements of 35 U.S.C. §112(a).