My colleague Sarah Lau and I recently published an article giving an overview of AI and patenting strategies and issues in the life sciences (copy available here). So I thought it would be interesting to take a look at some patenting activity in this area. Conveniently, the IPC and CPC patent classifications (used by all major patent offices) have a class for this sort of thing: G16B40/00, relating to “ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining”, with sub-classes G16B40/20 for supervised learning and G16B40/30 for unsupervised learning.

Here is a chart of published EPO applications in these classes from 2017 to 2021:

The first thing to note is that filings (and hence, with a bit of delay, publications) have increased significantly, although this is still a small area of patenting activity. The second is that far more applications are classified as supervised technology (things like classification, regression and prediction) than as unsupervised technology (things like clustering). That assumes, of course, that the patent offices have applied the two classifications correctly. I might look at that in detail when I have a bit of spare time, but in the meantime, here are two word clouds built from the abstracts of the applications in the chart above, one for G16B40/20 (supervised) and one for G16B40/30 (unsupervised):

G16B40/20 (supervised)

G16B40/30 (unsupervised)

Looking at the word clouds, they look pretty similar, and the fact that "clustering" shows up strongly in both makes me wonder how well supervised and unsupervised techniques are in fact separated in these classifications. Something to look into in an idle moment…
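For anyone who wants to put a number on that impression, here is a minimal sketch of how the comparison might be made: count the terms in each set of abstracts (which is essentially what a word cloud visualises) and list the terms that rank highly in both. The stop-word list, the function names and the sample abstracts are all made up for illustration; they are not the data behind the word clouds above.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "and", "for", "to", "in", "is",
              "on", "by", "with", "or", "from", "using"}


def term_counts(abstracts):
    """Count lower-cased words across all abstracts, ignoring stop words."""
    counts = Counter()
    for text in abstracts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


def shared_top_terms(counts_a, counts_b, n=10):
    """Return the terms that are in the top n of both counters, alphabetically."""
    top_a = {term for term, _ in counts_a.most_common(n)}
    top_b = {term for term, _ in counts_b.most_common(n)}
    return sorted(top_a & top_b)


# Placeholder abstracts invented for this example, not real applications.
supervised = [
    "A classifier is trained on labelled gene expression data.",
    "Clustering of samples precedes training of a regression model.",
]
unsupervised = [
    "Clustering of sequence data identifies groups of related genes.",
    "A method for clustering expression data without labels.",
]

sup_counts = term_counts(supervised)
unsup_counts = term_counts(unsupervised)
print(shared_top_terms(sup_counts, unsup_counts, n=20))
```

A large overlap among the top terms would support the suspicion that the two sub-classes are not cleanly separated, although shared vocabulary alone does not prove misclassification.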

Regardless of how well the classification distinguishes supervised and unsupervised techniques, the two classes combined promise to be an effective way to get an overview of patenting activity in this area. The 2022 numbers are, of course, only partial, but about 150 applications have been published so far this year. Looking through them, some interesting ones stand out, for example an application covering DeepMind’s protein structure prediction work that made headlines recently (EP4018449A1 - PROTEIN STRUCTURE PREDICTION FROM AMINO ACID SEQUENCES USING SELF-ATTENTION NEURAL NETWORKS) and applications relating to drug discovery (EP3997714A1 - IDENTIFYING ONE OR MORE COMPOUNDS FOR TARGETING A GENE, for example). For those interested in exploring this part of the CPC to learn more about patenting activity in this growth area, this simple patent search is a good starting point (not limited to any jurisdiction or time frame). By toggling the filter function on the results screen and clicking on the chart symbol, one can access a panel with interactive statistics and graphs to explore the data. Enjoy!