Gartner defines ‘big data’ as “high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization”. This is the ‘3Vs’ definition of ‘big data’ and it certainly applies to patent data: the global patent data set today totals over 100 terabytes, with millions of new patent documents and updates made public every week, and the ever-increasing patent data volumes and variety show no signs of slowing down.

Big data has improved decision making in many industries as well as in the sports world, supplying a winning edge through better-informed decision making in baseball, Formula One racing, basketball, tennis and even golf. The PGA tracks eight characteristics of every golf shot with its ShotLink system, and analysis of that data overturned the conventional wisdom of ‘laying up’ on the second shot of par fives in favour of getting as close as possible to the green. ShotLink now also enables real-time predictive analytics, such as a player’s chances of making a birdie at every shot.

Big data and patents

Patent data is uniquely suited for big data tools and techniques, because of the high volume, high variety (including related information) and high velocity of changes. In fact, patents are leading the way with big data and analytics in many ways.

The patent space offers a fascinating insight into the potential of big data analytics, rich visualization tools, predictive and prescriptive analytics, and artificial intelligence.

– “Patent Analytics Solutions That Help Inventors Invent”, Outsell Inc, June 3 2016

In recent years especially, big data tools and technologies have been used in several ways in the patent world to transform and improve patent analysis.

First, given the idiosyncrasies of patent data sets, there is great value in correcting and improving the patent data. Many big data experts have added veracity as the fourth ‘V’ to big data’s definition, to reflect the need for data accuracy and validity. Veracity is particularly important for patent data because, while some of the data is structured, much of it is human-entered and includes many errors. In addition, some key data elements such as expiration dates are not provided in the data sets, and important updates to the patent data, such as legal status and ownership changes, are in separate data sets altogether. To increase the veracity of patent data, big data tools and techniques are being applied to the following:

  • Correcting data – among other human-entered data fields, the ‘assignee’ (patent owner) is notorious for including bad data. As one example, in US patent grants over the past two decades, the assignee ‘International Business Machines Corporation’ was misspelled over 1,000 different ways within the US Patent and Trademark Office (USPTO) data sets. Big data technologies such as rules engines with machine learning and textual analysis can fix these errors and provide much cleaner patent owner information.
  • Extending data – in the patent world, updates to the patent data are in separate data sets, such as legal status changes and reassignments (patent ownership changes). To show the correct current status of a patent or application, these data sets must be combined and constantly updated into one ‘source of truth’ using big data processing techniques. Of even more interest is combining further-afield data sets such as patent litigation and financial data – which requires big data correlation technologies to accurately match the various entities in the data sets (eg, companies and law firms).
  • Calculating analytics – big data analytics generate algorithmic and quantitative inferences and correlations about data sets. For patents, a smart algorithm can calculate the expected expiration date of a patent, taking into account (in the United States) term adjustments, terminal disclaimers, legislative exceptions and so on. As further examples, predictive patent strength and nearness algorithms can help to prioritise which patents to examine in more detail.
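
As a minimal sketch of the first and third points above, the Python below matches a misspelled assignee to a canonical name using fuzzy string matching, and estimates a US expiration date as 20 years from filing plus any term adjustment, limited by a terminal disclaimer date. The canonical name list, the 0.8 similarity cutoff and all function names are assumptions made for illustration. The date calculation is deliberately simplified: it ignores lapses for unpaid maintenance fees, term extensions and patents filed before June 1995.

```python
import difflib
import re
from datetime import date, timedelta

# Hypothetical canonical owner names; a real list would be far larger.
CANONICAL_ASSIGNEES = [
    "INTERNATIONAL BUSINESS MACHINES CORPORATION",
    "GENERAL ELECTRIC COMPANY",
]


def normalise(name):
    """Upper-case the name, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^A-Za-z0-9 ]+", " ", name).upper()
    return " ".join(cleaned.split())


def canonical_assignee(raw_name, cutoff=0.8):
    """Return the closest canonical name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(
        normalise(raw_name), CANONICAL_ASSIGNEES, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


def add_years(d, years):
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def expected_expiration(filing_date, adjustment_days=0, disclaimer_date=None):
    """Simplified US estimate: 20 years from the earliest effective filing date,
    plus term adjustment days, limited by a terminal disclaimer date if given."""
    expiry = add_years(filing_date, 20) + timedelta(days=adjustment_days)
    if disclaimer_date is not None:
        expiry = min(expiry, disclaimer_date)
    return expiry


print(canonical_assignee("Internatonal Busines Machines Corp."))
print(expected_expiration(date(2005, 3, 15), adjustment_days=100))
```

Names that fall below the cutoff return `None` rather than a guess, so they can be routed to a rules engine or a human reviewer instead of being silently merged.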

Second, big data is often used to analyse and predict the behaviours of participants in a system, such as shoppers, website visitors or athletes. For valid behavioural analysis using big data, it is important to collect and analyse as many different types of behavioural data points as possible and to compare to the entire population to find typical behaviours.

In the patent world, many behaviours can be analysed for participants such as patent filers, litigants, legal agents, patent examiners and inventors. For example, each patent application is a strong signal that the filer is willing to invest in protecting an invention that it feels may be commercially advantageous in the future. Even if the patent application is not granted or is abandoned later, this signal still retains value (as does an abandonment signal).

As one example, big data behavioural analysis in the patent data can reveal the likelihood of patent issuance during prosecution, based on diverse factors such as the number of office actions so far, the patent examiner who is reviewing the application and the patent agent who wrote the application. If the likelihood of issuance is low for a non-critical patent application after a certain point, this information could well be used by the filer to abandon the application rather than spending more money on it.
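
One simple way to approximate such a likelihood is to count how often comparable past applications were granted. The sketch below groups a prosecution history by examiner and number of office actions, and applies add-one smoothing so that small groups do not produce extreme estimates. The history records, the examiner identifiers and the 0.3 threshold are invented for illustration; a real analysis would draw on far more factors and data.

```python
from collections import defaultdict

# Hypothetical history: (examiner_id, office_actions_so_far, was_granted)
HISTORY = [
    ("EX-A", 1, True), ("EX-A", 1, True),
    ("EX-A", 3, False), ("EX-A", 3, True),
    ("EX-B", 1, True),
    ("EX-B", 3, False), ("EX-B", 3, False), ("EX-B", 3, False),
]


def grant_rates(history):
    """Smoothed grant rate per (examiner, office action count) group."""
    counts = defaultdict(lambda: [0, 0])  # key -> [grants, total]
    for examiner, actions, granted in history:
        counts[(examiner, actions)][0] += int(granted)
        counts[(examiner, actions)][1] += 1
    # Add-one (Laplace) smoothing for a two-outcome event.
    return {key: (g + 1) / (n + 2) for key, (g, n) in counts.items()}


def issuance_likelihood(rates, examiner, office_actions, default=0.5):
    """Look up the estimate, falling back to a neutral default for unseen groups."""
    return rates.get((examiner, office_actions), default)


def consider_abandoning(likelihood, is_critical, threshold=0.3):
    """Flag non-critical applications whose estimated likelihood is low."""
    return (not is_critical) and likelihood < threshold


rates = grant_rates(HISTORY)
estimate = issuance_likelihood(rates, "EX-B", 3)
print(estimate, consider_abandoning(estimate, is_critical=False))
```

The flag is only an input to the filer's decision: a critical application is never flagged, whatever its estimate.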

Finally, big data is also used to derive insights about aggregate information, rather than single data points. In fact, one tenet of big data is to ‘use all the data’, rather than using sampling or a subset. Techniques such as multivariate regression can then be used to find correlations, determine likelihoods, identify clusters and trends and even predict outcomes. The goal is not to determine absolute answers, but rather to provide directionally correct guidance that is helpful for making informed business decisions.
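
To make the regression step concrete, here is a small sketch of ordinary least squares with several input variables, written in plain Python so that it has no dependencies. The two features (forward citations and office action count) and the outcome score are hypothetical, and the sample data is synthetic and exactly linear so that the fitted coefficients are easy to check. Real patent data would be noisy, and in practice a statistics library would be used instead.

```python
def fit_ols(rows, targets):
    """Fit y = b0 + b1*x1 + ... + bk*xk by solving the normal equations."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(coefficients, row):
    """Apply fitted coefficients to one observation."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))


# Synthetic data generated from: score = 2 + 3*citations - 1*office_actions
FEATURES = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
SCORES = [2.0, 5.0, 1.0, 4.0, 7.0]

coefficients = fit_ols(FEATURES, SCORES)
print([round(c, 6) for c in coefficients])
print(round(predict(coefficients, (3, 2)), 6))
```

The signs and sizes of the coefficients are what give the directional guidance: here a higher citation count pushes the score up and more office actions push it down.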

Analysing patents in aggregate, and comparing them via big data algorithms and analytics, applies this principle of using all available information. Most companies have an IP management system that keeps track of their own filings, but can tell them nothing about their portfolio strength relative to a competitor’s patents or which patents they might sell to fill in gaps in a buyer’s portfolio. Big data analysis of patents in aggregate provides an additional layer of insight and guidance that cannot be achieved through patent-by-patent analysis.

Patent portfolio management

Throughout the patent lifecycle – from the initial idea to prosecution, grant, maintenance and expiration – many decisions are made about which patents to invest in and how to manage the overall portfolio. The vast majority of these decisions are not legal decisions, but rather business investment decisions: how much and where to invest, where returns might lie in the future and which risks should be managed.

Big data-derived insights are especially helpful when working within business constraints – limited time, limited money or both. As one example, big data analytics are often used to prioritise which patents within a group to review in detail; this can reduce the overall review efforts and timeframe by two-thirds or more.

Big data analytics do not replace current decision criteria; rather, they augment and enhance today’s decision-making approaches by adding quantitative inputs that further inform the decision. Big data analysis of all available information also provides the additional context of your situation relative to all other patent filers, which can provide essential insights to guide patent portfolio decisions.

Within the patent lifecycle steps below, example decisions are given along with a big data analysis that is helpful in making that decision.

  • R&D investments – the proverbial ‘white space analysis’ and several other types of big data analysis can help to inform and guide R&D investments to yield defendable intellectual property. An important recent development is providing inventors with immediate feedback on their ideas using big data algorithms to let them know how unique the idea is, how they might refine and improve the idea to make it more unique and with whom they might collaborate inside their enterprise. In addition, big data analytics are very helpful to find out who is innovating in a technology, for potential R&D partnership opportunities.
  • Filing decisions – intellectual property is a multi-player game, especially in patents, where the basic right is to block an infringing competitor. Some patent filers are even specifying which competitors and products each patent application hopes to block in the future. Consequently, it is extremely useful to know the full context of each competitor’s situation, world view and future plans when making the investment decision to file a patent application. This applies even more so further into prosecution – after a couple of years, the landscape will have evolved and competitor applications from the same timeframe will by then be public, updating the context of your applications and informing decisions about whether to continue pursuing each one.
  • Choosing a prosecuting attorney – abundant data is available on patent grant rates, latency and office action counts by art unit, examiner, patent filer and prosecuting attorney. Forward-thinking patent filers are examining this information to guide the mix of where to send patent applications, based on their success rate and average latency and costs by art unit, normalised for patent complexity and other factors.
  • Handling office actions – when deciding how to respond to each office action, a predictive analytic can show the likelihood of success for each type of response based on similar prior situations, as in baseball or the PGA’s ShotLink predictive analytic for golf mentioned above. Interestingly, the likelihood-of-success analytic can be even more helpful when looking at decisions in the aggregate rather than one by one – for example, when prioritising which of 10 patent applications to pursue and which to drop.
  • Buying patents – when patents are offered for sale, big data analytics can support a very quick, informed yes/no/maybe decision by answering three key questions:
    • Are the patents stronger than my current portfolio in the relevant technology areas?
    • Do the patents cover new territory or just add another layer to protection that I already have?
    • Which are the 20% strongest patents in the selling portfolio I should look at in more detail?

Answering these questions in minutes can eliminate the vast majority of wasted effort in reviewing patents that are not helpful to your portfolio.

  • Licensing or selling patents – there are many considerations when deciding how best to monetise patents, but often the first two questions are: “Which patents should we license/sell?” and “Whom should we approach to sell them to?” Again, big data can be helpful in prioritising patents based on strength and applicability to other players’ patent filings, and also in finding out which players are patenting closest to the patents under consideration. When performing proximity analysis, often unexpected but high-potential players can be discovered in adjacent industries if true semantic analysis is performed – semantic analysis that utilises concepts rather than keywords or classification codes.
  • Renewal decisions – each renewal decision is an additional investment in a patent and can benefit from knowing the patent’s relative strength within your portfolio and versus your competitors’ portfolios, as well as from knowing the complete context of this patent’s relative position compared to the full universe of patents and applications.
  • Litigation strategies – when deciding offensive and defensive litigation strategies, big data analysis can uncover the success rates of the opponent and the influence of each law firm, court and even judge on successful outcomes. These factors can help not only in making decisions on the margin in strategies and tactics, but also in informing the business about the likelihood of each outcome and the resulting costs and risks.
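
The three triage questions under ‘Buying patents’ above can be sketched in a few lines of Python. This assumes that each patent already has a numeric strength score and a set of technology classification codes. How such scores are computed is outside the scope of the sketch, and the patent identifiers, scores and codes shown are invented for illustration.

```python
import math
from statistics import median


def stronger_than(offered_scores, owned_scores):
    """Question 1: is the offered set stronger, comparing median scores?"""
    return median(offered_scores.values()) > median(owned_scores.values())


def new_territory(offered_classes, owned_classes):
    """Question 2: which technology classes would the purchase add?"""
    return sorted(set(offered_classes) - set(owned_classes))


def strongest_fraction(scores, fraction=0.2):
    """Question 3: the top share of patents by score, to review in detail."""
    keep = max(1, math.ceil(len(scores) * fraction))
    return sorted(scores, key=scores.get, reverse=True)[:keep]


# Hypothetical strength scores for an offered and an owned portfolio.
OFFERED = {
    "P01": 71, "P02": 35, "P03": 88, "P04": 52, "P05": 40,
    "P06": 93, "P07": 18, "P08": 60, "P09": 47, "P10": 29,
}
OWNED = {"Q01": 45, "Q02": 38, "Q03": 55}

print(stronger_than(OFFERED, OWNED))
print(new_territory(["H04L", "G06F", "G06N"], ["G06F"]))
print(strongest_fraction(OFFERED))
```

The median is used rather than the mean so that one or two very strong patents do not dominate the comparison; that choice is itself an assumption and could reasonably be made differently.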

It’s all about business impact

By informing decisions at each patent portfolio management stage above with big data analysis, investments can be better targeted and value increased at every step. While each individual decision may be merely influenced at the margin by the analysis’s quantitative output, the cumulative effect over time can dramatically improve the overall value delivered for the investment, greatly increasing the value to the business of the overall patent portfolio.

Big data analysis is also essential to answering board-level questions about patent portfolios, such as the following:

  • “Are we ahead of or behind our competitors in this technology?”
  • “How strong is our protection in this country?”
  • “How does our portfolio compare to that competitor’s?”

Walking into a board meeting to address these questions with just a list of patents or a consultant’s written report simply will not do; but big data has the tools and proven capabilities to generate compelling answers to all of these questions and more.

Taking advantage of big data analysis may require new tools and skillsets for the IP management organisation. It can be sub-optimal simply to outsource analysis, because the IP organisation holds vital proprietary information and context, and will be essential to interpreting the analysis results correctly. The best path to advance in this area is to select a partner that can provide the improved patent data described above, along with skills training and a collaborative approach that focuses on your success in big data analytics.

Finally, many decisions made within the business have nothing to do with patents or the patent lifecycle, but big data patent analysis can still help to inform and influence these decisions. For example, when canvassing companies to potentially acquire, big data patent analysis can show which are more innovative and which have stronger IP protection in the desired technologies. Even a strategic vendor selection request for proposal could benefit from knowing which vendors are more innovative and protected in jurisdictions that matter.

Today, big data analytics can add a further layer of insight by analysing patent data that is improved by big data techniques, in aggregate and compared to the universe of patents and competitors. Big data patent analysis adds unique context and guidance that cannot be derived from anywhere else and can greatly increase the business value of decisions both within the patent lifecycle and in the business overall.

John F Martin

This article first appeared in IAM. For further information please visit www.iam-media.com.