The newly launched Melloddy project is an excellent example of how to harness large data sets with AI while using blockchain to maintain privacy. Its name is an acronym for 'Machine Learning Ledger Orchestration for Drug Discovery', and the project is a collaboration between 17 organisations across Europe: ten pharmaceutical companies, two universities, four 'subject matter experts' and one AI company.

Melloddy seems to have found a solution to companies' concerns over protecting their proprietary intellectual property (e.g. in their chemical library databases) and the security of their data: blockchain technology allows decentralised data to be harnessed without first being pooled. Data is protected because it never leaves the owner's infrastructure. Security is ensured by using a private blockchain, which is designed so that there is no central authority. All partners must approve any communication between the dispatcher and a ledger.
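
As a rough illustration of the approval rule described above, the Python sketch below models a toy append-only ledger in which a record is accepted only if every partner has approved it, and each record is chained to the previous one by a hash so that later tampering can be detected. It is not Melloddy's or Owkin's actual architecture: the class `PermissionedLedger`, its methods and the partner names are all hypothetical, and a real private blockchain would add signatures, networking and consensus between nodes.

```python
import hashlib
import json


class PermissionedLedger:
    """Toy append-only ledger (hypothetical, for illustration only)."""

    GENESIS = "0" * 64  # placeholder hash that precedes the first entry

    def __init__(self, partners):
        self.partners = set(partners)
        self.entries = []

    @staticmethod
    def _hash(payload, prev_hash):
        # Hash the record together with the previous hash to form a chain.
        body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def append(self, payload, approvals):
        """Add a record only if every partner is among the approvers."""
        if not self.partners.issubset(approvals):
            return False
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        self.entries.append(
            {"payload": payload, "prev": prev_hash,
             "hash": self._hash(payload, prev_hash)}
        )
        return True

    def is_valid(self):
        """Recompute the chain and report whether any entry was altered."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            expected = self._hash(entry["payload"], entry["prev"])
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True


# Usage: only metadata about a communication is recorded, never the data itself.
ledger = PermissionedLedger(["pharma_a", "pharma_b", "pharma_c"])
accepted = ledger.append({"event": "model update, round 1"},
                         approvals=["pharma_a", "pharma_b", "pharma_c"])
rejected = ledger.append({"event": "model update, round 2"},
                         approvals=["pharma_a", "pharma_b"])
```

In this sketch the second record is refused because one partner has not approved it, and editing any stored payload afterwards makes `is_valid()` return `False`.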

Melloddy then uses tools such as 'multi-task predictive machine learning algorithms incorporating an extended privacy management system, to identify the most effective compounds for drug development'. The project is seen as a pilot for further data collaborations of this kind, which could bring even greater benefits.

Indeed, machine learning can yield efficiencies at all stages of drug development, from target discovery and drug design, through clinical trials, to patient treatment. The potential gains could be significant, given the current time and cost involved in bringing a drug to market (an average of 13 years and €1.9 billion, according to a 2016 paper in the Journal of Health Economics).

The project is coordinated by Owkin, which provides its blockchain architecture and has expertise in using AI in medicine. Melloddy receives funding from the companies involved as well as from the Innovative Medicines Initiative (IMI), a partnership between the European Union and the European pharmaceutical industry.

By using “federated learning”, a decentralised type of machine learning, pharma companies can keep the data from their chemical libraries inside their own infrastructure.
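
To make the idea of federated learning more concrete, here is a minimal Python sketch of federated averaging, one common way of doing it. The source does not say which algorithm Melloddy uses, so this is an assumption chosen for illustration; the function names, the three synthetic "partners" and the simple straight-line model are likewise hypothetical. Each partner trains on its own data and shares only model parameters, which are then averaged into a common model.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """One partner refines the shared model on its own private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)


def federated_average(updates, sizes):
    """Combine the partners' parameters, weighted by data set size."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def train(partners, rounds=300):
    """Only parameters travel between rounds; raw data stays with its owner."""
    weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in partners]
        weights = federated_average(updates, [len(d) for d in partners])
    return weights


def make_partner_data(rng, n, lo, hi):
    """Synthetic private data following y = 2x + 1 with a little noise."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.01)) for x in xs]


rng = random.Random(0)
partners = [
    make_partner_data(rng, 20, 0.0, 1.0),   # hypothetical partner A
    make_partner_data(rng, 30, 1.0, 2.0),   # hypothetical partner B
    make_partner_data(rng, 10, -1.0, 0.0),  # hypothetical partner C
]
w, b = train(partners)
print(f"shared model: y = {w:.2f}x + {b:.2f}")
```

Although each partner sees only part of the range of `x`, the averaged model should end up close to the underlying relationship, which is the benefit the partners gain without pooling their data.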