What is big data?

Big data is both an element of and a trigger for the ongoing digital revolution. The classical (scientific) principle of causality, which asks why things happen, gives way to predictive analysis of whether and when things happen – a change of paradigm driven by commercial pragmatism. Big data gives decision-makers intelligence that can dramatically reduce the risk of getting things wrong.

The raw data are like an ever-revolving block of marble, and big data analysts and analytics are the stonecutters and their tools, carving fine sculptures out of it. When cutting a raw block of marble, nobody knows what is in it, and yet there is potential for endless variation, for creativity and for the most precious results.

In contrast to the classical collection and analysis of data, with big data one has little or no idea of the ultimate purpose and use of the data when collecting them. It is the sheer mass of data that creates the flexibility to think about purpose later on.

Big data stands for large collections of data that cannot be processed with traditional applications. Some characteristics of big data are that it:

  • comprises large, dislocated and complex sets of data to an extent that traditional data processing and analytics fail;
  • contains various types and structures of data in various places, which are processed continuously over longer periods of time;
  • contains data of various origins and multiple sources and pools, not necessarily with any inherent connexion or link;
  • can be personal data related to individuals, anonymous or any combination thereof;
  • needs superfast, high-end processing to turn predictive analytics into tangible results;
  • is largely unrelated to and independent of predefined usage patterns; and
  • is difficult to evaluate commercially at the outset because nobody knows what is in it.

Big data – why now?

The hype around big data is relatively new and driven by two recent and simultaneous developments. First, the amount of accessible digital information is growing at a multiple of the growth of the world economy, on the back of advanced mobile devices, sensor technology creeping into all areas of life (including cars, mobile devices and M2M communication in industrial production) and large-scale storage facilities. Second, the processing power of computers is exploding, while access to superfast processing is being boosted by falling prices and the rapid growth of cheap, high-bandwidth network connections. Hence, it is easier than ever to process big data.

Who owns big data?

Big data has emerged as a driver and production factor in the commercial value chain across virtually all industries. Yet even the most basic legal categories, including ownership or the right to generate, use or monetise big data, have still to be established. Various legal concepts, including data privacy, database rights, IP rights, antitrust law and the basic civil rights of ownership and possession, play a role when dealing with big data, a legal alien, but each addresses only part of it. It is even questionable whether ‘ownership’ is the right way to capture big data in legal terms at all.

Demystifying data privacy

To start with, it makes sense to demystify data privacy in big data. Data privacy is a predominant theme in almost any legal approach to big data. However, data privacy law is not designed to allocate ownership of data or the right to use them, whether legally or commercially, but to protect personal data and to restrict or exclude access to them.

Data privacy is like a parallel universe of rules and regulations that apply to the processing of any kind of personal data. Whether or not personal data are part of a big data pool makes no difference to the scope of application of data privacy laws. However, similar to the infectious ‘copyleft’ effect of certain embedded open source software, the presence of personal data in a large pool of raw data typically and substantially limits the commercial exploitation of the whole pool.

Hence, meticulous depersonalisation is a success factor in dealing with big data. The application of smart analytics to large data pools creates an inherent threat that individuals can be traced from what is supposed to be ‘anonymous’ or depersonalised data. Big data technologies have therefore also pushed the algorithms and methodologies of depersonalisation to a new level.
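To make the re-identification risk more tangible, the following is a minimal sketch of one commonly discussed check, k-anonymity. The choice of technique is an assumption for illustration only and is not prescribed by any of the laws discussed here. The idea is that every combination of indirectly identifying attributes (so-called quasi-identifiers, such as a truncated postcode and an age band) should be shared by at least k records, so that no individual stands out. The function names, field names and records below are hypothetical and made up.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records that share the same
    values for all quasi-identifier fields (0 if there are no records)."""
    if not records:
        return 0
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times.
    By the convention chosen in this sketch, an empty pool returns False."""
    return smallest_group_size(records, quasi_identifiers) >= k


# Hypothetical, made-up records that have already been generalised
# (postcodes truncated, exact ages replaced by bands).
records = [
    {"postcode": "101**", "age_band": "30-39", "purchase": "A"},
    {"postcode": "101**", "age_band": "30-39", "purchase": "B"},
    {"postcode": "102**", "age_band": "40-49", "purchase": "C"},
]

# The third record is alone in its group, so the pool fails the check for k = 2.
pool_is_safe = is_k_anonymous(records, ["postcode", "age_band"], k=2)
```

A pool that fails such a check would need further generalisation or suppression before it could reasonably be treated as depersonalised. Passing the check is not a legal guarantee of anonymity, since additional data sources can still enable re-identification.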

Legislation around ownership

Even for fully depersonalised big data – where privacy law is irrelevant – ownership remains unresolved. Statutory law and regulatory frameworks are broadly agnostic on ownership of data. In terms of personal data, the right of the data subject (ie the individual whose data are being processed) under data privacy laws to largely exclude others from using ‘their’ personal data leaves little room for a meaningful discussion on ownership. 

There is still no conclusive outcome of the legal discussion and legislation around allocation of ownership of anonymous data or at least of the right to use and to exclude others from using such data. Arguments indicate, though, that the entity or individual controlling the production of anonymous data should be entitled to use them.

A clear move, however, has been made by antitrust authorities looking into remedies in dominance cases and competition concerns in certain merger clearances. Access to specific data, even those of competitors, has been considered a relevant and sometimes even decisive factor for doing business. With the increasing relevance of big data analytics for various industries, we are likely to see antitrust authorities intervening by enforcing access to big data, presumably long before legislators finally address the issue of ownership of big data more widely.

Smart contracting – a way out of the dilemma?

Big data has developed into a game-changer for many businesses and commercial opportunities. Still, with certain exceptions, legislators, courts and regulators are lagging behind in establishing a solid and unambiguous legal framework for big data. Legal uncertainty still keeps many businesses away from mining big data, the new gold of the 21st century.

However, there is a simple tool to help get out of the dilemma. In most jurisdictions, smart contractual arrangements between customers and commercial entities, as well as between industry players, can largely mitigate the downsides of this unsatisfactory legal uncertainty. Without clear statutory rules on ownership and/or the right to use big data commercially, individual contractual arrangements are instrumental in securing investments and business models around big data.

Simple measures, such as anchoring in standard terms and conditions the data processor's right to use even anonymous customer-generated data or B2B intelligence for its own commercial purposes, can substantially mitigate the risk of an adverse commercial impact of legislative action later on.