Big Data is the term popularly used to describe the commercial aggregation, mining, and analysis of very large, complex and unstructured datasets such as images, videos, music files, and files based on social media and web-enabled workloads. In other words, everything from your Facebook posts and tweets through to your late-night shopping habits can now be analysed to put together a comprehensive (and eerily accurate) picture of you as a consumer and target for advertising.

The data used is rich in personal information, but until recently has been difficult – or at least, prohibitively expensive – to understand and analyse collectively. Now, the explosion in cloud computing and software interoperability has provided the opportunity to analyse this data – and indeed, to monetise it. Commercial research organisations can now take advantage of Big Data processing techniques with a relatively low investment in technology.

The result is a huge market opportunity for online businesses to engage with their customers more coherently than ever before. Data has been key to connecting businesses with customers in recent times, whether through behavioural advertising or through product development strategies. For example, Amazon are not alone in using collaborative filtering technology to develop automatic recommendations for customers, based on their purchase history. Retailers have been some of the first to adopt Big Data technology. Tesco has recently announced the launch of a new digital TV and movie service for its loyalty card users, offering the service free of charge by selling loyalty card users’ data, i.e. information about purchasing habits, so that advertisements can be targeted at consumers.

With the development of Big Data and its commercial application being hailed as The Next Big Thing, a large number of potential legal issues are inevitably raised. Many of these issues are familiar on some level, but will require careful re-evaluation in today’s privacy-conscious world. Companies that wish to make use of this technology must address these legal issues head on, before integrating Big Data analysis and technologies into their business processes. Below are some of the issues at the forefront of discussion that need to be considered:

1. Data Protection:

Perhaps the most widely discussed barrier to the commercial development of Big Data is that of data protection. Regulatory requirements, particularly in the EU, dictate that personal data must be processed for specified and lawful purposes and that the processing must be adequate, relevant and not excessive.

The potential privacy problems are evident and highly important, given that current data collection methods and policies do not appear sufficient to address all the specific data processing that Big Data technologies involve. The cross-referencing and superimposition of data which has, in all likelihood, been collected for other purposes raise complex questions about the completeness and correctness of data protection policies – not least because some of the data will have been collected at a time and under a policy in which Big Data analysis was not contemplated. Each Big Data project will inevitably be different from any other, and customised data protection policies must be considered for each case.

A potential way around this concern is to anonymise data. The attraction of anonymised data is that it is no longer personal data and therefore falls outside the scope of the Data Protection Act 1998. However, ensuring that data is properly anonymised, and not just masked, can be difficult to achieve in practice, and much of the value of Big Data, of course, lies in the potential to target advertising to specific individuals.
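The difference between masking and anonymising can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the records are invented, the choice of postcode and birth year as "quasi-identifiers" is hypothetical, and the group-size check (often called k-anonymity) is only one rough technical indicator, not a legal test of whether data is anonymised under the Data Protection Act 1998.

```python
from collections import Counter
import hashlib

# Hypothetical loyalty card records; all names and values are invented.
records = [
    {"name": "Alice Smith", "postcode": "AB1 2CD", "birth_year": 1980, "basket": "nappies"},
    {"name": "Bob Jones", "postcode": "AB1 3EF", "birth_year": 1984, "basket": "wine"},
    {"name": "Carol White", "postcode": "EF3 4GH", "birth_year": 1975, "basket": "books"},
    {"name": "Dan Green", "postcode": "EF3 9ZZ", "birth_year": 1971, "basket": "garden tools"},
]

# Assumed quasi-identifiers: fields that are not names but can still single someone out.
QUASI_IDENTIFIERS = ("postcode", "birth_year")


def mask(record):
    """Masking only: swap the name for a hash and leave every other field as it was."""
    masked = dict(record)
    masked["name"] = hashlib.sha256(record["name"].encode("utf-8")).hexdigest()[:12]
    return masked


def generalise(record):
    """Drop the name entirely and coarsen the quasi-identifiers."""
    return {
        "postcode": record["postcode"].split()[0],        # keep the outward code only
        "birth_year": record["birth_year"] // 10 * 10,    # round down to the decade
        "basket": record["basket"],
    }


def smallest_group(rows, keys=QUASI_IDENTIFIERS):
    """Size of the smallest set of rows sharing the same quasi-identifier values."""
    counts = Counter(tuple(row[key] for key in keys) for row in rows)
    return min(counts.values())


masked_records = [mask(r) for r in records]
generalised_records = [generalise(r) for r in records]

print("Smallest group after masking:", smallest_group(masked_records))
print("Smallest group after generalising:", smallest_group(generalised_records))
```

In this sketch every masked record is still unique on postcode and birth year, so anyone who knows those two facts about a person could pick out their row despite the hashed name. Coarsening the fields makes each row indistinguishable from at least one other, but at the cost of the individual-level detail that makes the data commercially attractive.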

2. Intellectual Property Rights:

Perhaps just as fundamental is the need to carry out a detailed analysis of who owns the legal rights in the input data being collected and examined, and whether there are any copyright, database or other intellectual property right aspects to be considered. Companies must first establish whether such data can be analysed at all without infringing third party rights, and then consider how best to protect the new work created. Where data is not owned or licensed, companies will need to consider carefully whether they can rely on any relevant exception to copyright to be able to use and process such data. One can easily see a gathering storm between data owners on the one side and technology providers on the other, as complex arguments relating to ownership, licensing and exceptions to copyright are currently being rehearsed.

Disputes as to who owns the output data may also arise, particularly in cases where third parties are involved in developing systems to be used.

3. Confidentiality:

Closely tied to the data protection considerations, the confidentiality of the information which is accessed and analysed is highly important, given both the sensitive information which is ‘fed in’ to the process and the aggregated product produced as a result. Businesses will need to be transparent about how the information is handled and protected, and be able to demonstrate that confidentiality is maintained at all stages.

4. Contractual Liability:

A further significant risk is that liability could arise out of reliance by third parties on the analysis produced, for example in circumstances where the output is based on inaccurate or incomplete information, or where expected correlations do not emerge. Careful thought will need to be given to the extent (if any) to which the accuracy and value of output data can be warranted by the supplier or relied upon by companies looking to exploit the data.

5. Competition Law:

Finally, the technology may open up the possibility of abuse (perhaps unintentional) of information obtained about competitors in the market, which would itself give rise to a raft of competition law considerations.

As ever with new technologies, it is difficult to foresee all of the issues that will arise in practice. As this technology continues to develop and new applications are found for the data produced, novel questions are likely to arise that may not always be straightforward to answer.