The term ‘big data’ is now in common use in many organisations and in the business world generally. While the term means different things in different contexts, in this article (and at its most basic) it refers to the significant amount of digital information that businesses collect from providing products and services to customers and from customers’ subsequent use of those products and services, together with other sources of data (for example, publicly available data sets and data sets available from third party suppliers).

The question that then arises is how such data can be used to benefit the organisation and its customers. This is where data analytics enters the field.

There are many areas in which broad and detailed data sets can potentially provide incredibly useful insights to organisations, such as:

  • more effective marketing of products and services through tailoring to specific customers, for example by identifying categories of customers that data analysis suggests are more likely than other categories to accept specific product or service offers.
  • the development of new and better products and services, increasingly tailoring these to the individual customer with the aim of providing greater convenience and closer links to the customer.
  • the identification of longer term trends, enabling earlier preparation for potential strategy adjustments or more radical changes.

To do this, though, large quantities of data need to be converted, through qualitative analysis, into relevant business insights. In short, data analytics results in the visualisation of previously unidentified correlations and patterns. Further, as a significant amount of data is often based on convenience samples or subsets, analysis is required to adjust for any biases, to remove ‘false positives’ (where the data suggests a specific causative effect which on closer examination is false) or to provide missing context.

For example, I argue that Sir Richard Hadlee is one of the best bowlers in test cricket of all time. Available statistical data such as bowling average, the number of wickets taken, the average number of deliveries required for each wicket and the average number of wickets taken per test make a strong argument for this. However, the fact that he played a significant number of tests at home in New Zealand at a time when the wickets greatly assisted bowlers is also relevant, as is the fact that some of the New Zealand slip fieldsmen were undoubtedly very good. Also, the overall calibre of batsmen for the decade or so following the late seventies, with the probable exception of the West Indies, was weaker than the overall calibre of batsmen from 1995 onwards (Lara, Tendulkar, Ponting etc.). So, analysis is not an easy task.
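To make the arithmetic behind those measures concrete, the following is a minimal sketch in Python. The `BowlingRecord` structure, the `combine` helper and every figure in it are hypothetical and chosen only for illustration; they are not Hadlee's actual statistics. It shows how a career-level average can hide a difference between home and away conditions, which is the kind of missing context described above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BowlingRecord:
    """Raw bowling totals for one slice of a career (hypothetical structure)."""
    label: str
    tests: int
    balls: int
    runs_conceded: int
    wickets: int

    @property
    def average(self) -> float:
        # Bowling average: runs conceded per wicket taken (lower is better).
        return self.runs_conceded / self.wickets

    @property
    def strike_rate(self) -> float:
        # Average number of deliveries required for each wicket.
        return self.balls / self.wickets

    @property
    def wickets_per_test(self) -> float:
        return self.wickets / self.tests


def combine(label: str, records: Iterable[BowlingRecord]) -> BowlingRecord:
    """Add raw totals together; rates must be recomputed, not averaged."""
    records = list(records)
    return BowlingRecord(
        label=label,
        tests=sum(r.tests for r in records),
        balls=sum(r.balls for r in records),
        runs_conceded=sum(r.runs_conceded for r in records),
        wickets=sum(r.wickets for r in records),
    )


# Made-up figures for illustration only, not real career statistics.
home = BowlingRecord("home", tests=40, balls=9000, runs_conceded=4000, wickets=200)
away = BowlingRecord("away", tests=40, balls=10000, runs_conceded=5000, wickets=200)
career = combine("career", [home, away])

for record in (career, home, away):
    print(
        f"{record.label:>6}: average {record.average:.2f}, "
        f"strike rate {record.strike_rate:.1f}, "
        f"wickets per test {record.wickets_per_test:.2f}"
    )
```

With these invented numbers the career average sits between a stronger home figure and a weaker away figure. The headline statistic alone would not reveal that split, and none of the figures says anything about pitch conditions, fielding support or the quality of the opposition.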

Data analytics in the context of big data also raises a number of legal and risk issues.

First, unsurprisingly, are privacy considerations. To the extent that relevant data is personal information for the purposes of relevant privacy regimes, the need to comply with those regimes will arise and can significantly affect the ability to undertake data analytics (for example, do... privacy consents contemplate such a use? Do consents effectively provide for third party suppliers to undertake services on behalf of the data collector? If cross-border access/disclosure is contemplated, what requirements arise?).

A related issue is whether the use of anonymised data or specific items of data which are not themselves ‘personal’ is sufficient to fall outside privacy parameters. This is an issue privacy regulators are facing and will increasingly face, and the answer is not necessarily simple. For example, a recent decision by the Privacy Commissioner in Australia held that information such as IP addresses and specific cellular tower data was personal information of a consumer of telecommunications services because of the ability of the service provider to aggregate that data with data in other systems which provided the necessary personal identification. Note that the service provider is reviewing this decision given its potentially wide impact. With a trend towards privacy regimes including substantial requirements and penalties for non-compliance (such as mandatory notification of data breaches, significant fines and, in some jurisdictions, imprisonment), the impact of privacy regimes cannot easily be avoided.
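The aggregation point can be illustrated with a short sketch in Python. Everything in it is hypothetical: the record layouts, the identifiers, the customer labels and the `link_events` helper are invented for illustration and do not describe any real provider's systems. It assumes a provider holds network events with no names attached, plus separate systems that map IP addresses to accounts and accounts to customers.

```python
from typing import Optional

# System 1: network events with no customer names attached (hypothetical data;
# the IP addresses come from ranges reserved for documentation).
network_events = [
    {"ip_address": "203.0.113.7", "cell_tower": "TWR-0042", "time": "09:15"},
    {"ip_address": "203.0.113.7", "cell_tower": "TWR-0108", "time": "17:40"},
    {"ip_address": "198.51.100.23", "cell_tower": "TWR-0042", "time": "11:05"},
]

# System 2: which account each IP address was assigned to (hypothetical).
ip_assignments = {"203.0.113.7": "ACC-1001"}

# System 3: billing records that identify the account holder (hypothetical).
account_holders = {"ACC-1001": "Customer A"}


def link_events(events, assignments, holders):
    """Attach a customer identity to each event where the other systems allow it."""
    linked = []
    for event in events:
        account: Optional[str] = assignments.get(event["ip_address"])
        customer: Optional[str] = holders.get(account) if account else None
        linked.append({**event, "customer": customer})
    return linked


linked_events = link_events(network_events, ip_assignments, account_holders)

for event in linked_events:
    who = event["customer"] or "not identifiable from these systems"
    print(f'{event["time"]} {event["cell_tower"]} {event["ip_address"]}: {who}')
```

Taken alone, the first data set identifies no one. Once it can be joined to the other two, some of its rows describe where a named customer's device was at a given time, which is the kind of aggregation the decision turned on.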

Further, could an organisation’s use of data analysis to provide specific products or services, or specifically priced products or services, to some categories of consumers and not others lead to claims actionable under anti-discrimination laws? If so, do any statutory or other exemptions apply? For example, discrimination legislation may allow insurers, on the basis of actuarial or other statistical data, to act in a manner that would otherwise result in a breach. What standard is required to meet any such exemption?

Data security considerations also arise. Data security and cyber-resilience are matters for all organisations. However, if a third party data analytics provider has been engaged, and personal or confidential information is required to be provided to it (or, more likely, is inadvertently provided) and is subsequently disclosed by or accessed from that service provider, what legal or contractual obligations apply to the third party? How is risk allocated under the relevant contract?

Finally, what impact does the use, or potential misuse, of data analysis outputs have on an organisation’s reputation or brand? This issue overlaps with some of the legal considerations set out above – an organisation being held liable for a failure to comply with legal obligations is rarely brand-enhancing. But for boards and senior executives, this is also a much broader and potentially more damaging issue. As organisations increasingly categorise and target their customer interactions, and provide differentiated products, services or processes, one person may see convenience; another may see inappropriate business over-reach. In this respect, gauging when and how to use data analysis is a significant, and ongoing, challenge.