Article for April 2014 iappANZ Bulletin

One open issue in privacy regulation is the appropriate regulation of ‘big data’ analysis of customer transactional data.

The issue can be simply stated. Good privacy practice requires minimisation of the use of personal information. Often the most effective way to minimise use of personal information is to remove identifiers and then use only anonymised but still disaggregated, transaction-level data (for example, stripping the customer identifier and substituting a random number). Many valuable business data analytics tasks can be performed effectively using anonymised transaction-level data. Privacy regulation should facilitate the use of effectively de-identified information: effective de-identification obviates the privacy and security risks that otherwise attend the use of personally identifying information in data analytics applications. But at the same time as conducting such anonymised transaction-level analytics, most organisations continue to collect and maintain customer records, including personally identifying information, to facilitate their dealings with individual customers.
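
By way of illustration only, the identifier-stripping step described above might look like the following Python sketch, which replaces each customer identifier in a transactions file with a random surrogate token. The file and column names are illustrative assumptions, not a prescribed method:

    import csv
    import secrets

    def pseudonymise(in_path, out_path, id_column="customer_id"):
        """Replace the customer identifier in each transaction row with a
        random surrogate token, retaining transaction-level detail while
        removing the direct identifier."""
        surrogates = {}  # customer_id -> random token, held in memory only
        with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
            reader = csv.DictReader(src)
            writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
            writer.writeheader()
            for row in reader:
                original = row[id_column]
                # Same customer, same token: per-customer analysis (for
                # example, basket patterns) still works on the output file.
                if original not in surrogates:
                    surrogates[original] = secrets.token_hex(16)
                row[id_column] = surrogates[original]
                writer.writerow(row)
        # Discarding the surrogates mapping leaves no lookup table linking
        # tokens back to customers; retaining it under separate control
        # would make this pseudonymisation rather than anonymisation.

    pseudonymise("transactions.csv", "transactions_deidentified.csv")

Whether the output of such a step is ‘effectively de-identified’ still depends on what other information the organisation holds, which is precisely the question considered below.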

Can an organisation effectively separate the two streams of activity?

Should the fact that an organisation maintains any database of personally identifying information infect the organisation’s handling of de-identified information, such that this de-identified information must be treated as personal information and managed in accordance with the APPs?

The privacy analysis is relatively straightforward in the context of de-identified information being exposed to third parties or even put into the public domain. Personal information that has been de-identified is no longer personal information. Personal information is de-identified ‘if the information is no longer about an identifiable individual or an individual who is reasonably identifiable’. Privacy regulators in jurisdictions that have considered this issue in any depth conclude that whether personal information has become effectively de-identified must be assessed on the particular facts and circumstances, or as it is sometimes put, ‘contextually’ or ‘in the round’. The Australian Privacy Commissioner’s recent Australian Privacy Principles Guidelines give by way of example a variant of the frequently cited Governor William Weld case: information that an unnamed person with a certain medical condition lives in a specific postcode area may not enable the individual to be identified and would not therefore be personal information. By contrast, information may be personal information if held by an entity or individual with specific knowledge that could link an individual to the medical condition and the postcode. The Guidelines sensibly suggest (at paragraph B.86) that whether an individual is ‘reasonably identifiable’ from particular information will depend on considerations that include:

  • the nature and amount of information;
  • the circumstances of its receipt;
  • who will have access to the information, and what other information is either held by or available to the APP entity that holds the information;
  • whether it is possible for the individual or entity that holds the information to identify the individual, using available resources (including other information available to that individual or entity). Where it may be possible to identify an individual using available resources, the practicability, including the time and cost involved, will be relevant to deciding whether an individual is ‘reasonably identifiable’; and
  • if the information is publicly released, whether a reasonable member of the public who accesses that information would be able to identify the individual.

But the privacy analysis ceases to be straightforward when consideration shifts to the effectiveness of the separation of two streams of activity within an organisation: one being management of, say, a customer relationship management database, and the other being data analytics activities using de-identified transactional data.

Consider the Commissioner’s statement in the Guidelines (at paragraph B.87): “Even though it may be technically possible to identify an individual from information, if doing so is so impractical that there is almost no likelihood of it occurring, the information would not generally be regarded as ‘personal information’. An individual may not be reasonably identifiable if the steps required to do so are excessively time-consuming or costly in all the circumstances.” This statement appears to look only at the infeasibility or impracticality of re-identification. It does not appear to consider whether technical, operational or contractual safeguards are likely to be effective to ensure that re-identification will not be attempted (or inadvertently facilitated) within an organisation in the course of, or at any time following, the analysis of de-identified data. In other words, the statement does not focus upon unlikelihood achieved through an organisation taking steps to ensure that re-identification will not occur, even while it remains feasible. The statement also sets what appears to be intended as a very high hurdle, “almost no likelihood of it occurring”, which has no accepted legal meaning and therefore is not readily translatable into operational recommendations.

If the Commissioner’s statement in the Guidelines is taken at face value, many data analytics activities within Australian corporations today that use de-identified transactional data are regulated as uses of personal information, because re-identification remains a theoretical, albeit remote, possibility. This would be the case notwithstanding the extent of the safeguards deployed within the corporation to ensure that re-identification will not happen, and notwithstanding any contractual commitments given by a corporation to its customers about how their personal information will (only) be used. In short, this approach creates a problem today for the many Australian corporations that conduct transactional data analysis for everyday business activities as diverse as logistics, stocking and stock placement, let alone for new data analytics applications such as one-to-one marketing or other targeting of individual customers based upon knowledge of the individual’s identity.

There is a significant risk here of regulatory overreach. Any overreach would retard beneficial applications of customer data analytics and increase the disadvantage of offline retailers seeking to compete with online retailers. How? Most offline businesses know very little about their current and prospective customers. They cannot reliably measure their customers’ pre-buying activity and therefore do not know whether customers are responding to particular advertising or marketing initiatives. They know nothing at all about prospective customers’ browsing and comparison of products before an in-store purchase is made. Insights derived from data analytics of offline transactions address this competitive disadvantage. Use of these insights can facilitate business optimisation and substantially reduce both the costs of doing business and product wastage. For example, insights based upon store traffic data by time of day, local demographic data, weather predictions and scheduled local events can help retailers and their major suppliers better anticipate customer requirements and, by doing so, get the right products in the right quantities just in time to the right stores. Retailers may also target promotions and one-to-one offers to customers based upon their anticipated needs, in the same way as major online retailers.

So what is the appropriate regulatory response? Of course, ‘big data’ retail analytics rightly gives rise to concerns about consumer privacy and excessive surveillance of individuals. These concerns are magnified when big data analytics is not understood by consumers and the processes are not transparent. But customer data is the most valuable commercial asset for most retailers, and any breach of trust between a retailer and its individual customers about uses of personal information can irreparably damage a business brand. Regulation does not need to deem information to be personal information where it does not need to be regulated as such: it is the effectiveness of the barriers between the handling of personal information and uses of de-identified data that should be examined, not the uses of de-identified data themselves. Data scientists conducting customer data analytics need to be encouraged to engineer in, through ‘privacy by design’, effective technical, operational and legal safeguards that embed a high level of privacy protection for individuals. The way to do this is not to extend the net of regulation by expanding what is personal information to cover all uses of de-identified but still disaggregated transaction-level information within an organisation. Instead, regulation needs to create the right requirements and incentives for organisations to effectively, sustainably and demonstrably quarantine or segregate analytics activities using de-identified transaction-level information, within and outside the organisation, from all possible uses and disclosures of personal information.
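
To make the idea of quarantining concrete: one technical safeguard of the kind contemplated here might be keyed tokenisation, where the analytics environment only ever receives one-way tokens and the key needed to reproduce them (and hence to match results back to customers) is held outside the analytics function. The following Python sketch is illustrative only; the names and the governance arrangement are assumptions:

    import hmac
    import hashlib

    def tokenise(customer_id: str, key: bytes) -> str:
        """One-way keyed token: consistent for the same customer, but not
        reversible or reproducible without the key."""
        return hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()

    # Illustrative only: in practice the key would be generated and held by
    # a separate custodian (for example, a data governance function), so the
    # analytics team cannot re-identify tokens by hashing candidate IDs.
    governance_key = b"held-outside-the-analytics-environment"

    analytics_record = {
        "customer": tokenise("CUST-0042", governance_key),
        "store": "NSW-014",
        "basket_value": 87.50,
    }

Destroying or rotating the key severs the link between tokens and customers entirely, and whether such an arrangement is reliably maintained becomes a question an auditor can test rather than a matter of assertion.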

The Commissioner’s revised Privacy business resource 4: De-identification of data and information suggests a more nuanced understanding of this need and an appropriate regulatory response. The Commissioner states: “The risk of re-identification will depend on the nature of the information asset, the de-identification techniques used and the context of the disclosure. Relevant factors to consider when determining whether an information asset has been effectively de-identified could include the cost, difficulty, practicality and likelihood of re-identification. Depending on the outcome of the risk analysis and the de-identification process, information and data custodians may need to engage an expert to undertake a statistical or scientific assessment of the information asset to ensure the risk of re-identification is low.” The difference between this formulation and that in the Commissioner’s Guidelines may appear subtle, but it is important: ‘likelihood’ of re-identification is referenced as an additional factor, together with ‘cost’, ‘difficulty’ and ‘practicality’. The relevant hurdle is also expressed as an objectively based assessment (the risk of re-identification must be reduced to a level that can be reliably assessed as “low”), as distinct from there being “almost no likelihood of it occurring”.
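
One simple example of the kind of statistical assessment the resource contemplates is a k-anonymity measure over the quasi-identifiers in a de-identified dataset. The following Python sketch (with illustrative records and an assumed choice of quasi-identifiers) finds the size of the smallest group of records sharing the same quasi-identifier values; a small k signals rare combinations and hence higher re-identification risk:

    from collections import Counter

    def smallest_group(records, quasi_identifiers):
        """Return k, the size of the smallest group of records that share
        the same combination of quasi-identifier values."""
        groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
        return min(groups.values())

    records = [
        {"postcode": "2000", "age_band": "30-39", "spend": 120},
        {"postcode": "2000", "age_band": "30-39", "spend": 95},
        {"postcode": "2010", "age_band": "60-69", "spend": 40},
    ]

    k = smallest_group(records, ["postcode", "age_band"])
    print(k)  # 1: the single 2010/60-69 record is a rare, riskier combination

Real assessments are considerably more sophisticated than this, but the point stands: ‘low’ risk is something that can be measured and evidenced, where ‘almost no likelihood’ is not.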

Is this just legal sophistry or wordsmithing? No. Uses of personal information can be effectively quarantined or segregated such that any risk of re-identification from advertent or inadvertent matching back of the outcomes of analytics activities using de-identified transaction-level data can be objectively assessed as low. Technical infeasibility of re-identification within an organisation is probably unattainable in most situations. The Privacy Commissioner can regulate uses of personal information by requiring demonstrably reliable, repeatable and verifiable technical, operational and contractual safeguards that quarantine or segregate analytics activities using de-identified transaction-level information from all possible uses of personal information. We are at the early stages of thinking about what the full range of those safeguards might be and how to objectively assess their effectiveness against both advertent and inadvertent re-identification risk. But that challenge is manageable. An extension of the regulation of personal information to all transaction-level analysis within organisations is not manageable and would not be good privacy policy.