The concept of "Big Data" is a rather nebulous one. It has been defined by the Gartner Group as "high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation."
However one defines "Big Data", the commercial and legal implications of its use and commercialisation are becoming increasingly important.
IBM has estimated that: "Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. This data is big data."
It is not surprising, therefore, that Big Data has recently attracted the attention of relevant agencies and regulators in Australia. Of particular significance are a recent report by the Bureau of Communications Research (an independent Commonwealth Government research unit), an Issues Paper released by the Productivity Commission, and a set of draft guidelines issued by the Office of the Australian Information Commissioner.
Bureau report on Big Data
The enormous potential economic value of Big Data to national economies has been the subject of a number of recent reports. The Bureau estimated that open government data could generate up to $25 billion per year for the Australian economy (or 1.5% of Australia's GDP).
The McKinsey Global Institute study Open Data: Unlocking Innovation and Performance with Liquid Information estimated that $4 trillion in annual economic potential could be unlocked globally through increased efficiency, development of new products and services, and consumer service. Recent UK estimates put the overall economic value of open government data in the UK at $3.7 billion and, when a measure of societal value is included, at between $12.9 billion and $14 billion. The potential value in the US has been estimated at $1.5 trillion. In 2008, the use and re-use of spatial information was estimated to add $1.1 billion in productivity-related benefits to New Zealand's economy. The potential impact of releasing open data in Canada was estimated in 2014 to be in the vicinity of $134 billion.
The Bureau, broadly speaking, saw the direct benefits of unlocking government data as revolving around the development of new and customised products and services for business, government and for the community, as well as enhanced job creation and improved tax revenues.
Indirect benefits were considered to include "more engaged and empowered citizens and improved government services". The Bureau cited as an example the US Government's Blue Button initiative, which provides millions of Americans with online access to their own health information. The benefits of doing so are plain, once the previous paperwork-dominated process is considered, under which individuals had to contact multiple parties including health insurance companies, hospitals, clinics, pharmacies and/or laboratories, in order to access their health information.
Bureau Report philosophy
The philosophy underlying the Bureau's report is that information, which can be costly to acquire, drives innovation and informed choice, which in turn produces benefits in terms of improved productivity and efficiency. According to the report, economic theory suggests that government "has a responsibility" to provide information and data that bring about broader societal benefits, such as empowering and engaging citizens and communities, improving the efficiency of markets and encouraging competition and innovation.
Bureau Report data segments
The Bureau's report segments data into three categories – raw or basic data, incremental or value-added data, and commercial data.
Raw or basic data
Raw data is data collected in the course of the government's usual operations or business, regardless of whether the data can be re-used by another party. Meteorological data is cited in the report as an example of this category of data. According to the report, from an economic perspective, raw data should be priced at zero or, at the most, at short-run marginal cost. Data in this category produces greater benefits to the community when it is made freely available than when it is subjected to a pricing formula.
Incremental or value-added data
Incremental or value-added data comprises raw data with a value-added component, whether or not the value-add produces any private sector appeal. According to the report, value-added data should be made available on a cost-recovery basis and such data has the potential to enhance economic efficiency by ensuring that users recognise the costs associated with value-added production. The charting of mineral deposits on a map is cited in the report as an example of value-added data.
Commercial data
Commercial data, which could be generated by the public or private sectors, should, according to the report, be priced pursuant to competitive neutrality principles. Examples cited in the report include ASIC company documents, Bureau of Meteorology temperature observations, and student record outcomes data generated by the National Centre for Vocational Education Research.
Importantly, despite the obvious economic benefits, the report highlights a number of associated legal, security and privacy issues. These issues include licensing conditions (to act as a mechanism to balance access to government data and to protect intellectual property rights), legislative requirements, accessibility support and the sensitivity of the data (eg, whether it contains information about national security). All of these factors are thought to potentially limit the extent to which government data could be made available. The report also acknowledges that the public sector faces challenges in devoting funds to making data available, despite the wider economic benefits of doing so.
A useful example of the significance and value of data to Australian industry is discussed in the May 2016 Report of the House of Representatives Standing Committee on Agriculture and Industry, which concerned an inquiry into innovation within the agricultural sector. This report, which considered emerging technologies in the areas of biological science, materials science, seasonal forecasting and digital science, notes that the "rapid growth of information and communications technology in recent decades is expected to drive new directions for agriculture, in areas such as automation, and developments in infrastructure and platforms that will allow farmers to store, access, re-use and market their own data". The report states that a range of emerging technologies, such as those enabling improved data collection from sensors within the Internet of Things ecosystem, as well as improvements in data storage and management, could be used to convert and transform data into "information, projections and suggested actions for individuals and the sector" and to thereby improve productivity. The CSIRO, for example, filed submissions outlining that "farm-scale data" could be "fused with broad scale national and regional data streams covering issues such as climate, soils, water and biodiversity".
Productivity Commission Issues Paper
In April 2016, the Productivity Commission released an issues paper entitled Data Availability and Use. The issues paper was released in response to terms of reference from the Treasurer requiring the Commission to conduct a broad-ranging investigation into the benefits and costs of options for improving availability and use of data.
The issues paper clarified the use of common definitions. It distinguished between "data" and "information", stating that "data" comprises raw, unorganised material such as characters, text, words, numbers, pictures, sound and video, whilst "information" is something derived from data after it has been processed and presented in context.
The paper stated that "Big Data" was characterised by the "three Vs":
- High volume refers to the sheer volume of data being collected;
- High velocity refers to the great speed at which data is being generated, often in near-real time, and how readily it can be accessed, processed and analysed;
- High variety refers to the many different formats of data and its diverse sources.
The Issues Paper defines "open data" as data that anyone can access, use or share, subject only, at most, to the requirement to attribute and share-alike.
"Metadata" is defined in the Issues Paper as being "data about data", making it easier to retrieve and use digital resources such as computer files, webpages and databases.
The Issues Paper states that "data is increasingly integral to how economies function", yet much of the data generated remains under-utilised. The paper states that "while governments must be mindful of the legitimate privacy concerns of individuals, and how the 'digital universe' is enabling detailed profiles of individuals to be built and used, efficient data management requires more than just privacy standards".
The paper describes data as a "key economic resource", principally because it could be shared, used and re-used an unlimited number of times.
The paper also touches upon the vexed question of "data ownership", noting that while data could be subject to copyright and other intellectual property rights, the concept of ownership of data was sometimes not quite as straightforward. It identified complications arising out of the use of personal devices such as smart phones and the question of who owns the data when the personal device communicates with a wider network. There were also questions around what constitutes an individual's "consent" for an organisation to collect, use and share data about them.
Private and public sector benefits
The issues paper identified a range of potential benefits across both the private and public sectors, which could be derived from increasing the availability and use of data:
- Efficiency: by replacing traditional and intuitive approaches with data-driven processes (for example, to better understand consumer preferences or the effectiveness of public programs), data can either lower the costs that businesses and governments incur in providing goods and services or allow them to better target their products to consumers.
- Empowerment of consumers: increased access to the data created through everyday transactions can empower consumers to make decisions based on what best suits their situation; similarly, provision of data on the relative offerings and performance of product and service providers can help consumers to assess what is available.
- Competition: wider availability of data can create market opportunities for new businesses, or enable existing businesses to expand into new areas, thus fostering more competitive markets.
- Innovation: data can help to break down information gaps within and across parts of the economy, potentially providing the building blocks for new products and processes.
- Accountability of governments: public sector data can shed light on the effectiveness of existing and past government interventions, improve the design of future policies and programs, enable community scrutiny of the evidence base (such as government-funded research) used to support policy interventions, and generally sharpen incentives for governments to perform well.
Privacy Commissioner's Draft Guide
The Office of the Australian Information Commissioner (OAIC) has produced a draft Guide to Big Data and the Australian Privacy Principles. The draft guide is open for public comment until 25 July 2016.
The guide emphasises that privacy issues can be avoided to the extent that data can be de-identified so that it is not classifiable as "personal information" within the meaning of the Privacy Act 1988 (Cth). According to the draft guide, Privacy Impact Assessments should be used whenever an entity is developing or reviewing a project which uses or will use Big Data. The draft guide emphasises that entities undertaking Big Data activities should implement the four steps outlined in the Privacy Management Framework, namely:
- Embed a culture of privacy that enables compliance;
- Establish robust and effective privacy practices, procedures and systems;
- Evaluate your privacy practices, procedures and systems to ensure continued effectiveness; and
- Enhance your response to privacy issues.
The draft guide identifies numerous Australian Privacy Principles (APPs) of potential relevance to any policy or initiative involving the commercialisation of Big Data.
Exploiting Big Data – potential impacts
Entities involved in exploiting Big Data need to bear in mind the potential impact of the Australian Privacy Principles, specifically APPs 1, 3, 5, 6, 7, 8, 10 and 11. Of particular relevance are the following:
- APP3, which prohibits an entity from collecting personal information unless the collection is necessary for one or more of the entity's functions. The draft guide emphasises that, in the context of Big Data, entities should consider what personal information is reasonably necessary and for what purpose, adding that entities using "all data" for "unknown purposes" will expose themselves to privacy compliance risks. The guide also stresses the risk that personal information used in Big Data activities is likely to include information collected from third parties, meaning that consideration should be given to the third party's privacy. Where sensitive information is collected, the consent of the individual will generally be required.
- APP5, which requires entities to notify individuals when their personal information has been collected, including how it was collected, why it was collected and possible overseas destinations to which it might be sent. The guide emphasises that seeking consent at a later time for a secondary use of personal information for Big Data activities, where practicable, can be "costly and difficult". The Privacy Commissioner recommends the use of a Privacy Impact Assessment to consider how personal information will be used in Big Data activities and to inform how and when a privacy notice that is meaningful to individuals can and should be given.
- APP6, which provides that information may only be used for the primary purpose of collection or (unless sensitive information) a reasonably related secondary purpose. The draft guide warns that where health or personal information is being handled for Big Data activities, it may be impracticable to obtain individuals' consent. The guide also urges that entities undertaking health or medical research should ensure they are familiar with the Section 95 and Section 95A Guidelines issued by the National Health and Medical Research Council.
- APP7, which places constraints on the use of personal information for direct marketing and which, in certain circumstances, requires the display of an opt-out notice. The draft guide emphasises, in the context of Big Data, that organisations which facilitate other organisations' direct marketing (such as data list brokers) also have specific obligations under APP7, including an obligation not to use or disclose individuals' personal information where an individual has asked them to stop. It also observes that one of the key purposes of Big Data analytics is to assist organisations to improve their marketing strategies and, if this involves targeting individuals, the requirements of APP7 need to be borne in mind. In this context, organisations should keep track of the type of information they are collecting, consider individuals' expectations as to how their information will be used, and consider how to implement simple and effective ways in which individuals can opt out.
- APP8, which stipulates that personal information cannot be sent overseas unless it is reasonably expected that the overseas recipient will comply with the APPs. Even so, the Australian entity will remain liable unless the individual has consented to the overseas transmission or unless the recipient is located in a jurisdiction with similar privacy laws. The draft guide observes that Big Data activities often involve using overseas cloud or internet-based platforms. It suggests entities should undertake due diligence before disclosing personal information to overseas recipients in order to identify risks potentially arising under APP8.
- APP10, which requires personal information to be kept up to date, accurate and complete. The guide observes that the nature of Big Data analytics means that there is a higher risk that the personal information used or collected through analytics may not be accurate, complete or up to date because of the large amounts of data involved. In these circumstances, more rigorous steps are likely to be required to ensure the quality of personal information.
- APP11, which requires an entity to take reasonable steps to protect personal information in its possession from interference, loss and unauthorised access. The draft guide urges entities to use a Privacy Impact Assessment to assess what personal information they need and for what purposes, and also to consider de-identifying personal information so that it can be kept for future use. An information security risk assessment would also be beneficial, specifically focusing on steps which can be taken to protect any personal information held in connection with Big Data activities.