Before any big data analytics exercise, Agencies need a thorough understanding of the inputs, the outputs and the processes involved to stay within the law.
The digital economy has seen an exponential increase in the production of data, not least in data collected by government about citizens and businesses, about organisations' internal operations, and about government's own interactions with external parties such as suppliers and communities.
The benefits of big data analytics are no secret; government recognises that its use can improve decision-making, the targeting and delivery of services, and thus productivity, which in turn can substantially reduce government administrative costs. The challenge lies not in convincing agencies to use big data but in actually analysing it effectively and lawfully.
With the introduction of the Big Data Strategy and the Better Practice Guide for Big Data, the Australian Government is looking to make better use of big data to improve the way it delivers services and develops policy.
What is "big data"?
It is estimated that data is now being generated at a rate in excess of 2.5 quintillion bytes per day. That is accompanied by a surge in data sets so large that they defeat traditional software and management, the so-called "big data".
The factors that make big data technically challenging are its volume, its velocity (the real-time manner in which most of the information is captured by a system) and the variety of disparate data sets that can be accessed.
However, improvements in computing technologies, including analytics tools, storage and processing capacity, are now enabling big data to be analysed in close to real time.
The Australian Government's direction for big data
The Australian Government (via AGIMO) released "The Australian Public Service Big Data Strategy" in August 2013, which is "intended for Australian Government agency senior executives with responsibility for delivering services and developing policy." It sets out six principles to guide agencies in their approach to big data:
- Data is a national asset
- Privacy by design
- Data integrity and the transparency of processes
- Skills, resources and capabilities will be shared
- Collaboration with industry and academia
- Enhancing open data.
This was supplemented with the Better Practice Guide for Big Data in April 2014, which gives guidance on establishing a business requirement for a big data capability, implementation, information management and big data project management.
Establishing a business requirement for big data is premised on the standard considerations of cost and return on investment, but also on the Agency's current and future:
- strategic objectives
- business model
- data availability
- (maturity of) technology and capability
- availability of skilled personnel to manage data acquisition and analysis.
The Agency will also need to assess the likelihood of accruing benefits during the development of the capability.
Big data infrastructure challenges for government agencies
In traditional data analysis, structured sets of data were often analysed using Structured Query Language (SQL). While SQL may still be used for particular purposes, a feature of big data analytics is that all of the data, including structured, unstructured and messy data, can be analysed in, or close to, real time.
An Agency contemplating big data analytics needs to ensure that its infrastructure is scalable and optimised for very fast capture and retrieval, which means understanding the likely size of the data it will capture and store.
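As a rough illustration of what "understanding the likely size of the data" can involve, the following Python sketch estimates steady-state storage and the average capture rate for a single data stream. It is only a back-of-the-envelope calculation: the `CaptureProfile` fields and every figure in the example (record counts, record size, retention period, replication factor) are hypothetical assumptions, not numbers drawn from the Strategy or the Better Practice Guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureProfile:
    """Hypothetical description of one incoming data stream."""
    records_per_day: int      # expected number of records captured each day
    avg_record_bytes: int     # average size of a single stored record
    retention_days: int       # how long records are kept before removal
    replication_factor: int   # copies kept for resilience (assumed value)


def estimated_storage_bytes(profile: CaptureProfile) -> int:
    """Storage needed once the retention window is full."""
    return (profile.records_per_day
            * profile.avg_record_bytes
            * profile.retention_days
            * profile.replication_factor)


def average_ingest_bytes_per_second(profile: CaptureProfile) -> float:
    """Average write rate the capture layer must sustain (ignores peaks)."""
    seconds_per_day = 24 * 60 * 60
    return profile.records_per_day * profile.avg_record_bytes / seconds_per_day


# Illustrative figures only.
profile = CaptureProfile(
    records_per_day=50_000_000,
    avg_record_bytes=2_000,
    retention_days=365,
    replication_factor=3,
)

print(f"{estimated_storage_bytes(profile) / 1e12:.1f} TB")  # 109.5 TB
print(f"{average_ingest_bytes_per_second(profile) / 1e6:.2f} MB/s on average")
```

An average rate understates what the infrastructure must handle at peak times, so a real sizing exercise would also need peak-load figures and growth projections.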
Privacy aspects of big data
Big data is no different from any other form of data; if there is any "personal information" then the Privacy Act 1988 (Cth) rules on personal information will apply to its collection, use and disclosure.
The Act's definition of "personal information" requires the information or opinion to be tied back to an individual who is identified or reasonably identifiable. The problem this poses for big data use or analytics is that even if the information starts out depersonalised, big data analytics might bring together the information and the individual again, or even create new personal information by bringing together data sets.
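The re-identification problem can be illustrated with a minimal Python sketch. It links a "depersonalised" data set to a second data set that carries names, using fields that both happen to share. All of the records, field names and the choice of linking fields (`QUASI_IDENTIFIERS`) are invented for illustration. The sketch shows the mechanics of linkage only and is not a test for whether an individual is "reasonably identifiable" under the Act.

```python
from collections import Counter

# Hypothetical "depersonalised" service records: no names, only shared attributes.
service_records = [
    {"postcode": "2600", "birth_year": 1975, "sex": "F", "benefit": "A"},
    {"postcode": "2600", "birth_year": 1975, "sex": "F", "benefit": "B"},
    {"postcode": "2913", "birth_year": 1988, "sex": "M", "benefit": "C"},
]

# Hypothetical second data set that does carry names.
register = [
    {"name": "Person 1", "postcode": "2913", "birth_year": 1988, "sex": "M"},
    {"name": "Person 2", "postcode": "2600", "birth_year": 1975, "sex": "F"},
    {"name": "Person 3", "postcode": "2600", "birth_year": 1975, "sex": "F"},
]

# Fields present in both data sets (an assumption for this example).
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record):
    """Combination of shared attribute values for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentified(records, named_records):
    """Return (name, record) pairs where the shared attributes match exactly one name."""
    names_by_key = {}
    for row in named_records:
        names_by_key.setdefault(key(row), []).append(row["name"])
    matches = []
    for record in records:
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:
            matches.append((candidates[0], record))
    return matches


def smallest_group_size(records):
    """Size of the smallest group of records sharing the same attribute values."""
    return min(Counter(key(r) for r in records).values())


for name, record in reidentified(service_records, register):
    print(name, "->", record["benefit"])      # Person 1 -> C
print(smallest_group_size(service_records))  # 1
```

A smallest group size of 1 means at least one record is unique on the shared attributes, which is the situation in which bringing data sets together is most likely to tie information back to an individual.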
Thus, Agencies need to understand what data is being collected and whether it will be considered personal information (or could become personal information if individuals are re-identified as a result of the big data analytics exercise). This requires:
- reviewing their data collection to minimise the collection of personal information (where possible);
- considering, at the time information is collected, how they intend to use it, so that they obtain the appropriate consents for its future use. It does not matter that there is no personal information at the time of collection, given that personal information may be created as a result of the big data analytics exercise;
- maintaining accurate records of the (scope of) consents provided by individuals in relation to their (personal) information;
- seeking (revised) consents from individuals for the proposed use of their personal information; and
- removing old, unnecessary data to minimise the risk of a privacy breach.
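The record-keeping points in the list above (recording the scope of consents, identifying where revised consent is needed, and removing old data) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the `ConsentRecord` structure, the purpose labels and the seven-year retention period are all hypothetical, and the appropriate retention period for any real data set is a legal and records-management question that this sketch does not answer.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ConsentRecord:
    """Hypothetical record of what one individual has agreed to."""
    individual_id: str
    permitted_purposes: frozenset  # purposes the individual consented to
    obtained_on: date


def needs_revised_consent(consents, proposed_purpose):
    """Return IDs of individuals whose recorded consent does not cover the purpose."""
    return [c.individual_id for c in consents
            if proposed_purpose not in c.permitted_purposes]


def due_for_removal(records, today, retention_days):
    """Return records collected before the retention cut-off date."""
    cutoff = today - timedelta(days=retention_days)
    return [r for r in records if r["collected_on"] < cutoff]


consents = [
    ConsentRecord("id-001", frozenset({"service delivery", "analytics"}), date(2013, 9, 1)),
    ConsentRecord("id-002", frozenset({"service delivery"}), date(2014, 2, 15)),
]
records = [
    {"individual_id": "id-001", "collected_on": date(2005, 1, 10)},
    {"individual_id": "id-002", "collected_on": date(2014, 2, 15)},
]

print(needs_revised_consent(consents, "analytics"))  # ['id-002']

# Retention period of roughly seven years is an assumed value for illustration.
old = due_for_removal(records, today=date(2014, 6, 30), retention_days=7 * 365)
print([r["individual_id"] for r in old])             # ['id-001']
```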
Keeping big data secure
Both the big data analytics process and its outputs will need to be kept secure and to comply strictly with the requirements of the Protective Security Policy Framework and the Information Security Manual to ensure public trust in the Australian Government and its systems.
Collaboration between Agencies
While each Agency may want to undertake big data analytics, there are benefits from Agency alignment.
From a useability perspective, Agencies might consider implementing a consistent approach to formats, metadata and standardised application programming interfaces (APIs) to maximise the opportunities and benefits available under a big data analytics exercise across Australia.
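One way to picture a consistent approach to formats and metadata is a shared descriptor that every Agency completes before a data set is made available to others. The Python sketch below checks such a descriptor for missing fields. The field names in `REQUIRED_FIELDS` are hypothetical and are not taken from any Australian Government metadata standard; they simply stand in for whatever fields Agencies agree on.

```python
# Hypothetical set of fields that participating Agencies have agreed to supply.
REQUIRED_FIELDS = (
    "title",
    "custodian_agency",
    "format",
    "updated",
    "contains_personal_information",
)


def missing_metadata(descriptor):
    """Return the agreed fields that are absent or left empty in a descriptor."""
    return [field for field in REQUIRED_FIELDS
            if field not in descriptor or descriptor[field] in ("", None)]


complete = {
    "title": "Service usage 2014",
    "custodian_agency": "Example Agency",
    "format": "CSV",
    "updated": "2014-04-30",
    "contains_personal_information": False,
}
incomplete = {"title": "Supplier payments", "format": ""}

print(missing_metadata(complete))    # []
print(missing_metadata(incomplete))
```

Checking descriptors at the point of publication, rather than at the point of use, means gaps are found by the Agency best placed to fill them.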
Risk management will also be a fundamental part of the use and analysis of big data, particularly under the new Public Governance, Performance and Accountability Act 2013 (Cth) with its increased focus on risk management. The development of Memoranda of Understanding between Agencies may facilitate information sharing, both before and after a big data analytics exercise.
Who owns big data?
Finance has indicated that the Australian Government intends to own the intellectual property rights (IPR) in new databases developed as part of its big data analytics, but that it will recognise and attribute "respondents".
Finance has not, however, elaborated on what IPR it expects will arise as a result of a big data analytics exercise. It is questionable whether copyright protection will be available for big data databases, as the databases will be generated by machines rather than reflecting human creativity; the software code used to generate a big data database would, however, be covered by copyright.
It will be challenging for industry to engage in any big data joint private partnership currently envisaged by Finance on this basis, as the value of big data databases directly derives from access to the underlying data (in which the intellectual property rights may already be owned) and the algorithmic process of selecting and manipulating the data.
In developing big data IPR strategies, Agencies will need to carefully consider existing IPR, what instruments they will require to undertake big data analytics exercises, and the ownership (or licensing) of any resulting big data database.
Effective use of big data has the capacity to significantly improve Government service delivery, operations and policy development, but there are risks associated with big data analytics that will require careful consideration at all stages of any big data analytics exercise.
Advances in technologies, including cloud computing, will make big data analytics more technologically accessible for Agencies, but may also increase the associated risks of breach of confidentiality, privacy and security.
Before engaging in any big data analytics exercise, Agencies will need to ensure they have a thorough understanding of the nature of the inputs, the process required to develop the outputs and the potential scope of the outputs, so that they can effectively manage the potential risks and limitations associated with big data analytics. This will also require regular review of big data analytics projects to ensure risks are appropriately managed.