Big Data is one of the buzzwords of the moment, but it's often bandied about without a real understanding of what it is. It has been hailed by some as one of the greatest benefits of the internet age and by others as an attack on the heart of democracy; whether you see Big Data as an opportunity or a threat depends very much on your starting point.
So what is 'Big Data'? It's the term used to refer to the collection and analysis of vast, complex, often unstructured or partially structured and separate data sets using algorithms, parallel processing, pattern recognition software and other data analytics tools. Whereas traditional statistical analysis extrapolates trends and results from relatively small, discrete data sets, Big Data, for those with access to it, offers far more sophisticated analysis involving much larger sample sets and the ability to interpret hitherto inaccessible information. It shows patterns of behaviour rather than beliefs. Anyone can spin their image on Facebook or Twitter, but it's harder to manipulate things like where they go and what they buy.
Both the public and private sectors stand to benefit from the opportunities presented by Big Data. Erik Brynjolfsson of MIT estimated that organisations which base their decisions on Big Data analysis are around 5% more productive than competitors who rely on the more traditional methods of statistics, experience and gut instinct. For businesses, Big Data offers the opportunity to spot trends in real time, allowing targeted marketing, customer-friendly business offers and considerable cost savings in areas like stock control. In the public sector, the projected benefits appear even more exciting. A study carried out by SAS and the UK think tank the Centre for Economics and Business Research suggests that the UK government could use Big Data to save £2bn through fraud detection and £3.6bn through better rationalisation and management of processes.
It's not just about the economics of government though. Big Data is also touted as the future of epidemiology after the frequency of Google searches for terms like "flu symptoms" was used to predict the spread of flu in the USA, region by region. Of course, this has its flaws – in many countries particularly vulnerable to the spread of infectious diseases, access to the internet is not exactly the norm. It's hard to imagine Google searches being effective in halting the current spread of Ebola in West Africa, for example, but the possibility that Big Data can be used to forecast the spread of disease, predict unusual weather patterns, provide accurate economic forecasting and consequently help minimise the impact of threats is clearly an exciting one.
This brings us on to the question of the use of Big Data for the prevention of terrorism, and that's where the ethics of Big Data come into sharp relief. The Snowden revelations last year, which showed the extent of routine surveillance by the NSA and GCHQ, highlighted the darker side of Big Data, namely the threat to privacy. Reactions to the revelations varied considerably. While the media was outraged, the British public seemed less bothered, and many people voluntarily give up elements of their privacy to Big Data in order to reap the benefits of various apps and social media services. Others are deeply concerned about the lack of control they have over their own data and the purposes for which the data is used. One famous example of the unforeseen consequences of user profiling, recently highlighted in James Graham's play 'Privacy', is what happens when you search for a baseball bat on Amazon. If you do it on the .com site, the first recommended product is a baseball glove. If you do the same search on the .co.uk site, the first recommended product is a balaclava. Imagine a world in which the police turn up on the doorstep of everyone who buys a baseball bat, or everyone who searches for a baseball bat and a balaclava. Well, you don't have to imagine it: that world is already with us. After the Boston bombings, a woman claimed she was profiled by the police as a potential terrorist because of innocent searches made by members of her family for pressure cookers, backpacks and information about the Boston bombings.
This throws up another issue with Big Data: producing truthful results is not, after all, as exact a science as lovers of algorithms would have us believe. Data analytics can certainly show us that things are related, but not why they are related. This means there is a risk of drawing false conclusions from coincidental patterns, or of manipulating results. In addition, having a larger amount of data to analyse does not make it easy to find small things: you can search for a sword in a haystack more easily than you can search for a needle. It is probably fair to say that the activities of the NSA and of GCHQ have prevented many terrorist attacks, but they have not prevented all of them. Would the agencies be able to prevent them all if they had more data or more sophisticated ways of analysing it, and would we be prepared to give them that data? Current thinking suggests not, but a serious terrorist attack on Western soil could change that.
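The risk of finding relationships that mean nothing can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings, not a description of any real analytics pipeline: the function names, the sample size of 20 and the 0.5 correlation threshold are all hypothetical choices. It generates a random "target" series, then counts how many completely unrelated random series happen to correlate with it.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(num_variables, num_samples=20, threshold=0.5, seed=0):
    """Count unrelated random series whose correlation with a random
    target exceeds the threshold purely by chance (hypothetical settings)."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(num_samples)]
    hits = 0
    for _ in range(num_variables):
        # Each candidate is independent of the target by construction.
        candidate = [rng.gauss(0, 1) for _ in range(num_samples)]
        if abs(pearson(target, candidate)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    for num_variables in (10, 100, 1000, 10000):
        print(num_variables, count_spurious(num_variables))
```

Because every series is pure noise, any "hit" is a coincidence, and the number of hits grows as more variables are compared. More data to trawl through means more apparent relationships, not necessarily more real ones.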
The next few years are likely to see a power struggle between the desire to exploit Big Data and the checks and balances used to control it. The legal framework will have to adapt. In fact, the European Commission has said as much in its recently published Communication on a data-driven economy, which pins its hopes for regulation on the proposed data protection reforms. It is, however, individuals who may hold the key to making the use of Big Data less intrusive. Everyone has a different line that they will not cross in giving away personal information. People may start to pay to protect their privacy, and technology may then be forced to adapt to protect individuals. Techniques like homomorphic encryption, which allows operations to be carried out on data while it remains encrypted, and differential privacy algorithms are being developed in an attempt to cut out the downsides of Big Data. Companies are already responding to customers' concerns about privacy. Apple's iOS 8 operating system will reportedly generate a random rather than device-specific MAC address, which will come as a blow to marketing analytics firms.
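To give a flavour of what a differential privacy algorithm does, here is a minimal sketch of one common approach, the Laplace mechanism, applied to a simple count. It is an illustration only: the function names, the example records and the choice of epsilon are assumptions made for this sketch, and a production system would need a hardened noise generator rather than plain floating-point arithmetic.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two
    independent exponential draws, each with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records matching `predicate`.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random()
    # Hypothetical purchase records, invented for illustration.
    purchases = [{"item": "baseball bat"}] * 30 + [{"item": "glove"}] * 70
    noisy = private_count(
        purchases, lambda r: r["item"] == "baseball bat", epsilon=0.5, rng=rng
    )
    print(round(noisy, 2))
```

The analyst still learns roughly how many people bought a given item, but the answer is blurred enough that the presence or absence of any one individual in the data cannot be confidently inferred from it.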
The statistics on the amount of useable data being generated are truly mind-blowing: around 2.5 Exabytes (an Exabyte is 1bn Gigabytes) are being added per day, and that amount is only continuing to grow. Another often cited problem with Big Data is the scale of the resources required to exploit this ever larger pool of data. The complexity and expense of the staff and equipment required to make the most of Big Data favour the tech giants although, having said that, a number of the applications used to analyse Big Data (for example, Hadoop) are open source. The greater the thirst for data, the more sophisticated the technology for interpreting it must become. This potentially creates a virtuous circle for innovation and a huge job market for people trained to analyse and interpret the results churned out. While the fact that the tech giants are currently better able to engage with this is a threat to competition, there is also tremendous opportunity for those who are first to jump on the Big Data bandwagon. This has not been lost on the UK government, which recently announced funding of £48m to set up the Alan Turing Institute for Big Data and algorithm research in the hope of making the UK a world leader in the field. Whatever the issues around Big Data, there can be no doubt that Big Data is big business.