Next evolution, Humongous Data?
With massive amounts of our personal data now being routinely entered, collected, stored and exchanged, data security and privacy breaches are almost inevitable. In particular, large-scale attacks that lead to the theft of millions of individuals’ data are becoming more and more common.
With technology at our fingertips, we are sharing more and more information online and by electronic means. From sensors fitted in our cars to wearables, from cloud computing to social networking, from digital pictures and videos to cell phone GPS signals, from online purchases to sign-up processes, from the telecommunications and insurance sectors to the medical and banking sectors, we leave traces of information with every move we make.
The massive volume of data generated and gathered is popularly referred to as ‘Big Data’. The concept commonly describes an amount of information so large, complex, unstructured, diverse and fast-moving that it is difficult to process using traditional database and software techniques. Billions to trillions of records about millions of people are now measured in units such as petabytes and exabytes. The golden era of the gigabyte is long gone.
So what is so special about Big Data?
The analysis that can be done with Big Data makes it possible to establish correlations across large populations that can be useful to individuals. It creates a remarkable opportunity for society worldwide in any field you can think of, ranging from crime rate prediction to medical research, from public health to national security and from marketing to risk analysis. Companies and governments no longer have to rely on sampling: they have access to the abundant digitized knowledge of the digital age, a myriad of data points collected for unrelated purposes and updated in real time.
For instance, a few years ago, Google was able to predict flu outbreaks faster than was possible using hospital admission records, just by analyzing clusters of search terms by region in the United States. All with algorithms! Quite impressive, huh?
In our enthusiasm to share and bond with others, and to make the most of the possibilities offered by new technologies as the world grows more and more connected, we are quite careless when it comes to giving away information about ourselves. Businesses know that. And they are continuously developing new means to collect information about their customers.
Why wouldn’t they?
They can look for hidden patterns, trends or other insights that will enable them to better mould their products and services to customers, anticipate demand or improve performance. Big Data can certainly bring the knowledge that will allow innovative improvements for businesses… from which all of us will ultimately benefit. As a result, personal data is consistently collected and traded. It has become the new currency of the new economy that is the internet.
For instance, have you noticed how often, after searching for a certain type of goods or services on Google, matching advertisements appear on the right side of your Gmail window the next time you open it?
But the astonishing advantages coming from the analysis of Big Data are tempered by concerns over privacy and data protection.
I believe that many of us don’t think much about the implications of easily sharing and giving away personal details online nowadays. After all, how many of us actually read the consent form regarding the use of our personal data?
But it is important to reflect on a few points which, I assume, won’t leave anybody comfortable after consideration.
Consider, for instance, that some retailers are able, through the analysis of purchasing habits, to predict such intimate details as a customer’s pregnancy and that, against the wishes of the customer concerned, the ensuing marketing activities may end up disclosing that information.
Consider, for example, that with such a volume of data and such powerful analytical tools, combining data sets might lead to the identification of individuals, despite the anonymisation of certain elements.
Consider, now, that the data contain biases, inaccuracies, obsolete and missing information, and flawed correlations that unavoidably affect the predictions and conclusions resulting from their analysis, and that decisions that can affect your welfare will still be taken based on those predictions and conclusions.
Consider also that, more and more, most of the data being collected about us doesn’t come directly from us.
Lastly, consider that hospital records of national health system patients could be sold for insurance purposes.
Scary, at the very least…
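To make the re-identification point a little more concrete, here is a minimal Python sketch of how combining two data sets can undo anonymisation. Everything in it is hypothetical and invented for illustration: the records, the names, the `reidentify` function and the choice of postcode, birth year and sex as linking attributes. It is a simplified picture of the idea, not a description of how any real organisation works.

```python
from collections import defaultdict

# Hypothetical "anonymised" health records: names removed,
# but a few ordinary attributes are still present.
anonymised = [
    {"zip": "1000", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "1000", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "2000", "birth_year": 1990, "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical public list (for example a membership register) with names.
public = [
    {"name": "Alice", "zip": "1000", "birth_year": 1980, "sex": "F"},
    {"name": "Bob", "zip": "1000", "birth_year": 1975, "sex": "M"},
    {"name": "Carol", "zip": "2000", "birth_year": 1990, "sex": "F"},
    {"name": "Dana", "zip": "2000", "birth_year": 1990, "sex": "F"},
]

# Attributes that appear in both data sets (an assumption of this sketch).
LINK_KEYS = ("zip", "birth_year", "sex")


def reidentify(anonymised_records, public_records):
    """Return {name: diagnosis} for records that match exactly one person."""
    index = defaultdict(list)
    for person in public_records:
        index[tuple(person[k] for k in LINK_KEYS)].append(person["name"])

    matches = {}
    for record in anonymised_records:
        names = index.get(tuple(record[k] for k in LINK_KEYS), [])
        if len(names) == 1:  # a unique combination points to one individual
            matches[names[0]] = record["diagnosis"]
    return matches


matches = reidentify(anonymised, public)
print(matches)
```

In this toy example two of the three "anonymous" records point to exactly one person each, while the third stays ambiguous only because two people happen to share the same combination of attributes.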
The good or bad news is that Big Data analysis isn’t as effective as many would like or fear it to be.
The risk of biases inherent in the data, and of false correlations and associations, is great and increases as larger volumes of data are analyzed.
For instance, Google’s model for predicting the spread of flu ended up overestimating the phenomenon by almost a factor of two.
Regarding public security, Big Data hasn’t proven itself able to detect patterns or anomalies that could help prevent acts of terror either.
Not so reliable after all…
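The remark about false correlations can be illustrated with a small simulation. The sketch below is my own illustration under stated assumptions: it reads "bigger volumes of data" as "more attributes recorded per person", fills every attribute with pure random noise, and counts how many pairs of attributes still look correlated. The function names, the sample size of 20 people and the 0.5 threshold are arbitrary choices made for the example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_spurious(columns, threshold=0.5):
    """Count pairs of columns whose correlation exceeds the threshold."""
    count = 0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if abs(pearson(columns[i], columns[j])) > threshold:
                count += 1
    return count


rng = random.Random(42)  # fixed seed so the run is repeatable
N_PEOPLE = 20            # assumed sample size

# 60 attributes of pure noise: by construction none is related to another.
columns = [[rng.gauss(0, 1) for _ in range(N_PEOPLE)] for _ in range(60)]

few = count_spurious(columns[:5])   # only the first 5 attributes
many = count_spurious(columns)      # all 60 attributes
print("apparent correlations with 5 attributes:", few)
print("apparent correlations with 60 attributes:", many)
```

Since the 60 attributes include the first 5, the second count can never be smaller than the first. With 1,770 possible pairs instead of 10, chance alone produces many more "patterns", even though none of them means anything.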
Nevertheless, one cannot escape Big Data. We live so entangled in it that it is more and more usual to talk about an ‘internet of things’. Good things can come from it. But nobody can be entirely sure that it will be used for legitimate purposes.
In parallel to the enthusiasm of connecting and sharing, there is an increasing concern surrounding the lack of privacy.
The advantages which result from Big Data analysis will only be reached if the privacy expectations of users are appropriately met and their data protection rights are respected. However, finding the right balance between all the interests at stake (those of the individuals concerned, those of businesses and, ultimately, the general public interest) might not be an easy goal to achieve, particularly in the field of health research.
The Article 29 Working Party recently issued a statement on the impact of the development of Big Data on the protection of individuals with regard to the processing of their personal data in the EU, where it found “no reason to believe that the EU data protection principles are no longer valid and appropriate for the development of Big Data.” Nevertheless, it envisaged the possibility of “further improvements to make [the principles] more effective in practice” in the context of Big Data.
In my opinion, data protection principles should be deemed applicable, as they refer to fairness, transparency and, ultimately, trust. For that reason, the ‘notice and consent’ and the ‘purpose limitation’ models should be preserved as much as possible and data ought to be anonymized to the point where re-identification is precluded.
This week, the European Commission and the Big Data Value Association, an industry-led organisation which acts on behalf of companies including ATOS, Nokia Solutions and Networks, Orange, SAP and SIEMENS, have committed to a public-private partnership (PPP) that aims to support research and innovation in Big Data technologies and infrastructures to ensure privacy and security.
No statistics can predict what uncertainties the future holds regarding Big Data… However, in these fast-changing times of information and communications technology, we will surely know soon enough…