We often think of the 21st century in terms of the vast degree of innovation and interconnectivity that has emerged, but we rarely consider the explosive growth of the very data that gave the information age its name.
The past two decades have seen an enormous increase in the amount of data produced by the human race, and that growth continues to accelerate year after year. Most people have yet to comprehend the sheer scale of this phenomenon, and fewer still have considered the implications that this information tsunami will have on the world at large. This brief explanation will help to put the growth of data in a workable context, explain how the private sector is already grappling with some of these growth-related challenges, and outline how other organizations might be forced to adopt its tactics in the decades to come.
Data Growth by the Numbers
The amount of information being produced on even a daily basis is staggering. As of 2015, the world produced, on average, 2.5 exabytes of data per day. That is 2.5 quintillion bytes of data, or roughly the content of 4 trillion books (in practical terms, more than all of the books printed in the history of humanity). In 2007, it was hypothesized that the entire sum of human knowledge to that point amounted to 295 exabytes, while in 2015 alone almost 913 exabytes of data were produced. The numbers are not yet available for 2016, but given that the data produced in 2014-15 accounted for 90% of all of the information ever produced by humanity, chances are solid that the numbers will be even higher for 2016.
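For readers who want to check the arithmetic, here is a minimal Python sketch that derives the annual figure from the daily one. It assumes decimal units (one exabyte taken as 10^18 bytes) and a 365-day year; the variable names are illustrative and not drawn from any source.

```python
# Assumption: decimal units, so one exabyte is 10**18 bytes.
EXABYTE = 10**18

daily_exabytes = 2.5          # average daily output cited for 2015
days_in_2015 = 365            # 2015 was not a leap year
estimate_2007_exabytes = 295  # cited 2007 estimate of all human knowledge

# Daily output expressed in bytes (2.5 quintillion).
daily_bytes = daily_exabytes * EXABYTE

# Total output for the year, in exabytes.
annual_exabytes = daily_exabytes * days_in_2015

# How many times the 2007 estimate was produced in 2015 alone.
ratio = annual_exabytes / estimate_2007_exabytes

print(f"Daily output: {daily_bytes:.2e} bytes")
print(f"Annual output: {annual_exabytes} exabytes")
print(f"Multiple of the 2007 estimate: {ratio:.1f}x")
```

In other words, 2015 alone produced roughly three times the 2007 estimate of everything humanity had recorded up to that point.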
Looming on the horizon is the oft-discussed Internet of Things (IoT), a general term for traditionally non-digital devices, such as fridges and lighting systems, which will capture and share information in an ever-expanding network. Current projections estimate that over 24 billion devices will contribute to the IoT by 2020. These devices will likely provide the bulk of the force behind the next wave of data creation, and that wave is predicted to be unimaginably large.
Current estimates for the total amount of data produced by humanity range from 2 to 5 zettabytes (one zettabyte is 1,024 exabytes). By 2020, that sum is projected to stand at 35-40 zettabytes. To put that in perspective, all of the data produced by the human race up through 2015 could amount to as little as 5% of the data in existence by 2020.
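The 5% figure comes from pairing the low end of the current estimate with the high end of the 2020 projection. A short Python sketch, using only the ranges quoted above and a hypothetical helper named share_percent, shows how much the share moves depending on which ends of the ranges are paired:

```python
def share_percent(existing_zb, projected_zb):
    """Return existing data as a percentage of the projected total."""
    return 100.0 * existing_zb / projected_zb

# Low end of today's estimate against the high end of the 2020 projection.
low_share = share_percent(2, 40)

# High end of today's estimate against the low end of the 2020 projection.
high_share = share_percent(5, 35)

print(f"Smallest share: {low_share:.1f}%")
print(f"Largest share: {high_share:.1f}%")
```

Even under the most generous pairing, data produced through 2015 would make up only about one seventh of the projected 2020 total.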
The vast majority of discussion on the topic of this explosion of information has centered on either the immense promise offered in terms of economic and societal efficiencies or the ethical and privacy concerns related to collecting ever-increasing amounts of data about people’s lives. This overlooks a far more fundamental issue, however. Before we can reasonably discuss the benefits or ethical challenges of data collection on such a massive scale, it’s important to ask whether that degree of collection would even be feasible (or useful) to begin with.
Obviously, not all of the information produced would necessarily be stored long term, but even a small percentage of the cited 40 zettabyte sum would dwarf any storage capability currently available. More importantly, even if the data could all be stored, the resulting morass of information would be exceedingly difficult to properly process, organize or utilize. The NSA infamously analyzes roughly 51 petabytes of communication metadata daily (for reference, there are roughly one thousand petabytes in an exabyte and roughly one million petabytes in a zettabyte), and still faces substantial challenges separating the relevant elements from the useless noise. While access to even more data through the IoT would likely be attractive from a defense perspective, the sheer volume of information from IoT devices will likely render most of that additional information functionally useless.
On a smaller scale, companies all over the world are coming face-to-face with this challenge as they struggle to properly manage their email and file systems. As company content and data are produced at ever-increasing rates, organizations are finding that their issues lie less with the cost of storage and increasingly with the value of storage. What good is having petabytes of files and communication records cheaply stored if they are unable to find or keep track of the data in question? Society at large will be faced with similar questions in the years to come as the ocean of data rises ever higher and ever faster.