The Future is Dark…Data

80% of all data is completely unused

A couple of days ago (or at least a couple of days before I started writing this), CRN ran an article on Ed Harbour, the illustrious VP for IBM’s Watson Group, in which they talked about cognitive computing and this conspicuous little thing called dark data.

Now, usually when the word “dark” comes up alongside anything digital, some not-so-nice things pop into people’s heads, mainly due to horror stories about the “dark” or “deep” web, as the unindexed part of the Internet has become known. But for all of its potentially nasty associations, “dark data” is surprisingly benign -- well, at least at first glance. One commonly cited definition describes dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing)” and blah, blah, blah. It’s basically unused or unprocessed, and usually unstructured, data that sits in storage. And while that might not sound so bad, it actually gets a little nefarious when you really start looking into it.

The thing is, an estimated 80% of stored data is dark data -- shoutout to our good friend, Vilfredo Pareto -- and that gets really bad really fast when you consider how much new data is created each day and how many tera- or petabytes of data exist in, say, your average Fortune 500 company. Or just your average business, even.

And even if you aren’t the CEO of a startup or a multi-billion-dollar corporation and are more like a casual reader who stumbled across this article, this should still be bad news for you. Why? Well, because your personal information exists somewhere in this black sea of unused bits and bytes, and if you’re going to become the victim of fraud or identity theft or some other cybercrime in the future, there’s a good chance it’s going to be due to the misplacement of this so-called dark data.

So, let’s have a look at the numbers. The New York Times says that 90% of the energy used by data centers is wasted, pure and simple, and IBM says that 60% of data loses its value almost immediately. And amongst organizations in Europe, the Middle East, and Africa, it’s estimated that the cost of all this is upwards of a trillion dollars -- and that’s not even considering the costs that might come from something like a lawsuit over the kind of data misplacement I described a paragraph ago...

…not good, I know.

So why does dark data even exist? Why even have it in the first place? Well, a lot of it comes down to human nature. We have a tendency to collect things. You’ve kept that treadmill in your basement for the last 5 years because, “hey, I might use it some day,” and, unfortunately, a lot of organizations approach data storage the same way.

The other reason is that not all dark data is inherently bad. In fact, it’s believed that dark data will be all the rage in the near future, especially with the development of cognitive computing a la IBM’s Watson. And of course, records management solutions already exist to isolate and get rid of redundant and useless dark data.

The point, really, is that the future is dark. And whether that’s a good or a bad thing remains to be seen.

As a content and events specialist at ZL, I work to bring the glamorous allure of information governance to the world. As a native Virginian and temporary Tennessean turned Californian, I’m permanently fascinated by life on the west coast. Although I miss SEC football and four distinct seasons, I’m in love with redwood forests and bubble tea on every corner.