I hate buzzwords. Glad I got that off my chest. You see, in the somewhat nebulous space of information governance, the last thing we need is more marketing clutter to make our ideas and concepts more ambiguous and harder to explain. Unfortunately, it would seem that everywhere you turn there’s a new name or title given to a concept, which does more to complicate than simplify the conversation. For example, “what are you doing about your big data?” can elicit many different answers depending on the audience and their understanding. And don’t get me started on “REAL Single Instance Storage” or “end-to-end eDiscovery.”
Recently, however, I’ve been uncharacteristically in tune with the conversation around a new buzzword, data lakes, because I think this discussion is pivotal in the general trend towards information governance. Don’t get me wrong, I still dislike the actual term: why is everyone so obsessed with bodies of water? Wouldn’t it actually be more of an ocean? Perhaps more of a dam. I digress.
A data lake, when implemented effectively, can be conceptualized as a management layer for all of the content generated within the enterprise. It is the name given to the platform necessary for information governance. The transaction -- or production -- layer is where data is born: emails, IMs, file creation, collaboration tools like SharePoint, and the like. But a data lake is necessary to capture and process all of these different data types so that they can be leveraged as a whole rather than as a fragmented sum of parts. Content creation tools alone are scattered puddles of information, whereas the data lake concept seeks to pool their contents so that they can be managed and searched holistically.
FINALLY, I had it. A necessary buzzword that (1) names a key piece of the information governance process, while also (2) inherently acknowledging this piece as necessary in the grand scheme.
Then, along came the backlash. I read article after article talking about how data lakes won’t work. The one that really got to me was from Gartner, the self-proclaimed leader in IT discussion.
Data lakes are the missing piece in how companies can leverage their data for analytics, business intelligence, and collective corporate “memory.” If you have the proper data lake, you can tap into a gold mine of data. There are companies that are innovating, growing, and revolutionizing their industries – Uber, Netflix, and Amazon, to name a few. The reason: to succeed in the 21st century, a business needs to analyze and respond to its data. Successful companies have aligned themselves so that they can respond quickly, and I anticipate that in the next 5 years we’ll see similar transitions from the large, established Fortune 500 companies, and not just the up-and-comers. The right data lake is necessary for this transition.
My issue with the Gartner article is that it misdiagnoses a data lake as a dumb vault where all information is stored in a seemingly disorganized fashion. Obviously, I agree with them that this is not the right solution for big companies. However, they’re ignoring the fact that data lakes with sophisticated management capabilities are available now. With full-text indexing, automatic and hybrid records classification, role-based access, and hierarchical policy structures, companies are able to know exactly what information they have so that they can extract any and all necessary data.
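To make the difference between a "dumb vault" and a managed lake concrete, here is a minimal sketch in Python of two of the capabilities listed above: full-text indexing and role-based access. Everything in it (`TinyLake`, `Document`, the role names, the sample messages) is hypothetical and invented for illustration; it is not how any particular product works, just the general idea.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    allowed_roles: frozenset  # hypothetical: roles permitted to read this item


@dataclass
class TinyLake:
    """Toy in-memory 'lake': stores items and keeps a full-text index."""
    docs: dict = field(default_factory=dict)
    index: dict = field(default_factory=lambda: defaultdict(set))

    def ingest(self, doc):
        # Capture the item, then index every word so it can be found later.
        self.docs[doc.doc_id] = doc
        for token in doc.text.lower().split():
            self.index[token.strip(".,!?")].add(doc.doc_id)

    def search(self, term, role):
        # Full-text lookup, filtered by what this role is allowed to see.
        hits = self.index.get(term.lower(), set())
        return sorted(d for d in hits if role in self.docs[d].allowed_roles)


lake = TinyLake()
lake.ingest(Document("email-1", "Q3 launch plan attached.",
                     frozenset({"legal", "marketing"})))
lake.ingest(Document("im-7", "Launch slipped a week",
                     frozenset({"marketing"})))

print(lake.search("launch", role="legal"))      # ['email-1']
print(lake.search("launch", role="marketing"))  # ['email-1', 'im-7']
```

The point of the sketch is that the index and the access rules live in the management layer, not in the email or IM systems where the content was created.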
This ability to notice trends and react to them is becoming a pivotal part of successful corporations’ business models. They analyze what their consumers are discussing on social media, how their employees are reacting to new product launches, and what their legacy communications reveal about the business. The companies that choose to ignore the value that exists in their data are those that are poised to fail.
The only way a company can truly leverage the enterprise data they have is to capture all of that data. Once everything is harnessed, decisions for management can be made accordingly based on the company’s needs and desires. Want to store everything forever? Sure, you can do that. Prefer to eliminate the “junk” data that has little business use? You can do that too. Once captured, the business can delete everything that is deemed non-business (think baby pictures, MP3s, grandma’s recipe, etc.) and analyze the relevant content that is left over. But the common need is a data lake – or management layer – in order to make those decisions in the first place.
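The "capture first, decide later" idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `Item` type, the `apply_policy` function, and the rule that certain file extensions count as non-business are all made up for the example, and a real classification policy would be far richer than a file-extension check.

```python
import os
from dataclasses import dataclass

# Hypothetical rule: file extensions treated as non-business content.
NON_BUSINESS_EXTENSIONS = {".mp3", ".jpg", ".png"}


@dataclass
class Item:
    name: str


def is_business(item):
    # Classify by extension only; a stand-in for real records classification.
    ext = os.path.splitext(item.name)[1].lower()
    return ext not in NON_BUSINESS_EXTENSIONS


def apply_policy(items, keep_everything=False):
    """Return (kept, deleted) for items that have already been captured."""
    if keep_everything:
        return list(items), []
    kept = [i for i in items if is_business(i)]
    deleted = [i for i in items if not is_business(i)]
    return kept, deleted


captured = [Item("contract.docx"), Item("baby.JPG"), Item("song.mp3")]

kept, deleted = apply_policy(captured)
print([i.name for i in kept])     # ['contract.docx']
print([i.name for i in deleted])  # ['baby.JPG', 'song.mp3']
```

Either policy, store everything or purge the junk, is a one-line change once the content is in a single place, and that is the argument for the management layer.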
You know the saying, “you can’t teach an old dog new tricks”? Perhaps it’s not true. The enterprise is a retriever, and in order to fetch any new insights, he’ll need a (data) lake to go jump in.