Enterprises are increasingly required to control their unstructured data: they must be able to produce relevant documentation for legal discovery, store information to prove regulatory compliance, and remediate sensitive information under privacy laws. Information governance was created as a means to abate these risks; however, enterprises can extract benefits from this control as well. Notably, analytics on unstructured data can reveal insights into organizational operations that improve the bottom line. Yet, despite the requirements and benefits, over 80% of enterprise data is still considered dark and inaccessible.
While small, non-regulated companies can get by without intentional information management, the majority of enterprise-class organizations need governance capabilities. The question boils down to how organizations approach establishing control. Traditionally, there have been two main means of managing unstructured data: creating data copies in an archive or working with original documents in-place.
As we dive into the differences between archive and in-place systems, think of information governance like taming animals (data sources): you can do so in a controlled zoo (archive) or in their natural habitats (in-place).
Data in a Zoo: Archive Governance
Just as all zoo animals live in a controlled environment free from natural dangers, so too are data sources housed in an archive. While training wild animals is never simple, data sources in an archive are easier to control. To preserve data, an archive stores copies of original documents in a secondary location, where they remain immutable until deleted. From a governance perspective, this consistency is incredibly helpful, as it allows rules and policies to be applied to documents at ingestion without any risk that future edits will alter their categorization. Furthermore, an archive ensures documents are accessible: just as zoos have maps showing visitors where to find animals, users can reliably search an archive because the data cannot move about freely. Combined, these factors make managing information a more straightforward task, from a technical perspective, in an archive than in-place.
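To make the zoo a little more concrete, here is a minimal Python sketch of the archive behavior described above: copies are made at ingestion, categorized once, and cannot be edited afterward. Everything in it is an assumption made for illustration, including the names Archive, ArchivedCopy, and classify, and the toy rule that treats anything mentioning "invoice" as financial. It is not a description of how any particular archiving product works.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ArchivedCopy:
    digest: str    # SHA-256 of the content, used as a stable identifier
    content: bytes
    category: str  # assigned once, at ingestion


def classify(content: bytes) -> str:
    """Hypothetical policy rule, applied only at ingestion time."""
    return "financial" if b"invoice" in content.lower() else "general"


class Archive:
    """Toy in-memory archive holding immutable copies of documents."""

    def __init__(self) -> None:
        self._items: dict[str, ArchivedCopy] = {}

    def ingest(self, content: bytes) -> ArchivedCopy:
        digest = hashlib.sha256(content).hexdigest()
        # Identical content is stored once; later edits to the original
        # document never touch the copy that is already archived.
        if digest not in self._items:
            self._items[digest] = ArchivedCopy(digest, content, classify(content))
        return self._items[digest]

    def search(self, category: str) -> list[ArchivedCopy]:
        # Categories are fixed at ingestion, so search results stay stable.
        return [c for c in self._items.values() if c.category == category]
```

In this sketch, calling Archive().ingest(b"Invoice #1") returns a copy whose category is set once and cannot be reassigned, which is the property that makes archived data comparatively easy to govern.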
That said, archives are not without their challenges, as evidenced by how few organizations choose to archive their information unless they are in highly regulated sectors such as financial services, pharmaceuticals, healthcare, manufacturing, and government. The main difficulties with archiving as the sole governance strategy are logistical complications, scope of view, and costs. On logistics, most regulations only require a portion of enterprise data to be copied into the archive, leaving organizations to differentiate and isolate what needs to be copied and what can be left in-place. Further, having two copies can be cumbersome, as users have to keep track of edits made to the in-place documents to ensure the archive holds both original and up-to-date copies. On scope, zoos and archives do not represent a full picture of the earth’s ecosystems; they are heavily biased towards the dangerous and the unique. For legal and privacy compliance, this is rarely, if ever, an issue. For analytics, however, it skews the viewpoint from which your organization is being assessed. For example, if your organization were to archive emails but not Microsoft Teams messages, any attempt at relational analytics from archived datasets would miss countless communications and connections. Lastly, on cost, archiving dramatically increases IT overhead, as creating data copies inherently increases both the amount of storage required to house enterprise data and the amount of labor required to maintain it.
Data in its Natural Habitat: In-Place Management
In contrast, you can think of in-place data management like taming animals in their natural habitats. In-place governance keeps data sources in their original storage locations, siloed by ecosystem: files in file shares, emails in email servers, etc. The core benefit of this approach mirrors the weaknesses of archives: there is no need to create data copies. That alone dramatically reduces governance costs, as it removes the need for a secondary storage location. Further, because all enterprise data is in view and updated in real time as employees continue to create, edit, and delete documents, in-place management serves as an accurate and up-to-date window into an organization. If done correctly, in-place governance can produce the same results as an archive for legal, privacy, and analytics use cases at a fraction of the cost. However, due to the ever-changing nature of in-place data, it will never be as defensible as an archive.
The reason many organizations do not opt for in-place governance is twofold: (I) technology providers historically were not able to manage an ever-changing data ecosystem, and (II) highly regulated industries require an archive for compliance needs. Expanding on the former, it is incredibly difficult to tame wild animals, and likewise to manage living data sources in-place. The largest obstacle is finding a means to track, categorize, and access documents while users continue to work on, move, and delete them and create new ones. To maintain an accurate view, organizations have to crawl the enterprise regularly, mapping out where documents are in near real time. As for compliance requirements, the majority of regulations require that enterprise data be not only managed but also preserved in immutable form. Accordingly, most require data to be copied into an archive. That said, for organizations that are not restricted by regulations, in-place efforts can produce fantastic results across the spectrum of use cases and are far more accessible to the average enterprise.
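The crawling problem can be illustrated with a small Python sketch. It assumes a single file share reachable as a local directory and uses content hashes to tell whether a document was created, modified, moved, or deleted between two crawls. The function names crawl and diff_snapshots are hypothetical, and a real in-place platform would also have to cover email servers, chat systems, permissions, and scale, none of which is attempted here.

```python
import hashlib
import os


def crawl(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to a SHA-256 digest of its content."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            with open(full_path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            snapshot[os.path.relpath(full_path, root)] = digest
    return snapshot


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict:
    """Compare two crawls and report what changed between them."""
    old_paths, new_paths = set(old), set(new)
    created = new_paths - old_paths
    deleted = old_paths - new_paths
    modified = sorted(p for p in old_paths & new_paths if old[p] != new[p])

    # Index newly seen paths by digest so that a vanished path with the
    # same content can be reported as a move rather than delete + create.
    created_by_digest: dict[str, list[str]] = {}
    for path in sorted(created):
        created_by_digest.setdefault(new[path], []).append(path)

    moved = {}
    for path in sorted(deleted):
        candidates = created_by_digest.get(old[path])
        if candidates:
            moved[path] = candidates.pop(0)

    return {
        "created": sorted(created - set(moved.values())),
        "deleted": sorted(deleted - set(moved)),
        "modified": modified,
        "moved": moved,
    }
```

Running crawl on a schedule and passing consecutive snapshots to diff_snapshots gives a rough picture of where documents currently live. Note that a file that is both moved and edited between crawls would show up as a deletion plus a creation, which hints at why tracking living data is harder than governing immutable copies.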
Data Governance: Managing Both the Zoo and Nature
The reality is that for most large organizations the choice is not a binary of archive vs. in-place, but rather a question of the extent to which the two should be blended. To that end, the future of information governance lies in finding ways to marry the two, delivering joint functionality despite their differences.