The average enterprise has hundreds of terabytes of data locked away in its on-premises and cloud servers. Data volumes are only set to increase as the pandemic leaves organizations increasingly reliant on digital communication and collaboration.
While storing troves of data may seem innocuous, the issue lies in not knowing the risk and value—trash and treasure—being stored within. This series will explore the importance of file analysis as organizations harness their data to protect themselves from potential harm and leverage that control for strategic insight.
The majority of enterprise files can be cleaned up
Most enterprise data is closer to trash than treasure: more than half of the average organization’s data is considered redundant, obsolete, or trivial (ROT).
Redundant data consists of unnecessary file copies stored throughout an organization’s servers. Ideally, an organization keeps only a single copy of each document, ensuring that every policy applied and edit made is universal.
Obsolete documents are unused files, typically idle because they are incorrect, incomplete, or outdated. They often remain in circulation because no governance policies were established when they were created, leaving them without a normal lifecycle.
Trivial information is anything that does not contribute to corporate knowledge. Unlike redundant and obsolete files, trivial documents never served an organizational purpose. Often this is personal data—music, pictures, non-work documents—that employees created on their work devices and saved to the company server.
There are risks in keeping ROT files
Outside of upsetting Marie Kondo, hoarding ROT files can prove detrimental. File analysis and management for ROT reduction can lessen privacy compliance and cybersecurity burdens while optimizing overall data functionality.
Hidden in this digital trash is potentially non-compliant and private information that, if uncovered, could result in regulatory penalties. Privacy regulations, such as the European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), require that organizations safeguard any personally identifiable information (PII) that could reveal a user’s identity. Traditional examples of PII include IP addresses, credit card information, and biometrics.
By and large, organizations have little use for personal information, since the potential for harm typically outweighs any business value it may yield. Ideally, organizations want to delete this information; however, many companies cannot locate private information within ROT files. Ironically, to get rid of PII, organizations must first surface sensitive information that would have otherwise gone unnoticed. To isolate PII, organizations tend to rely on file analysis software that uses content and metadata analytics to flag private information for deletion.
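At its simplest, content-based PII flagging is pattern matching. A minimal sketch using a few illustrative regular expressions (these patterns are simplified assumptions; commercial tools combine many detectors with validation steps such as Luhn checks for card numbers, plus metadata analytics):

```python
import re

# Illustrative patterns only -- real detectors are far more robust
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def flag_pii(text: str) -> dict[str, list[str]]:
    """Return each PII category found in `text` with its matches,
    so flagged files can be reviewed or queued for deletion."""
    hits: dict[str, list[str]] = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits
```

Running this over extracted file text gives a per-file report of suspected PII, which is the raw material a review or deletion workflow acts on.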
In general, when it comes to data security, the more information you have, the harder it is to protect. This effect is particularly evident with ROT, as these files were never meant to be stored in the first place and often carry outdated permissions and security policies. Malicious actors or accidental employee mistakes could result in these security weaknesses being exploited for larger data breaches. Not only would clearing away ROT increase security by eliminating vulnerabilities, but it would also save security specialists time, allowing them to focus their efforts on more important business records. Accordingly, file analysis efforts to isolate insecure files are increasingly becoming a cornerstone of cybersecurity.
In terms of enterprise data functionality, lugging around ROT files can increase storage costs, slow down servers, and congest enterprise-wide searches. Given that such a large portion of enterprise data is ROT, these files can skyrocket storage costs across on-premises, cloud, and hybrid deployments alike. However, the storage method does matter in terms of functionality, as each differs in its innate scalability. In particular, on-premises systems are more prone to be bogged down by ROT, given that they have far less elasticity and firm limits on what they can and cannot handle. In terms of search, excessive ROT can cause a traffic jam in eDiscovery, forcing systems to wade through each irrelevant ROT file to isolate pertinent documentation. Clearing out ROT with file analysis programs has a high return on investment, as the costs and labor of maintaining ROT are often greater than what would be needed to cleanse servers.
Compounding privacy, security, and functionality concerns, maintaining ROT files is an onerous endeavor. In coming posts, we will explore how file analysis programs can move beyond risk reduction into being an asset as organizations leverage data for analytics.
Follow the rest of our file analysis series:
- File analysis for ROT reduction (this post)
- File analysis for corporate insights
- ZL File Analysis and Management