Cleaning Up and Managing File Share Data

Company file shares and SharePoint sites have accumulated massive volumes of data over many years. While that data continues to grow exponentially, very little has been done to manage or tame it. However, several factors have converged in recent years to create an urgent need for companies to address this seemingly overwhelming challenge.

Key Reasons for File Share Cleanup

Risk Management

File shares inherently hold large volumes of sensitive data, and retaining that data increases exposure to regulatory fines, cybersecurity breaches, and legal risk in the event of electronic discovery. Cleaning up data over time significantly reduces all of these risks.

Privacy regulations such as GDPR and CCPA require companies to find and remediate personal data, both on an ongoing basis and in response to data subject requests. Under GDPR, which provides for fines of up to €20 million or 4% of worldwide annual turnover, whichever is higher, companies must dispose of personal data once it is no longer needed for its original business purpose, or when a data subject validly requests erasure. This is a tall order when personal data could be lying anywhere in an ocean of unstructured data scattered across dozens of silos, and when companies have limited capability to search and remediate it.

Storage Costs

As data growth accelerates, storage costs become a significant factor, particularly for cloud-based data. By some estimates, up to 80% of stored data is ROT (redundant, obsolete, or trivial), which means large companies can spend millions of dollars a year storing data they do not need.
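
To make the storage argument concrete, the arithmetic can be sketched as follows. The function name, the 2,000 TB estate, and the $0.02 per GB-month price are hypothetical inputs for illustration only, not published figures. Actual cloud pricing varies by provider and tier, and the ROT fraction should come from an analysis of the actual file shares rather than from the 80% upper estimate.

```python
def annual_rot_storage_cost(total_tb: float, rot_fraction: float,
                            usd_per_gb_month: float) -> float:
    """Estimate yearly spend on storing ROT data.

    All inputs are caller-supplied assumptions. For simplicity, 1 TB is
    treated as 1,000 GB.
    """
    if not 0.0 <= rot_fraction <= 1.0:
        raise ValueError("rot_fraction must be between 0 and 1")
    rot_gb = total_tb * 1_000 * rot_fraction
    return rot_gb * usd_per_gb_month * 12  # 12 billing months per year


# Hypothetical estate: 2,000 TB, 80% ROT, $0.02 per GB-month.
cost = annual_rot_storage_cost(2_000, 0.8, 0.02)
print(f"${cost:,.0f} per year")  # $384,000 per year
```

The result scales linearly with estate size and price, so larger estates or premium storage tiers quickly push the figure into the millions.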

Move to the Cloud

As companies migrate more of their file sharing to cloud platforms such as SharePoint, they need to clean up data before migrating, particularly because they will face new subscription and storage overage fees in the cloud.

Records Management

File shares typically hold important business records created by employees, and these records require retention. The ability to classify these documents and apply a retention schedule is a key need at most large companies.

Unified Governance

Unstructured data is also central to other governance functions, such as eDiscovery: the legal requirement to search for and collect electronic evidence during legal disputes. Governance functions such as eDiscovery, records management, compliance, and privacy need to operate in concert to ensure that the appropriate policies take precedence. For example, what happens if a company receives a data subject request to delete several documents, but those documents are being retained for records management or legal purposes? Each of these functions must be integrated.

As companies seek to implement GenAI for business insights and intelligence, unstructured data found in file shares represents the repository of corporate knowledge. The capability to manage, search, cull, and feed this data into AI and analytics applications is a strategic requirement for enterprises.

Critical Capabilities of a File Share Cleanup and Management Solution

In selecting and implementing a solution to clean up and manage file share data, there are a handful of critical capabilities that should be weighed. While there are several available tools, very few of them are designed to holistically meet the complex requirements of large enterprises.

Defensible Deletion

Identifying and defensibly disposing of ROT

As companies accumulate more and more data, keeping everything becomes less viable. At least a portion of total data in file shares is ROT and should be disposed of, but how can this data be identified and disposed of defensibly? Doing so requires three components:

  • A clear policy that defines when data can be deleted, in compliance with all regulatory and business requirements
  • Consistent application of that policy, with supervised disposition of documents at regular intervals
  • Audit trails to demonstrate which documents were deleted and why

Organizations may wish to evaluate solutions that enforce defensible deletion through capabilities such as a master retention schedule; automated, manual, and hybrid records classification; and PII identification and remediation. Following this approach minimizes the risk of spoliation claims and adverse inferences in litigation.
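
The three components listed above can be illustrated with a minimal review pass. This sketch assumes a hypothetical `Document` record, a hypothetical policy that maps each category to a retention period, and two simple ROT tests. The first test flags exact duplicates by SHA-256 content hash. The second flags documents whose last modification is older than their category's retention period. The function only flags candidates and writes one JSON audit line per document. Actual deletion is left to a supervised disposition step.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Document:
    path: str
    category: str        # hypothetical classification label
    last_modified: date
    content: bytes


# Hypothetical policy: retention period in days for each category.
POLICY = {"draft": 2 * 365, "invoice": 7 * 365}


def review_for_disposal(docs, today, policy=POLICY):
    """Flag ROT candidates and return (eligible_paths, audit_lines).

    The same rules are applied to every document, and every decision,
    including "keep", is written to the audit trail.
    """
    first_seen = {}  # content hash -> path of the first copy encountered
    eligible, audit = [], []
    for doc in docs:
        digest = hashlib.sha256(doc.content).hexdigest()
        reason = None
        if digest in first_seen:
            reason = f"duplicate of {first_seen[digest]}"
        else:
            first_seen[digest] = doc.path
            limit = policy.get(doc.category)
            if limit is not None and today - doc.last_modified > timedelta(days=limit):
                reason = f"older than {limit}-day retention for {doc.category}"
        if reason:
            eligible.append(doc.path)
        audit.append(json.dumps({
            "path": doc.path,
            "sha256": digest,
            "eligible": reason is not None,
            "reason": reason,
            "reviewed_on": today.isoformat(),
        }))
    return eligible, audit


docs = [
    Document("share/a/report.docx", "draft", date(2018, 1, 1), b"v1"),
    Document("share/b/report-copy.docx", "draft", date(2024, 1, 1), b"v1"),
    Document("share/c/invoice.pdf", "invoice", date(2020, 1, 1), b"inv"),
]
eligible, audit = review_for_disposal(docs, today=date(2024, 6, 1))
print(eligible)  # ['share/a/report.docx', 'share/b/report-copy.docx']
```

A production policy would also decide which duplicate copy to keep and would check legal holds and other retention obligations before flagging anything.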

Unified Governance

An integrated system that draws from every governance function

Large companies today need holistic information governance that combines capabilities from records management, regulatory compliance, eDiscovery, and privacy within a single platform. For example, certain business records may become eligible for deletion 7 years after creation. Other types of records require more granular controls, such as “delete document 10 years after the contract expires” or “10 years after the employee leaves the company.” These are referred to as event-based triggers.
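
Time-based and event-based triggers can be modeled the same way: each rule names a retention period and the date that starts the clock. The sketch below assumes a hypothetical `RetentionRule` type, hypothetical category and event names, and whole-year periods. If a rule's triggering event has not yet occurred, the rule yields no disposition date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass(frozen=True)
class RetentionRule:
    years: int
    trigger: str  # "created" for time-based rules, otherwise a business event name


# Hypothetical schedule mirroring the examples in the text.
SCHEDULE = {
    "business_record": RetentionRule(7, "created"),
    "contract": RetentionRule(10, "contract_expired"),
    "personnel_file": RetentionRule(10, "employee_departed"),
}


def add_years(d: date, years: int) -> date:
    """Add whole years, mapping Feb 29 to Feb 28 when the target year is not a leap year."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def disposition_date(rule: RetentionRule, created: date,
                     events: Dict[str, date]) -> Optional[date]:
    """Earliest date the schedule allows deletion, or None if the trigger has not fired."""
    start = created if rule.trigger == "created" else events.get(rule.trigger)
    return None if start is None else add_years(start, rule.years)


print(disposition_date(SCHEDULE["business_record"], date(2015, 3, 1), {}))  # 2022-03-01
print(disposition_date(SCHEDULE["contract"], date(2015, 3, 1), {}))         # None
```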

However, the record schedule is just one of the factors that determine whether a document can be deleted. If a document is on legal hold for a matter, it must be preserved regardless of its record policy. Similarly, if it is being retained to meet a regulatory requirement, that requirement may supersede the record policy. And if a document contains personal data that a data subject has asked to be deleted under GDPR, the system must first verify that the document is not being retained for other purposes. Lifecycle management therefore requires an integrated system that draws from each governance function.
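
One way to express this precedence is a single decision function. It consults every governance input before returning a verdict, along with a reason suitable for the audit trail. The field names and the ordering below are illustrative assumptions: legal hold first, then regulatory retention, then a data subject erasure request, then the record schedule. The actual precedence is a legal and policy decision for each organization.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set, Tuple


@dataclass
class GovernanceState:
    legal_holds: Set[str] = field(default_factory=set)   # hypothetical matter IDs
    regulatory_retain_until: Optional[date] = None
    schedule_eligible_on: Optional[date] = None          # None: schedule not yet satisfied
    erasure_requested: bool = False


def can_delete(state: GovernanceState, today: date) -> Tuple[bool, str]:
    """Return (deletable, reason); preservation obligations are checked first."""
    if state.legal_holds:
        return False, "legal hold: " + ", ".join(sorted(state.legal_holds))
    if state.regulatory_retain_until and today < state.regulatory_retain_until:
        return False, f"regulatory retention until {state.regulatory_retain_until}"
    if state.erasure_requested:
        return True, "erasure request with no conflicting retention"
    if state.schedule_eligible_on and today >= state.schedule_eligible_on:
        return True, "record schedule period elapsed"
    return False, "record schedule period not yet elapsed"


today = date(2024, 1, 1)
print(can_delete(GovernanceState(legal_holds={"MATTER-17"}, erasure_requested=True), today))
# (False, 'legal hold: MATTER-17')
```

In this sketch, an erasure request overrides a record schedule that has not yet elapsed. That behavior assumes any legally required retention is already captured in `regulatory_retain_until`.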

Global Content Search

Full-content indexing to power analytics and AI

Global Content Search underpins several key use cases, such as data subject access requests, records searches, and eDiscovery. However, many file share cleanup tools tag data at the point of ingestion without creating a full-text index. As a result, a company that later wants to search content or reclassify documents against new content policies has no index to query and must typically re-scan the source data.
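
The difference between tagging at ingestion and full-text indexing is easiest to see in a toy inverted index. The index maps each term to the documents that contain it and retains the extracted text. The class below is a hypothetical, in-memory illustration with naive tokenization (lowercase alphanumeric runs). Enterprise indexes add stemming, ranking, access control, and persistent storage.

```python
import re
from collections import defaultdict
from typing import Dict, Set

TOKEN = re.compile(r"[a-z0-9]+")


class FullTextIndex:
    """Minimal inverted index: term -> set of document IDs, plus the stored text."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        self.text: Dict[str, str] = {}  # kept so new policies can be applied later

    def add(self, doc_id: str, text: str) -> None:
        self.text[doc_id] = text
        for term in TOKEN.findall(text.lower()):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> Set[str]:
        """Return IDs of documents containing every query term (AND semantics)."""
        terms = TOKEN.findall(query.lower())
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = FullTextIndex()
index.add("hr/offer.docx", "Offer letter for Jane Doe, salary attached")
index.add("finance/q3.xlsx", "Q3 revenue forecast")
print(index.search("jane salary"))  # {'hr/offer.docx'}
```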

Content search also supports more advanced use cases, such as pattern recognition to identify documents containing personal data. If a tool analyzes content only at the point of ingestion, new patterns and policies are difficult to apply to existing data. Full-content indexing is therefore a critical requirement for many governance functions, as well as for companies seeking to leverage their data for analytics and AI.
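
Because a full-content index keeps the extracted text, a newly added detection pattern can be run over existing documents without re-ingesting them. The sketch below assumes the stored text is available as a mapping from document ID to text. The regular expressions are hypothetical, deliberately simple examples. Real PII detection needs validation, context checks, and much broader coverage.

```python
import re
from typing import Dict, List

# Hypothetical initial policy: detect email addresses only.
PII_PATTERNS: Dict[str, re.Pattern] = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def scan(corpus: Dict[str, str], patterns: Dict[str, re.Pattern]) -> Dict[str, List[str]]:
    """Map each pattern name to the sorted IDs of documents whose stored text matches it."""
    return {
        name: sorted(doc_id for doc_id, text in corpus.items() if rx.search(text))
        for name, rx in patterns.items()
    }


# Text previously extracted and kept by the index (hypothetical documents).
corpus = {
    "hr/contacts.txt": "Reach Jane at jane.doe@example.com",
    "hr/benefits.txt": "Employee SSN 123-45-6789 on file",
    "eng/notes.txt": "Build passed on all targets",
}
print(scan(corpus, PII_PATTERNS))  # {'email': ['hr/contacts.txt']}

# A new policy adds a U.S. SSN-style pattern; the same stored text is simply re-scanned.
PII_PATTERNS["us_ssn"] = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
print(scan(corpus, PII_PATTERNS))
```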

In-Place Management + Archiving

Holistic information governance

The majority of data can be managed “in-place,” without creating any document copies. This is ideal for the estimated 80% of data that is of low-to-medium value and carries minimal risk if deleted. However, certain documents, such as business records and documents placed on legal hold, may need to be archived. The ability to combine in-place management with selective archiving is therefore essential to holistic information governance.
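
The split between in-place management and selective archiving can be sketched as a routing rule. The `Item` fields and the rule itself are hypothetical simplifications of the criteria described above: business records and anything on legal hold are archived, and everything else is governed where it lives.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Item:
    path: str
    is_record: bool
    on_legal_hold: bool


def needs_archive(item: Item) -> bool:
    """Only items with preservation value are copied to the archive."""
    return item.is_record or item.on_legal_hold


def plan(items: List[Item]) -> Tuple[List[str], List[str]]:
    """Split items into (paths to archive, paths to manage in place)."""
    to_archive = [i.path for i in items if needs_archive(i)]
    in_place = [i.path for i in items if not needs_archive(i)]
    return to_archive, in_place


items = [
    Item("finance/ledger-2019.xlsx", is_record=True, on_legal_hold=False),
    Item("marketing/draft-banner.png", is_record=False, on_legal_hold=False),
    Item("legal/vendor-emails.pst", is_record=False, on_legal_hold=True),
]
print(plan(items))
```

Keeping the archive limited to items with preservation value keeps the number of copies, and their storage cost, small.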
