
How to Clean Up File Share and Legacy Data

3 tips on how to clean up files you don't need, defensibly and comfortably


Before we start, here's a short video with some advice on launching a data cleanup project. In this blog, we aim to provide additional context and background information to help readers clean up their file shares.

According to IDC, the volume of data kept in the global storage sphere is doubling every four years. This exponential rise brings higher storage costs, regulatory issues, and cybersecurity concerns, many of them associated with Redundant, Outdated, and Trivial (ROT) data. But that's not all.

According to estimates, ROT data accounts for 33% of all file data in a company. Estimates also put the time employees spend searching for files within the organization at up to 30% of the workday, with nearly one hour per day spent sifting through ROT, making it one of the most significant contributors to employee productivity loss.

Overall, too much data, whether ROT or not, puts a strain on the organization, and one way to remedy that is to clean up file share and legacy data. Here's how!

Setting the Stage for a Cleanup Project

An organization must agree upon a few key details before sifting through data and separating the trash from the treasure:

  1. Who are the key stakeholders?
  2. Who will be the leader of the project?
  3. What are the main objectives of the file share cleanup project?

Remember, the outcome should include saving resources, increasing security, improving flexibility, and having sufficient data control to differentiate between key records and ROT data.

Once these points are settled, make sure everyone has a clear idea of the data cleanup goals, as these will shape how the organization approaches the project and moves through it.

Tip 1 Use Big Buckets

While going through corporate data, there are three buckets to sort files into: retention, deletion, and further investigation. Both metadata and content must be examined to establish which bucket each file belongs in.

Metadata comprises information about the document's owner, the last time it was accessed, the date it was created, the file size, and the data type. Content analysis, by contrast, can detect duplicates, isolate personally identifiable information (PII), and classify record types.
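One common way to detect duplicates from content is to compare file hashes. The following is a minimal Python sketch, not a description of any particular product: the function names `file_digest` and `find_duplicates` are hypothetical, and it assumes that "duplicate" means byte-for-byte identical, so near-duplicates such as edited copies will not be caught.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical.

    Assumption: exact duplicates only; this is an illustrative helper.
    """
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Each returned group is a set of identical files, of which all but one copy are candidates for the deletion bucket. On a large share, grouping by file size first and hashing only files of equal size would avoid reading every file in full.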

The actual file share cleanup can only begin once the data has been appropriately split into the three buckets per the information governance standards.
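To make the three buckets concrete, here is a small Python sketch of a rule-based sorter. Everything in it is an assumption for illustration: the `FileRecord` fields, the `assign_bucket` rules, and the seven-year staleness threshold are hypothetical placeholders, and the real rules should come from your own information governance standards and retention schedule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Bucket(Enum):
    RETAIN = "retention"
    DELETE = "deletion"
    INVESTIGATE = "further investigation"


@dataclass
class FileRecord:
    path: str
    last_accessed: datetime      # from metadata
    is_duplicate: bool = False   # from content analysis
    contains_pii: bool = False   # from content analysis
    is_record: bool = False      # matches a record type that must be kept


def assign_bucket(
    rec: FileRecord,
    now: datetime,
    stale_after: timedelta = timedelta(days=7 * 365),  # hypothetical threshold
) -> Bucket:
    """Apply illustrative rules in order of caution."""
    if rec.is_record:
        return Bucket.RETAIN
    if rec.contains_pii:
        # Assumption: PII is never deleted automatically; a person reviews it.
        return Bucket.INVESTIGATE
    if rec.is_duplicate or now - rec.last_accessed > stale_after:
        return Bucket.DELETE
    # Anything the rules cannot place confidently goes to the third bucket.
    return Bucket.INVESTIGATE
```

The ordering is a design choice: retention rules are checked first and the fallback is further investigation, so a file is only deleted when a rule explicitly says so.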

Tip 2 Clean up file share and legacy data that you don't need

Further guidance on determining what can be safely discarded boils down to three core considerations:

Is your process defensible?

When discussing data deletion, the most critical aspect is ensuring that each decision is defensible. Regulators are primarily concerned with whether you take reasonable precautions to safeguard critical information and have enough visibility into your data to demonstrate it. Data cleanup addresses that concern by showing a best-effort approach to maintaining records for privacy compliance and legal purposes. By organizing data by value, the bucket approach can help reduce risk.

What do your samples say?

To add to the overall defensibility of your legacy data cleanup project, it is advised that you sample your data at random to assess the effectiveness of your buckets and to make the sorting process easier. For example, sampling by folder, creator, or department can assist you in drawing conclusions about the other documents in the same grouping. After you have enough data in your buckets, you may sample what is in them to verify proper categorization.
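Sampling by folder, creator, or department can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `sample_by_group` is a hypothetical helper, files are represented as plain dictionaries, and the sample size of five per group is arbitrary rather than a statistically derived figure.

```python
import random
from collections import defaultdict


def sample_by_group(files, group_key, per_group=5, seed=None):
    """Draw up to per_group random files from each group.

    files:     iterable of file descriptions (any objects)
    group_key: function returning the group of a file, e.g. its folder
    seed:      optional seed so the same sample can be reproduced later
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for item in files:
        groups[group_key(item)].append(item)
    return {
        group: rng.sample(members, min(per_group, len(members)))
        for group, members in groups.items()
    }


# Usage: sample three files per department from a hypothetical inventory.
inventory = [
    {"name": f"doc{i}.docx", "department": "HR" if i < 8 else "Legal"}
    for i in range(10)
]
picked = sample_by_group(inventory, lambda f: f["department"], per_group=3, seed=42)
```

Recording the seed alongside the review results is one possible way to support defensibility, since it lets you show later exactly which files were drawn.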

Will this work in the long term?

File management solutions and file share cleanup efforts are often treated as one-time events, but the mountains of data that inspired you to start the process are certain to re-form over time. To avoid this cycle of clutter, you must build methods to categorize your data automatically or manually, ideally into these three buckets, for continuous records management and elimination.

At this point, you may be curious about what you should do with the third bucket requiring further investigation.

Tip 3 Only move what you are comfortable moving

The third bucket is best handled by breaking it into broad slices of data and uncovering enough information about each slice to move it into the deletion or retention bucket. This can be accomplished by sampling or manual identification; the key aim, however, is to determine the value of the slice rather than of each individual document. Moving data out of this bucket requires demonstrating purpose and best-effort compliance rather than perfect categorization.
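As a rough illustration of judging a slice rather than each document, the Python sketch below decides where a slice goes from the labels a reviewer gave to a sample of it. The function name `decide_slice`, the label strings, and the 90% agreement threshold are all hypothetical; the right threshold is a matter for your own risk tolerance.

```python
def decide_slice(sample_labels, threshold=0.9):
    """Decide a slice's bucket from manually reviewed sample labels.

    sample_labels: list of "delete" or "retain" labels, one per sampled file
    threshold:     share of the sample that must agree (hypothetical value)
    """
    if not sample_labels:
        # Nothing reviewed yet, so the slice stays where it is.
        return "further investigation"
    total = len(sample_labels)
    if sample_labels.count("delete") / total >= threshold:
        return "deletion"
    if sample_labels.count("retain") / total >= threshold:
        return "retention"
    # Mixed results: split the slice further or sample more.
    return "further investigation"
```

A slice that comes back mixed is a sign that it is too broad, and breaking it into smaller slices (for example by subfolder or by year) before sampling again is usually more comfortable than forcing a decision.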

Final Words

It is OK for a few documents to be incorrectly retained or destroyed, because the result will still be better than the clutter you started with. The idea is to establish a middle ground and to document your decisions and the reasons behind them so that you can defend your approach later if necessary.

Bivek Minj graduated from the Indian Institute of Mass Communication with a degree in English Journalism. He serves as a Content Writer at ZL Tech India's Marketing department. He comes to the industry with a desire to learn and grow.