3 Tips for File Share and Legacy Data Cleanup

Initial steps for starting a legacy data cleanup project

We recently shared a short video detailing a few tips on how to start a data cleanup project. In this blog, we will provide additional background and context to further assist those starting a new file cleanup project.

Setting the Stage for a Cleanup Project

Before combing through the data and sorting between the trash and the treasure, there are a few things an organization has to align on first:

  • Who are the key stakeholders whose input needs to be garnered?
  • Who will be the leader of the project, owning it from ideation to completion?
  • What are the main objectives that should come from the file cleanup project?
    • Save resources?
    • Increase security?
    • Improve flexibility?
  • Do we have sufficient data control to differentiate between key records and ROT (redundant, obsolete, or trivial data)?

Once you have gone through these considerations, you should have a decent idea of what you hope to achieve from your data cleanup, what the main obstacles will be, and what you will need to fill in the technological gaps. These early decisions will shape your strategy and guide you throughout the process.

Tip #1 Use Big Buckets

When initially going through your enterprise data, you will run into three classifications—or buckets—of data.

  1. Retention: There is enough information to know that you want to keep this.

  2. Deletion: There is enough information to know you do not want this.

  3. Further Investigation: There is not enough information to decide either way.

To determine which bucket each file belongs in, you will have to look at its metadata and content. A few metadata details prove crucial when determining a file’s importance: document owner, last accessed date, date of creation, file size, and data type (e.g., video, picture, audio, or text file). If metadata alone is insufficient to determine the file’s bucket, its content can complete the picture by identifying duplicates, isolating personally identifiable information (PII) for privacy purposes, and classifying record type (e.g., contract, receipt, or junk mail).
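
As a rough illustration, a first metadata-only pass over the three buckets might look like the Python sketch below. The thresholds, the `DISPOSABLE_TYPES` set, and the rules themselves are hypothetical placeholders, not recommended policy; your own retention schedule should drive the real values.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Bucket(Enum):
    RETENTION = "retention"
    DELETION = "deletion"
    INVESTIGATE = "further investigation"


@dataclass
class FileMeta:
    path: str
    owner: Optional[str]   # None when the document owner is unknown
    created: date
    last_accessed: date
    size_bytes: int
    data_type: str         # e.g. "video", "picture", "audio", "text"


# Hypothetical policy values; replace with your organization's own rules.
ACTIVE_WITHIN = timedelta(days=365)      # touched recently: likely still in use
STALE_AFTER = timedelta(days=7 * 365)    # untouched this long: likely obsolete
DISPOSABLE_TYPES = {"temp", "cache"}     # assumed labels for trivial data


def assign_bucket(meta: FileMeta, today: date) -> Bucket:
    """Place a file in a bucket using metadata alone."""
    since_access = today - meta.last_accessed
    # Empty files and known-trivial types carry no business value.
    if meta.size_bytes == 0 or meta.data_type in DISPOSABLE_TYPES:
        return Bucket.DELETION
    # A known owner plus recent access is enough information to keep the file.
    if meta.owner is not None and since_access <= ACTIVE_WITHIN:
        return Bucket.RETENTION
    # No owner and long untouched: enough information to delete.
    if meta.owner is None and since_access >= STALE_AFTER:
        return Bucket.DELETION
    # Everything else needs a look at content before deciding.
    return Bucket.INVESTIGATE
```

Anything the metadata rules cannot settle lands in the third bucket by default, which keeps the first pass conservative.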

Once your data is neatly separated into the three buckets, the actual cleanup can commence.

Tip #2 Clean up legacy data and files that you don’t need

Further guidance on determining what can be safely discarded boils down to three core considerations:

  1. Is your process defensible?

    When discussing data deletion, the most important aspect is ensuring that each decision is defensible. What regulators are really concerned about when it comes to document preservation is whether you have taken reasonable measures to protect crucial information, and whether you have the data visibility to prove it. In the case of data cleanup, it is about showing best effort when preserving documents for compliance and legal purposes. If you have followed the bucket method, you should avoid most liability and risk, since you have already sorted your data by value.

  2. What do your samples say?

    To add to the overall defensibility of your legacy data cleanup project, it is recommended that you randomly sample your data, both to test the efficacy of your buckets and to ease the sorting process. For example, sampling by folder, creator, or department can help you make inferences about the rest of the documents in the same grouping. Then, once your buckets have enough data, you can sample what is in them to ensure that your classifications have been accurate.

  3. Will this work in the long term?

    Too often, file management solutions and cleanup efforts are carried out on a one-off basis, and the piles of data that prompted you to initiate the project are bound to re-form over time. To avoid this cluttered cycle, you need to establish processes going forward that automatically or manually categorize your data, ideally into these same three buckets, for ongoing management and deletion.
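
The sampling described in point 2 can be sketched in a few lines of Python. This is a minimal illustration under assumed names: `sample_by_group` and `disagreement_rate` are hypothetical helpers, and the grouping key (folder, creator, or department) is whatever your metadata supports.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_by_group(
    items: Sequence[T],
    key: Callable[[T], Hashable],
    per_group: int,
    seed: Optional[int] = None,
) -> Dict[Hashable, List[T]]:
    """Draw up to `per_group` random items from each group (e.g. each folder)."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    groups: Dict[Hashable, List[T]] = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return {
        name: rng.sample(members, min(per_group, len(members)))
        for name, members in groups.items()
    }


def disagreement_rate(
    sampled: Sequence[T],
    assigned: Dict[T, str],
    reviewed: Dict[T, str],
) -> float:
    """Share of sampled files where a human reviewer disagreed with the bucket."""
    if not sampled:
        return 0.0
    wrong = sum(1 for f in sampled if assigned[f] != reviewed[f])
    return wrong / len(sampled)
```

Recording the seed alongside the results lets you show later exactly which files were reviewed, which supports the defensibility argument in point 1.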

Tip #3 Only move what you are comfortable moving

At this point, you may be curious as to what you should do with the third bucket, the one that required further investigation. The goal with this bucket is to separate it into broad slices of data and find information that helps you move those slices into the deletion or retention buckets. You can do this through sampling or manual identification, but keep in mind that the main goal is identifying the worth of the slice, not of each individual document.

Defensibly moving data from this bucket does not require your classifications to be 100% accurate; rather, it requires you to prove your intent and best-effort compliance. It is okay for a few documents to be erroneously retained or deleted. After all, whatever your outcome is, it will be better than the clutter you began with. There will always be competing risks between retaining documents too long and deleting documents too soon. The key is to find the middle ground and to record the decisions you make and why, so that you can later prove your intent if needed.
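
One lightweight way to keep that record is a decision log with one row per slice. The sketch below is an assumption about what such a log could contain; the `SliceDecision` fields and the CSV layout are hypothetical and not tied to any particular product.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import datetime, timezone
from typing import Iterable, Optional, TextIO


@dataclass(frozen=True)
class SliceDecision:
    slice_name: str    # e.g. a folder, department, or creator grouping
    decision: str      # "retain" or "delete"
    rationale: str     # why the slice was judged this way
    sample_size: int   # how many documents were reviewed to decide
    decided_by: str
    decided_at: str    # ISO 8601 timestamp in UTC


def record_decision(
    slice_name: str,
    decision: str,
    rationale: str,
    sample_size: int,
    decided_by: str,
    now: Optional[datetime] = None,
) -> SliceDecision:
    """Build one log entry, rejecting anything other than retain/delete."""
    if decision not in ("retain", "delete"):
        raise ValueError(f"unknown decision: {decision!r}")
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return SliceDecision(slice_name, decision, rationale, sample_size, decided_by, stamp)


def write_log(decisions: Iterable[SliceDecision], out: TextIO) -> None:
    """Write the decisions as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(SliceDecision)])
    writer.writeheader()
    for entry in decisions:
        writer.writerow(asdict(entry))
```

Passing an open file (or an `io.StringIO` buffer for a quick check) to `write_log` produces a plain CSV that can be handed to legal or compliance reviewers as-is.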

Hopefully, these tips can help to simplify your next cleanup project, but if you still have questions or would like to know more about information governance solutions, please email us at

Rafael Walden is a graduate of Portland State University and current solutions consultant at ZL Tech.