Journaling vs. Crawling Enterprise Data

Comparing the efficacy and use cases of journaling vs. crawling archival strategies

Why do organizations archive their data?

Archiving is the primary approach organizations use to maintain business records. With changes in the regulatory landscape and advancements in analytics technology, organizations are increasingly adopting and prioritizing information management. Regulatory frameworks such as SEC and FINRA rules, SOX, HIPAA, and the FRCP require some degree of email and communication archiving. In effect, archiving stores items immutably so that they are safe from deletion and edits throughout their lifecycles. There are two ways to capture data for an archive: journaling and crawling.

What is journaling?

Journaling is one means of capturing data, typically emails, for archiving. Its main component is a dedicated mailbox, called the journaling mailbox. A copy of every item circulating within the environment is automatically forwarded to this mailbox as it is sent. As a result, all data is captured in its original, unaltered form. These items are periodically moved to the dedicated archive. While journaling is conventionally used for email, it can be used on any communication platform that can auto-forward items as they are sent.
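To make the flow concrete, here is a minimal Python sketch of the idea, not a real mail server. The names `Message`, `MailSystem`, `send`, and `flush_journal` are hypothetical and exist only for illustration. The sketch assumes that a copy of each message lands in the journaling mailbox before delivery, and that the journal is drained into the archive on some schedule.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)  # frozen: the journaled copy cannot be modified
class Message:
    sender: str
    recipient: str
    body: str


@dataclass
class MailSystem:
    """Toy mail environment with a journaling mailbox (hypothetical)."""
    mailboxes: dict = field(default_factory=dict)
    journal: list = field(default_factory=list)
    archive: list = field(default_factory=list)

    def send(self, msg: Message) -> None:
        # 1. Forward an immutable copy to the journaling mailbox first.
        self.journal.append(msg)
        # 2. Deliver an editable copy to the recipient's mailbox.
        editable = {"sender": msg.sender, "body": msg.body}
        self.mailboxes.setdefault(msg.recipient, []).append(editable)

    def flush_journal(self) -> None:
        # Periodically carry journaled items over to the archive.
        self.archive.extend(self.journal)
        self.journal.clear()


system = MailSystem()
system.send(Message("ana@example.com", "bo@example.com", "Contract attached"))
# The recipient edits their own copy after delivery...
system.mailboxes["bo@example.com"][0]["body"] = "Contract attached (edited)"
system.flush_journal()
# ...but the archived copy still holds the text as it was sent.
print(system.archive[0].body)
```

Because the journal copy is taken at send time, nothing the recipient does afterward touches what ends up in the archive.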

Benefits of journaling

Journaling is the only approach to archiving that captures content without any risk of alteration. Journaled items clearly demonstrate chain of custody: recipients cannot edit them, because the original data is captured before it reaches the recipient mailbox. Journaling is also the easiest means of capturing enterprise emails.

Weaknesses of journaling

Journaling does not give insight into how your workforce interacts with circulating items. For example, it does not capture calendar data, such as whether a meeting invite was opened or deleted. Therefore, despite being the most defensible form of data capture, journaling cannot ensure full capture of enterprise data. Additionally, journaling relies on auto-forwarding, so any platform without that functionality cannot be journaled.

What is crawling?

Whereas journaling captures data as it is sent, crawling runs periodically. As the name suggests, crawling software “crawls” platforms in search of new information to ingest into the archive. To do so, it has the difficult task of keeping track of what has and has not already been added to the archive. To solve this problem, crawling programs typically rely on metadata to identify new additions. Because these programs have to comb through massive datasets on each pass, crawling is a far more resource-intensive process than journaling.
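One common way to track what is new is a "watermark" based on a last-modified timestamp. The Python sketch below illustrates that idea under stated assumptions: each item carries a numeric `modified` field, and the `crawl` function and the sample mailbox are hypothetical rather than taken from any particular archiving product.

```python
from typing import Dict


def crawl(source: Dict[str, dict], archive: Dict[str, dict], last_crawl: int) -> int:
    """Copy items modified after `last_crawl` into `archive`.

    Returns the new watermark (the newest modification time seen).
    """
    newest = last_crawl
    # Every item is inspected on every pass, which is why crawling is costly.
    for item_id, item in source.items():
        if item["modified"] > last_crawl:
            archive[item_id] = dict(item)  # store a copy, not a reference
            newest = max(newest, item["modified"])
    return newest


mailbox = {
    "msg-1": {"body": "Q3 forecast", "modified": 1},
    "msg-2": {"body": "Draft reply", "modified": 5},
}
archive: Dict[str, dict] = {}

# First pass: nothing has been archived yet, so both items are ingested.
watermark = crawl(mailbox, archive, last_crawl=0)

# An item changes between crawls; only that item is picked up next time.
mailbox["msg-2"] = {"body": "Draft reply, revised", "modified": 7}
watermark = crawl(mailbox, archive, last_crawl=watermark)
```

Note that the metadata check decides what gets copied, but the crawler still has to visit every item to perform that check.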

Benefits of crawling

Crawling provides two core benefits: it captures interaction data, such as mail opens and deletions, and it works on almost all platforms. Crawling gives more insight into how your workforce interacts with items in their mailboxes and with each other because it captures data throughout an item’s history instead of only upon creation. Accordingly, items that are never sent, such as calendar data, folder structure, and drafted emails, can only be captured through crawling.

Weaknesses of crawling

Since crawling is done periodically, it cannot capture every change that takes place within a mailbox. If an item is changed between crawls, the archiving software captures only the version that exists at the next crawl, leaving potentially relevant intermediate drafts uncaptured. Also, since items can be edited inside mailboxes before they are crawled, crawling does not ensure the level of chain of custody that journaling offers.
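The gap is easy to see in a small simulation. The Python sketch below is purely illustrative: `captured_versions` is a hypothetical helper, and the edit times and crawl times are made-up numbers. It records which version of a draft is visible at each crawl.

```python
from typing import List, Tuple


def captured_versions(edits: List[Tuple[int, str]], crawl_times: List[int]) -> List[str]:
    """Return the versions a periodic crawler would archive.

    `edits` is a time-sorted list of (time, text) pairs for one item;
    at each crawl, only the latest version saved so far is visible.
    """
    captured: List[str] = []
    for crawl_time in crawl_times:
        visible = [text for when, text in edits if when <= crawl_time]
        # Archive the current version unless it was already captured.
        if visible and (not captured or captured[-1] != visible[-1]):
            captured.append(visible[-1])
    return captured


# A draft is saved at times 1, 2, 3, and 12; crawls run at times 10 and 20.
edits = [(1, "v1"), (2, "v2"), (3, "v3"), (12, "v4")]
captured = captured_versions(edits, crawl_times=[10, 20])
missed = [text for _, text in edits if text not in captured]

print(captured)  # ['v3', 'v4']
print(missed)    # ['v1', 'v2']
```

Shortening the interval between crawls narrows the gap, but it does not close it, and each extra pass adds to the processing cost.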

Journaling vs. crawling: Which is better?

Clichéd as it sounds, it all comes down to the particular use case. In general, journaling is considered more defensible than crawling, and it is preferable, where possible, for proving chain of custody and regulatory compliance.

However, any platform that does not support automatic forwarding requires crawling. Therefore, most instant messaging and collaboration platforms require crawling for their data capture and archiving. Furthermore, crawling captures actions taken on items, giving insights into how your workforce is cooperating and performing.

While the choice is often framed as a binary, journaling vs. crawling, organizations frequently use the two in tandem. By combining the methods, organizations can capture a far more complete picture of their enterprise data, thereby increasing their overall defensibility.

After growing up in Turkey, Kaan moved to Maine to complete his undergraduate studies in mathematics at Bowdoin College. He promptly left the snow behind him to live out his data governance goal in the Californian sun at ZL Tech.