The terms structured and unstructured data are largely self-explanatory once you have a bit of context.
Structured data, at a broad level, refers to data that is organized into intentional categories. HR departments with database tables on employees’ personal information (date of birth, start date, salary, etc.) are dealing with structured data. Think anything that would logically fit into an Excel spreadsheet.
Unstructured data, by contrast, covers the more “free-flowing” content: emails, instant messages, videos, explanatory documents, etc. While files and emails have plenty of metadata attached (sent/last modified date, sender/author, etc.) that might qualify as structured data, the content itself cannot be organized into columns and rows.
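To make the distinction concrete, here is a minimal Python sketch. The class names, field names, and sample values are all hypothetical and chosen only for illustration; the fields on the employee record mirror the HR example above.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class EmployeeRecord:
    """Structured data: every field has a fixed name and type (hypothetical schema)."""
    name: str
    date_of_birth: date
    start_date: date
    salary: float


@dataclass
class Email:
    """Structured metadata wrapped around an unstructured body (hypothetical schema)."""
    sender: str        # metadata: fits in a column
    sent_at: datetime  # metadata: fits in a column
    body: str          # free text: no columns or rows inside it


# Sample values are invented for the example.
employee = EmployeeRecord("Ana Ruiz", date(1985, 4, 12), date(2015, 9, 1), 72000.0)
email = Email(
    "ana.ruiz@example.com",
    datetime(2020, 3, 2, 9, 30),
    "Hi team, attaching my updated insurance forms. Let me know if anything is missing.",
)

# A structured field can be queried with a precise comparison.
started_before_2016 = employee.start_date < date(2016, 1, 1)

# The email body can only be searched as text.
mentions_insurance = "insurance" in email.body.lower()
```

The comparison on `start_date` works because the field has a known type and meaning, while the body of the email offers nothing to query beyond the words it happens to contain.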
Intuitively, you might think unstructured data would be more difficult to manage, since its formats and sizes are unpredictable. However, archiving structured data held in old databases poses its own set of challenges.
The biggest challenge is following the trail of related information. Large organizations may have thousands of databases, many of which work together to provide access to enterprise information. For example, one table might hold employee names and their insurance ID numbers. Following an ID number may lead to another table, which links it to additional information such as the insurance provider and the name of the employee’s doctor.
If this information were consolidated in an individual document for each employee, a standard search on the employee’s name would quickly retrieve all the relevant data. In the database scenario, however, the search/analytics tool has to know to connect the employee’s name with their ID number, and then the ID number with the provider and doctor information, before it can bring up everything you need.
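The lookup described above can be sketched with Python's built-in sqlite3 module. The table names, column names, and sample rows below are hypothetical; a real enterprise schema would be larger and would likely span several databases, but the principle of joining on the shared ID is the same.

```python
import sqlite3

# Hypothetical two-table schema mirroring the employee/insurance example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, insurance_id TEXT);
    CREATE TABLE insurance (insurance_id TEXT, provider TEXT, doctor TEXT);
    INSERT INTO employees VALUES
        ('Ana Ruiz', 'INS-001'),
        ('Ben Cole', 'INS-002');
    INSERT INTO insurance VALUES
        ('INS-001', 'Acme Health', 'Dr. Lee'),
        ('INS-002', 'Beta Care', 'Dr. Patel');
""")


def find_employee_record(conn, name):
    """Follow the trail: name -> insurance ID -> provider and doctor."""
    return conn.execute(
        """
        SELECT e.name, e.insurance_id, i.provider, i.doctor
        FROM employees AS e
        JOIN insurance AS i ON i.insurance_id = e.insurance_id
        WHERE e.name = ?
        """,
        (name,),
    ).fetchone()


record = find_employee_record(conn, "Ana Ruiz")
print(record)
```

Note that the insurance table never mentions the employee's name. A tool that searched that table for the name alone would find nothing, which is why it needs to know about the relationship through the ID column.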
Another (related) challenge is creating logical groupings of data. Whereas files and emails are easily distinguished as individual entities, isolating specific rows from a database requires more deliberate tool design. Retaining only the relevant rows of a table, rather than operating at the table level, avoids having to store duplicates of entire tables across legal cases for the sake of a small subset of information.
While managing different data types poses a challenge, structured and unstructured data work together in all large corporations. It’s important, therefore, to have the tools to make the most of both.