Working in Silicon Valley, it’s rare to go a day without reading, hearing, or talking about artificial intelligence. From self-driving cars and a Jeopardy-winning machine to customer experience improvement, it seems to be everywhere... except information governance.
I’d argue the lack of AI in information governance is a good thing. While the combination may seem attractive at first glance, it’s fraught with potential complications. Not only would implementation require an extreme capital investment, but the sheer number and frequency of regulatory changes would be nearly impossible to keep up with defensibly. While organizations may find nuanced use cases for AI in information governance, they should not seek it out to replace the job functions of records managers, general counsel, or compliance professionals.
Considering that much of today’s AI is based on neuroscience findings, it’s no surprise that accurately training systems to handle data can be extremely complex. Although extraordinary breakthroughs have been made in recent years, data bias is still a significant issue for AI decision making.
To properly train an AI system, the training sample must first be representative of the broader population. The system is then taught to make decisions using this sample set, and its conclusions are checked against a pre-determined answer set until its decision making is consistently accurate. Unfortunately, using existing data sets can insert bias into the process: for instance, training AI for court sentencing on historic results would lead the program to absorb past judges’ own biases and prejudices. Yet creating entirely new sets of ‘clean’ data would not only pose a significant cost burden, it would merely shift the bias to those creating the new set.
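To make the sentencing example concrete, here is a minimal sketch, using entirely hypothetical data and a deliberately toy "model" (the majority outcome per feature value), of how a system trained on biased historical labels simply reproduces that bias:

```python
from collections import Counter, defaultdict

# Hypothetical historical records: (district, sentence) pairs.
# In this invented data, past judges were harsher on defendants from
# "district_b" for comparable offenses, so the labels carry that bias.
historical = [
    ("district_a", "probation"), ("district_a", "probation"),
    ("district_a", "prison"),
    ("district_b", "prison"), ("district_b", "prison"),
    ("district_b", "probation"),
]

def train(samples):
    """Learn the majority outcome for each feature value (a toy 'model')."""
    by_feature = defaultdict(Counter)
    for feature, label in samples:
        by_feature[feature][label] += 1
    return {f: counts.most_common(1)[0][0] for f, counts in by_feature.items()}

model = train(historical)
# The "trained" model recommends prison for district_b and probation for
# district_a: it has absorbed the historical bias, not corrected it.
```

No real sentencing system is this crude, but the failure mode is the same at any scale: if the training labels encode past prejudice, accuracy against those labels just means faithfully reproducing it.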
AI cannot completely remove bias or human error so long as it is trained by humans. Nor can it bring the meticulous attention to detail required in information governance when new regulations pose enormous data training burdens.
Structured vs. Unstructured Data
To understand the root of this challenge, it’s important to recognize the difference between structured and unstructured data. Structured data is organized in one consistent, predictable way, which makes training AI to process it fairly efficient. You still need the data sets and algorithms to set everything up, but the decisions themselves aren’t complicated.
Here’s how it works. My profile is saved in a database. The program first checks whether I am male. Finding that false, it checks whether I am female, confirms it, and moves on to the next if-then statement. Machine learning uses these “families of boosted decision trees” to run through thousands of possible iterations about my identity before reaching a conclusion. This type of technology has led to programs correctly handling everything from Jeopardy questions to profile identification.
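The if-then walk described above can be sketched in a few lines, with hypothetical profile fields standing in for a real database record; a production decision tree would chain thousands of such branches, but each one looks like this:

```python
# Hypothetical structured record: every field has a known name and type.
profile = {"name": "A. Reader", "gender": "female", "age": 34}

def classify(record):
    # Each check mirrors one if-then branch of a decision tree.
    if record["gender"] == "male":
        branch = "male-branch"
    elif record["gender"] == "female":
        branch = "female-branch"
    else:
        branch = "unspecified-branch"
    # Subsequent checks continue down the tree (age, location, and so on).
    if record["age"] >= 18:
        return branch + ":adult"
    return branch + ":minor"

result = classify(profile)
```

The point is not the toy logic itself but that structured fields make each branch a trivial lookup and comparison, which is exactly why training on structured data is the easy case.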
Unstructured data, however, is not so simple to process or train on. While some important records do become structured (e.g., employee information entered into a payroll database), the vast majority does not. Employee-created data is complex, ranging from files and SharePoint sites to emails and instant messages. Unlike structured data, which can be processed with simple if-then statements, this content has no inherent rules or structure embedded within it. That is the primary challenge in training a machine to take over records, compliance, or eDiscovery functions.
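The contrast is easy to see side by side. In this sketch (with invented sample data), a structured payroll row answers a governance question with a direct field lookup, while an email body offers no fields at all; a simple rule can only fall back on brittle keyword matching:

```python
# Structured: a payroll row with named fields.
payroll_row = {"employee_id": 1042, "salary": 65000, "department": "Legal"}

# Unstructured: free text from a hypothetical email.
email_body = "Hi Sam, per our call, let's move the retention review to Friday."

# Structured data: one if-then rule suffices, because the field exists.
is_legal_dept = payroll_row["department"] == "Legal"

# Unstructured data: there is no "topic" field to check. Keyword matching
# is the best a simple rule can do, and it misses context entirely --
# this email mentions "retention" without actually being a records matter.
mentions_retention = "retention" in email_body.lower()
```

Deciding whether that email is a record subject to a retention schedule requires interpretation, which is precisely what if-then rules over structured fields never have to do.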
The AI Challenge
Not only would creating appropriate training data sets for each new regulation pose a logistical and financial challenge, the result would still have gaps in handling unstructured data. Those gaps can only be filled by an effective governance solution and expert personnel. Training a machine to understand known rules, and to update its knowledge as new regulations or business use cases arise, would take years and still require supervision. At that point, what is the benefit over traditional governance structures?
Considering the conflicts among regulations such as FINRA rules, HIPAA, and the GDPR, I understand it may be tempting to foist decision-making responsibility onto AI, but that will not be legally sufficient. AI software is currently unable to explain its conclusions, something that may be required when these regulations conflict. Furthermore, the GDPR mandates that organizations be able to fully explain how all personal data is being used. With current AI technology, that is not possible.
While I agree AI could be useful to information governance professionals, especially with regard to structured data, I do not believe the technology is ready to face these challenges. And even when it is, information governance will always require human monitoring and expertise.