Data Governance: Managing and Safeguarding Important Information Assets

Posted by Mark Greisiger

A Q&A with Tom Preece of Rational Enterprise

Many data breach events are at least partly the result of poor data governance: organizations that don’t maintain a data inventory or map. Without such oversight, the inevitable breach event can become all the more devastating. I spoke with Tom Preece of Rational Enterprise about what organizations can do to gain control over their data.

Can you speak to how a loss of control over sensitive data can lead to risk?
There are many risks associated with a lack of control over enterprise content, but certainly the most prevalent concerns are those around data privacy and security. First, it's difficult to protect data the organization doesn't know exists. This lack of visibility into content is a fundamental problem facing many companies today. Sprawling file shares and legacy data stores are obvious examples, but plenty of unstructured data residing on employee PCs is also generally "dark." If sensitive content is present in any of those systems unbeknownst to the enterprise, proper protections can't be put in place.

Second, it’s difficult to protect data that’s stored in inherently insecure locations. For example, a portable device like an employee laptop is more likely to be lost, stolen, or hacked than a partitioned file share. Companies that willingly export data during litigation to an externally hosted review repository are also placing some of the most important enterprise data at risk, as these repositories often have vastly inadequate security protection.

an enterprise that doesn’t remediate redundant, obsolete, and trivial (ROT) data is unnecessarily exposing itself

Similarly, it's difficult to protect data that's not organized, retained, and disposed of according to well-defined policies. We're seeing increasing regulation around the proper protection of Personally Identifiable Information (PII), and practices such as retaining data backups in perpetuity are a growing concern for organizations. For instance, an audit that uncovers unmanaged PII on unstructured systems could cause serious repercussions. Moreover, an enterprise that doesn't remediate redundant, obsolete, and trivial (ROT) data is unnecessarily exposing itself to risk in the event of litigation or an investigation.

An unprotected data store is vulnerable not only to bad actors (external or internal), but also to well-intentioned employees who may accidentally expose, destroy, or lose that data. Inadvertent disclosure of any kind can result in economic hardship (e.g., loss of intellectual property, fines, etc.); reputational harm; investigation or supervision by federal regulatory or law enforcement agencies; investigation or enforcement actions by State Attorneys General, Congressional or international investigations; and civil litigation (brought by shareholders, employees, customers, etc.).

How might a company start the process of identifying all of its PII data?
Traditionally, enterprises have undertaken PII (and other sensitive data) identification through the creation of a data map, which theoretically has some benefits. A data map consists of a high-level understanding of the flow and locations of different types of information throughout the enterprise. However, it's naive to think that an organization can create an accurate data map based solely on company documentation of responsibilities and access, because this view assumes (usually incorrectly) that all policies and procedures are actually being followed. Custodial interviews (i.e., asking employees directly "what kind of data do you create?" and "where do you store it?") fill in some of the gaps, but become very time consuming. Thus, by the time the sizeable project of consolidating company documentation, scheduling and executing interviews, reconciling the differences between those two accounts, and creating a final report is finished, the company's data stores have likely already changed significantly. The type of data employees create and where they store it is so fluid that the data map would require constant updating. The process also often fails to take into account lost institutional knowledge about sensitive content from years past. Due to these shortfalls, traditional data mapping is no longer advisable.

A more modern alternative to a static data map is to index enterprise data in place, creating a dynamic audit trail that allows the data to speak for itself. This approach quickly illuminates data and processes that interviews never could. It also eliminates the bias of employees who don't report their own failures to follow policy. Once documents are indexed, the enterprise has the ability to run analytics (from basic Boolean search, to regular expressions, to more advanced Support Vector Machine learning) across the information. Using agent-based software like Rational Governance, a company can capture changes or new documents in real time, so the audit trail of enterprise information is always up to date. The key first step in gaining control over data is simply gaining insight into what data exists in a dependable way.
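To make the "index in place, then run analytics" idea concrete, here is a minimal sketch in Python of the regular-expression layer of such a pipeline. The pattern set, function names, and the idea of a dictionary standing in for crawled documents are all illustrative assumptions, not Rational Governance's actual API; production tools use far more robust detection (checksum validation, contextual rules, machine-learned classifiers).

```python
import re

# Hypothetical patterns for two common PII types. Real deployments would
# use many more patterns plus validation logic to reduce false positives.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scan_text(text):
    """Return the set of PII categories detected in a document's text."""
    return {label for label, pattern in PII_PATTERNS.items()
            if pattern.search(text)}

def index_documents(docs):
    """Build a simple index: document id -> detected PII categories.

    `docs` maps a document id to raw text, standing in for content an
    indexing agent would pull from file shares or employee endpoints.
    """
    return {doc_id: scan_text(text) for doc_id, text in docs.items()}
```

Re-running the scan whenever a document changes is what keeps the audit trail current, rather than producing a one-time snapshot the way a traditional data map does.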

How can IT administrators better control the classification, access, movement, and deletion of their data?
While being able to index data in place is an important first step, Rational Governance was also created to enforce explicit control over content based on company policy. We knew that classification—and automated fulfillment of policy based on that classification—would be as important as visibility. Analytics combined with policy automation avoids relying on end users to classify and manage information. Similarly, insight and classification without the integrated ability to move, delete, and retain data place too much onus on IT, as there's simply too much content to control manually. Enterprises should be using tools that accomplish this automation, coupled with sophisticated classification analytics, to identify and control data, sensitive or otherwise.
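The classification-drives-action pattern described above can be sketched as a simple policy table: each classification label maps to a required disposition, so no end user or administrator has to decide document by document. The labels, actions, and `Document` structure below are hypothetical placeholders for whatever a real governance platform produces.

```python
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    location: str
    labels: set  # classifications assigned by analytics, e.g. {"pii"}

# Hypothetical policy table: classification label -> required action.
POLICY = {
    "pii": "move_to_secure_store",
    "rot": "delete",
    "record": "retain",
}

def enforce(doc):
    """Return the policy actions triggered for a classified document.

    In a real system each action name would dispatch to code that
    actually moves, deletes, or applies a retention hold to the file.
    """
    return sorted(POLICY[label] for label in doc.labels if label in POLICY)
```

The design point is that the policy lives in one auditable table rather than in each employee's head, which is exactly the shift from user-driven to automated governance the answer describes.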

Any words of advice to organizations looking to adopt a more comprehensive data governance program?
Poor data governance is a people, process, and technology problem. An organization's records, compliance, and IT managers will need the right institutional awareness of the depth of the problem before winning executive buy-in. Processes for classifying and controlling information are doomed to failure if they don't respect and take into account the business goals of a company and acknowledge the existing culture around managing information. This balance requires leaders with the subject-matter expertise to put the right policies and procedures in place, as well as a thorough understanding of the needs and culture of the business.

Those who should be leading the charge often fail to make an effective case for obtaining technological insight into and control over unstructured data stores because they don't have the metrics to prove that the risk exists, or to what extent. Without being able to quantify the risk, business leaders are unwilling to assign funds to remediate it. My advice: undertake a proof of concept of Information Governance (IG) technology, and document anything that will build the case for better processes supported by technology built for proactive information management and data privacy/security.

In summary…
We want to thank Mr. Preece for his insights into this topic. It has been our experience that most organizations lack an inventory mapping the type(s), volume, and location(s) of their sensitive data. This issue is only growing with the expansion of outsourcing, and we see the risk underscored in several cyber risk/crime studies indicating that many companies sustaining security breaches struggle to name, quantify, or locate data types and systems, resulting in too many "unknown unknowns." Having a strong inventory from the outset would save these organizations much time, money, and angst.