The Afterthought of Governance in Big Data

As companies move through analytics maturation, take advantage of big data, and begin to understand the value of available data sets, they often forget to verify the accuracy and validity of the data itself. Governance appears to be the last piece of the big data puzzle to be put in place, often because no resources have been allocated to it. However, treating governance as an afterthought can lead to major problems. Let’s investigate why this may be the case.
Stakeholders vs. Analysts 
Oftentimes, the desire to deliver any data at all to stakeholders outweighs the need to verify the accuracy of what is being delivered. The issue is rooted in the fact that the stakeholders receiving the data don’t know the tool sets that were used to report it in the first place. Ultimately, they don’t know enough about where the data comes from to judge its accuracy. Stakeholders have to take the word of the analyst who pulled the data, but whom does the analyst trust to ensure the data is accurate? Unfortunately, in most cases, it’s the analysts themselves who provide the checks and balances.
This is an issue far greater than a data set counting metrics or dimensions differently than the tool to which it is being applied. The problem rests squarely on the people, time, and money allocated to ensuring the data is not only accurate, but also a true picture of what was measured. Without adequate resources, the process becomes a perfect outlet for human error.
Bad Data Can Be a Big Problem 
One workaround that is both efficient and cost-saving is to remove data once it has been held for a predetermined amount of time. In most cases, data remains in a database anywhere from one month to two years. However, once the data is removed, the potential value of that data is removed as well. A common mistake is assuming your data means nothing today when in fact it may hold value tomorrow. So, you may be erasing data for no reason at all.
If a company chooses to erase or eliminate prior data, it should do so under an outlined retention schedule. This method gives any company a clearer path to removing data that may no longer be relevant, while allowing potential stakeholders to vet the data before it’s eliminated. This is clearly a more labor-intensive exercise than simply erasing data once it passes a certain date. However, it’s a solution that ensures you aren’t aimlessly erasing data of potential value.
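As a rough illustration, the retention schedule described above could be sketched in Python as follows. Everything here is hypothetical: the category names, the retention periods (chosen to match the one-month and two-year range mentioned earlier), and the stakeholder_approved_deletion flag are assumptions for the example, not a prescribed implementation.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention schedule: data set category -> how long to keep it.
RETENTION_SCHEDULE = {
    "clickstream": timedelta(days=30),
    "campaign_reports": timedelta(days=730),
}


@dataclass
class DataSet:
    name: str
    category: str
    collected_on: date
    # Assumed flag: set once stakeholders have vetted the data and agreed to removal.
    stakeholder_approved_deletion: bool = False


def review_for_deletion(data_sets, today):
    """Split expired data sets into those cleared for deletion and those awaiting vetting."""
    to_delete, needs_vetting = [], []
    for ds in data_sets:
        limit = RETENTION_SCHEDULE.get(ds.category)
        if limit is None or today - ds.collected_on <= limit:
            continue  # no rule for this category, or still within its retention period
        if ds.stakeholder_approved_deletion:
            to_delete.append(ds.name)
        else:
            needs_vetting.append(ds.name)
    return to_delete, needs_vetting
```

The point of the design is that passing the retention date alone never triggers deletion; expired data that has not been vetted is routed to stakeholders instead.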
The numbers don’t lie: bugs or issues in the data are much easier to fix before the data goes live. Therefore, it’s not just a matter of eliminating old data on the back end; it’s about ensuring the bad data doesn’t get there to begin with. Once the data goes live, it’s ten times more costly to fix, if it gets fixed at all. The cost of bad data can be illustrated by the 1-10-100 rule. Estimates suggest that if the cost to fix a data error at the time of entry is $1, the cost to fix it an hour after it has been entered escalates to $10. Fix it several months later and you’re looking at over $100. Is it worth it?
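To make the 1-10-100 rule concrete, here is a small Python sketch that scales those per-error estimates to a batch of errors. The stage names and the cleanup_cost function are made up for illustration, and the $100 figure is treated as a lower bound because the rule says "over $100".

```python
# Per-error cost estimates from the 1-10-100 rule, in dollars.
# The stage names are hypothetical labels for this example.
COST_PER_ERROR = {
    "at_entry": 1,        # fixed when the data is entered
    "hour_later": 10,     # fixed an hour after entry
    "months_later": 100,  # fixed several months later (a lower bound)
}


def cleanup_cost(error_count, stage):
    """Estimate the cost of fixing error_count errors at the given stage."""
    return error_count * COST_PER_ERROR[stage]


for stage in COST_PER_ERROR:
    print(stage, cleanup_cost(500, stage))
```

Under these assumptions, 500 errors cost $500 to fix at entry, $5,000 an hour later, and at least $50,000 several months later.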
Governance as a Holistic Approach
One thing to take into consideration is building compliance into your day-to-day processes so nothing gets lost in the shuffle. Another solution? Thoroughly educate the masses on the importance of data accuracy -- most specifically, the developers who are creating analytics code. 
However, developing a mature governance process needs to involve all branches of the company, while adapting to particular needs along the way. A one-size-fits-all approach will not work when attempting to roll out proper data governance.
The central element that holds governance together is the development of a governance body that maintains compliance across the company. This is the driving force behind maintaining good data before and after data collection has begun, rooted in a fundamental alignment of data governance with core business philosophies. Bottom line: If a company is willing to spend to get at big data, it had better be willing to govern that big data as well.
What do you think? Does your organization use governance as you move through analytics maturation? For more tips, reach out to us at
By Kyle Westgate
About the Author:

Kyle Westgate is Manager, PMO at Stratigent.
