Data Quality Erosion

This month’s topic may affect the long-term quality and accuracy of data within any web analytics program, regardless of whether it is software- or ASP-based and independent of the method of data collection (log files, page tags, network, or hybrid) being leveraged. The term I like to use to describe this phenomenon is Data Quality Erosion, which I have defined below:

“Data Quality Erosion is the decay in the quality of a web analytics data set when the web analytics implementation and the website are not proactively managed.”

In the remainder of this newsletter, I will address the primary causes and effects of Data Quality Erosion and some specific strategies to prevent it from affecting your web analytics data.

Primary Causes / Effects

  • Web sites can change dramatically over time, yet the web analytics implementation is often not modified accordingly. This is usually the result of the organization lacking clearly defined business processes that would otherwise ensure data quality and web analytics data collection keep pace with site changes. Examples range from something as dramatic as a full site redesign to something as subtle as product or service URLs that contain the year; when the year changes, many of the reports within an analytics application can be affected.
  • The initial implementation of web analytics tools is often incomplete and not robustly tested or QA’ed. This can easily result in a situation where data quality is less than optimal from launch and degrades over time. Take, for example, an environment where a page tag-based data collection process is implemented. If the portions of the site that do not include a global header or footer are left untagged, the resulting data will be incomplete. Without rigorous QA, users of the data may not even realize it has been compromised.
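The tagging gap described above is straightforward to check for programmatically. The sketch below assumes you already have each page's HTML in hand (from a crawl or a build artifact) and that the tag can be recognized by a fixed signature; the `/js/pagetag.js` filename is a hypothetical placeholder for whatever snippet your vendor's tag actually emits.

```python
# Minimal tag-coverage sketch. TAG_SIGNATURE is a hypothetical
# placeholder -- substitute the snippet your analytics vendor emits.
TAG_SIGNATURE = "/js/pagetag.js"

def is_tagged(html: str) -> bool:
    """Return True if the analytics page tag appears in the page source."""
    return TAG_SIGNATURE in html

def find_untagged(pages: dict) -> list:
    """Given {url: html}, return the URLs missing the tag."""
    return [url for url, html in pages.items() if not is_tagged(html)]

# Example: a page built from the global footer vs. one that is not.
pages = {
    "/home": '<html>...<script src="/js/pagetag.js"></script></html>',
    "/legacy/promo": "<html>...no footer, so no tag...</html>",
}
print(find_untagged(pages))  # -> ['/legacy/promo']
```

In practice you would feed this from a site crawl rather than a hard-coded dictionary, but the core check is the same: every page that should be measured must carry the tag.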

Filtering requirements can change over time. Examples include:

  • Internal IP addresses used to filter internal traffic are often added or changed.
  • Robots and spiders change over time, so it is important to filter against the most recent list of robots and spiders.
  • Network testing or monitoring applications may be introduced, adding non-human traffic.
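The three filtering requirements above can be sketched as a single keep/discard decision per hit. The internal ranges and bot markers below are illustrative placeholders, not a recommended list; in practice they should be maintained from your network team's records and an up-to-date robots-and-spiders list.

```python
import ipaddress

# Illustrative placeholders only -- maintain these from your own
# network records and a current robots/spiders list.
INTERNAL_NETS = [ipaddress.ip_network("10.0.0.0/8"),
                 ipaddress.ip_network("192.168.0.0/16")]
BOT_MARKERS = ("googlebot", "bingbot", "pingdom", "monitor")

def keep_hit(ip: str, user_agent: str) -> bool:
    """Return True if a hit should count as real visitor traffic."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in INTERNAL_NETS):
        return False  # internal employee traffic
    ua = user_agent.lower()
    if any(marker in ua for marker in BOT_MARKERS):
        return False  # robot / monitoring traffic
    return True

hits = [("10.1.2.3", "Mozilla/5.0"),        # internal
        ("203.0.113.9", "Googlebot/2.1"),   # robot
        ("203.0.113.10", "Mozilla/5.0")]    # real visitor
visitors = [h for h in hits if keep_hit(*h)]
print(len(visitors))  # -> 1
```

The point of the erosion discussion is that these lists rot: a filter that was correct at launch silently stops matching once the office subnet moves or a new monitoring service starts polling the site.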

Best Practices to Prevent Data Quality Erosion
Like many best practices within the web analytics industry, the most effective way to minimize data quality erosion is to preempt it: maintain your web analytics implementation proactively rather than reacting to problems after the fact. As part of this process you should plan to regularly conduct the following:

  • Audit the configuration of your web analytics tool to ensure the quality of mission critical reports within the system.
  • If your data collection methodology involves page tagging, regularly scan your site with tools like Maxamine to ensure uniform tag deployment and to identify areas where data collection has eroded.
  • Consistently review your organization’s filtering to ensure that it continues to meet business objectives.
  • Conduct a review of recent data to identify data quality issues within the dataset.
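The last review step can be partly automated. A simple first pass is to flag any day whose page-view count drops sharply below its trailing average, which is often the first visible symptom of eroded tagging. The window and threshold below are illustrative choices, not prescribed values.

```python
# Sketch of a data-quality review pass: flag days whose count falls
# below threshold * trailing-window mean. Window/threshold values
# here are illustrative assumptions, not recommendations.
def flag_drops(daily_counts, window=7, threshold=0.5):
    """Return indexes of days that fall below the trailing baseline."""
    flagged = []
    for i in range(window, len(daily_counts)):
        baseline = sum(daily_counts[i - window:i]) / window
        if daily_counts[i] < threshold * baseline:
            flagged.append(i)
    return flagged

counts = [1000, 1020, 990, 1005, 1010, 995, 1000, 400, 1010]
print(flag_drops(counts))  # -> [7]  (the day that collapsed to 400)
```

A flagged day is not proof of a problem, only a prompt to investigate: check whether a release, redesign, or filter change landed on that date.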

In addition to creating a proactive maintenance program for your web analytics system, it is critical to integrate the appropriate business processes into your organization’s web development process, so that web analytics requirements and the corresponding data collection are captured as part of every new project.

By implementing the recommendations above, you can protect the long-term quality of your web analytics data and focus your efforts on extracting value from it, rather than spending cycles explaining data issues to its consumers within your organization.


By Bill Bruno
About the Author:

Bill Bruno is the CEO - North America, Ebiquity.
