Data Collection Part 2 - Hybrid Methodologies

Over the last five years, the web analytics industry has made significant advances in its data collection techniques. Web analytics vendors, as well as the individuals and businesses that use web analytics, strive to obtain the most useful and impactful data in the most cost- and time-efficient manner. At the inception of web analytics, log files were the only available data source. Today, page tagging and network data collection are also valuable sources of information. For definitions and additional information on page tagging, log files, and network data collection, see Data Collection Part 1 - Single Methodologies.

To overcome the obstacles that each data collection technique can present, many organizations have implemented a hybrid solution. Hybrid data collection solutions have the potential to leverage the strengths of each technique while minimizing the weaknesses. Below is a review of the four main hybrid solutions:

Case 1: Logs and Tags

Using log files and page tags is possibly the most commonly used hybrid solution today. It offers the best of both solutions and limits the disadvantages. Specifically, using both solutions, the following can be tracked:
  • Usage patterns from spiders (via the log files)
  • Complete download data (via the log files)
  • Error code data (via the log files)
  • Behavioral data (via page tags)
  • Specific data elements defined by the company (via page tags)
  • Cached pages (via page tags)

This hybrid solution requires additional resources, including additional configuration, expertise, and people. Proper configuration is essential: the software or tool must be set up with a process that ensures captured data is not duplicated. For example, a request counted by the page tag must not be counted again from the log file entry for the same request.
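
As a rough illustration of the duplication check described above, the following Python sketch merges tag and log records and keeps one record per request. The record fields (visitor, url, ts) and the rule that a tag record replaces a matching log record are assumptions made for this example; actual tools may match requests differently, for instance by a shared request identifier or a time window.

```python
def merge_hits(tag_hits, log_hits):
    """Combine tag and log records, keeping one record per request.

    Each hit is a dict with hypothetical keys 'visitor', 'url', and 'ts'
    (a Unix timestamp in seconds). When both sources saw the same request,
    the tag record is kept, on the assumption that it carries richer data.
    """
    def key(hit):
        # Assumed matching rule: same visitor, same URL, same second.
        return (hit["visitor"], hit["url"], int(hit["ts"]))

    merged = {}
    for hit in log_hits:
        merged[key(hit)] = {**hit, "source": "log"}
    for hit in tag_hits:
        merged[key(hit)] = {**hit, "source": "tag"}  # replaces the log copy
    return sorted(merged.values(), key=lambda h: h["ts"])


tags = [{"visitor": "v1", "url": "/home", "ts": 100}]
logs = [
    {"visitor": "v1", "url": "/home", "ts": 100},         # same request as the tag
    {"visitor": "bot", "url": "/robots.txt", "ts": 101},  # seen only in the log
]
combined = merge_hits(tags, logs)
```

Here the page view appears once (from the tag), while the spider request, which never executes a tag, is retained from the log.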

Case 2: Logs and Network Data Collection

Of the four hybrid solutions, this one is probably the least commonly used, since it is not the most appropriate hybrid solution for most companies. Network data collection and logs have similar advantages and disadvantages, so the additional resources needed to execute this model may not seem practical. As in Case 1 above, additional resources would be required and higher expenses would be incurred, and data duplication must again be avoided.
 

Case 3: Network Data Collection and Tags

Using network data collection and page tagging as a hybrid solution offers an abundance of useful and rich data. Similar to Case 1, this hybrid solution captures most of the advantages of both techniques while minimizing the disadvantages. Both Case 1 and Case 3 benefit in the same way from the use of page tagging; however, in Case 3, the use of network data is superior in most circumstances to the use of log files. Specifically, network data collection offers the following advantages:

Network Level Data:
Network data collection provides access to a more granular level of technical data that can be used to determine server response times to requests and identify network related issues that could be interfering with user experience.

Data Consolidation:
Network data collection often simplifies the consolidation of data from many servers, a step that is commonly required when working with log files.

Additional Application Data:
Some network data collectors are capable of collecting application server variables and other additional fields of data that are not captured in log files and would be difficult or impossible to capture with page tags.

First Time Visit Cookie Setting:
Some network data collectors are capable of setting a visitor identification cookie. This is a superior method of setting the cookie, because the first request the web server sees from a new visitor will not yet carry a visitor identification cookie.

Search Engine Spider Reporting:
Knowing the usage patterns of spiders can be valuable when engaging in search engine optimization. This data can be utilized to optimize the technology and content of the site for those spiders.
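
To make the idea of spider usage patterns concrete, the following Python sketch counts requests per spider and page from collected (user agent, URL) pairs. The marker substrings are illustrative assumptions only; reliable spider detection needs a maintained list and is not as simple as shown here.

```python
from collections import Counter

# Illustrative substrings only (an assumption for this sketch).
SPIDER_MARKERS = ("googlebot", "bingbot", "slurp")


def spider_page_counts(hits):
    """Count requests per (spider marker, url) from (user_agent, url) pairs."""
    counts = Counter()
    for user_agent, url in hits:
        ua = user_agent.lower()
        for marker in SPIDER_MARKERS:
            if marker in ua:
                counts[(marker, url)] += 1
                break  # attribute each request to at most one spider
    return counts


sample_hits = [
    ("Mozilla/5.0 (compatible; Googlebot/2.1)", "/products"),
    ("Mozilla/5.0 (Windows NT 10.0) Firefox/120.0", "/products"),
    ("Googlebot-Image/1.0", "/products"),
]
spider_counts = spider_page_counts(sample_hits)
```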

Complete download data:
As with log files, this data makes it possible to calculate the number of file downloads that completed successfully versus the number that were not fully completed.
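
As a rough illustration of how completed and incomplete downloads might be distinguished, the Python sketch below compares the bytes sent for each request against the known file size. The record layout (path, status, bytes sent) and the file size lookup are assumptions for this example, and partial-content (range) requests are deliberately ignored to keep it short.

```python
from collections import defaultdict


def download_completion(records, file_sizes):
    """Return {path: {'complete': n, 'partial': m}} for tracked files.

    records:    iterable of (path, status, bytes_sent) tuples (assumed layout)
    file_sizes: dict mapping a tracked path to its full size in bytes
    """
    totals = defaultdict(lambda: {"complete": 0, "partial": 0})
    for path, status, bytes_sent in records:
        size = file_sizes.get(path)
        if size is None or status != 200:
            continue  # not a tracked file, or not a plain full-file response
        outcome = "complete" if bytes_sent >= size else "partial"
        totals[path][outcome] += 1
    return dict(totals)


sample_records = [
    ("/whitepaper.pdf", 200, 1000),  # full file delivered
    ("/whitepaper.pdf", 200, 400),   # transfer stopped early
    ("/whitepaper.pdf", 404, 0),     # error, not a download
    ("/index.html", 200, 5),         # not a tracked download
]
download_stats = download_completion(sample_records, {"/whitepaper.pdf": 1000})
```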

Server Error Code Reporting:
Error code data is automatically recorded in most log files and can provide valuable insight into site functionality and design issues that would be difficult to detect through other means.
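
For example, error codes can be tallied from log lines with a few lines of Python. This sketch assumes lines in the Common Log Format; the regular expression and function name are illustrative and would need adjusting for other log layouts.

```python
import re
from collections import Counter

# Pulls the requested URL and status code out of a Common Log Format line.
LINE_RE = re.compile(r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) ')


def error_counts(lines):
    """Count 4xx and 5xx responses per (status, url)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("status")[0] in "45":
            counts[(match.group("status"), match.group("url"))] += 1
    return counts


sample_lines = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:55:40 -0700] "GET /missing.html HTTP/1.0" 404 209',
    '127.0.0.1 - - [10/Oct/2000:13:55:41 -0700] "GET /missing.html HTTP/1.0" 404 209',
    '127.0.0.1 - - [10/Oct/2000:13:55:45 -0700] "POST /order HTTP/1.0" 500 120',
]
errors = error_counts(sample_lines)
```

A page that returns the same error repeatedly, as /missing.html does here, is a natural candidate for a broken-link or design review.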

Not all data collectors are able to collect all of the above information. However, the most recently developed data collectors have made many new advances and are able to collect the data listed.

Many companies would likely realize the highest return on investment by using a network data collection and page tagging hybrid solution. As in all hybrid cases, the implementation and maintenance must be carefully monitored.
 

Case 4: Network Data Collection, Tags, and Logs

This scenario is the most complex and expensive solution, since it implements all three data collection techniques. Again, it will rarely be the optimal choice for most companies. However, Stratigent has worked with clients who chose this solution because of difficulties implementing network data collectors on parts of their sites. For those clients, the best solution was to rely on log files for those pages, implement the sensors wherever else possible, and use page tags across the entire set of sites. If resources permit, implementing all three data collection techniques does produce the most robust and encompassing data.

Recently, a number of web analytics vendors have presented innovative ways to collect data from the network. In particular, ClickStream Technologies has created some interesting technology that combines network data collection and tagging in such a way that the tagging is highly automated. Additionally, Visual Sciences has for some time offered creative solutions that incorporate network data collection. More vendors can be expected to adopt this method of data collection in the near future and to develop their own approaches.
 
For each hybrid data collection solution, careful consideration must be given to the goals of the data collection and the resources allotted to accomplish those goals. Implementing a hybrid solution can be difficult and prone to data errors. However, if implemented, maintained, and analyzed correctly, hybrid solutions offer a whole new level of understanding of not only user behavior but also a site's performance and functionality, which has the potential to lead to a highly efficient and optimal website.
 
Josh Manion
Chief Executive Officer
Stratigent, LLC
 For more information please call 877-427-2900 or email info@stratigent.com.