Why Your Website Statistics Reports Are Wrong, Part 1

Key Takeaways

Server-side data collection is an effective method for tracking every file request on a website, but it has limitations such as difficulty in identifying users sharing the same IP address and user agent string, and inability to record cached files.
Applications such as AWstats can analyze server log files to produce meaningful figures; however, they must make assumptions when interpreting the data, which can lead to discrepancies in the reported statistics.
Improving the accuracy of website statistics involves steps like ensuring correct installation of tracking codes on all pages, using a reliable analytics tool, regularly auditing the website analytics setup, and filtering out bot traffic, spam, and internal traffic from the data.

Marketing departments love website statistics. There’s nothing better than handing a CEO a freshly generated report showing how their website traffic is growing. That’s when the trouble starts.

Many people are under the misconception that web statistics are absolutely irrefutable: the numbers are generated by independent computers and cannot possibly be wrong. So why do statistics from two or more sources rarely match? To understand the problem, we need to examine the two methods used to collate and analyze statistics. Today, we look at server-side methods…

Server-Side Data Collection and Analysis

Every time you visit a website, the web server records information about every file request, e.g. the HTML file, CSS files, JavaScript files, graphic files, Flash movies, PDF documents, MP3 music, etc. Implementations differ, but most servers record each request on a single line of a log file. The data normally includes:

the request type — normally GET or POST
the full path name of the requested file
the date and time
the requester’s IP address, and
the requester’s user agent: a string of characters that identifies the client, e.g. a specific OS and browser or a search engine bot.
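To make the list above concrete, here is a minimal Python sketch that splits one log line into those fields. It assumes the Apache/NGINX "combined" log format, which also records a status code, response size and referrer. Other servers or custom log configurations would need a different pattern, and the sample line is invented for illustration.

```python
import re
from datetime import datetime

# Assumes the Apache/NGINX "combined" log format; a custom LogFormat
# or a different server would need a different pattern.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of request fields, or None if the line doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # Timestamps look like 10/Oct/2023:13:55:36 +0000
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry

sample = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
          '"GET /downloads/report.pdf HTTP/1.1" 200 48213 '
          '"-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"')
entry = parse_line(sample)
print(entry["method"], entry["path"], entry["ip"])  # GET /downloads/report.pdf 203.0.113.7
```

Lines that don't fit the pattern (for example, malformed requests) return None rather than raising, so a report can count and inspect them separately.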

Understandably, log files can grow to hundreds of megabytes even on relatively quiet websites.

The main benefit of server-based data collection is that it records every file request regardless of the technology used. It’s easy to assess the popularity of downloads or discover performance bottlenecks. Most servers produce log files by default, so it may be possible to access historical information about your traffic growth.
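As a rough illustration of assessing download popularity, the sketch below counts requests per file. It assumes combined-format log lines, where the request is the first quoted field. The file suffixes and sample lines are hypothetical. Note that it counts logged requests, not people, and any copy served from a cache never appears in the log at all.

```python
from collections import Counter

def request_path(line):
    """Return the path from the request field (the first quoted field)."""
    parts = line.split('"')
    if len(parts) < 2:
        return None
    request = parts[1].split()  # e.g. ['GET', '/files/guide.pdf', 'HTTP/1.1']
    if len(request) < 2:
        return None
    return request[1].split("?", 1)[0]  # ignore any query string

def download_counts(lines, suffixes=(".pdf", ".mp3", ".zip")):
    """Count requests for files whose names end in one of `suffixes`."""
    counts = Counter()
    for line in lines:
        path = request_path(line)
        if path and path.lower().endswith(suffixes):
            counts[path] += 1
    return counts

log_lines = [
    '198.51.100.4 - - [10/Oct/2023:09:00:01 +0000] "GET /files/guide.pdf HTTP/1.1" 200 90211 "-" "Mozilla/5.0"',
    '198.51.100.9 - - [10/Oct/2023:09:05:12 +0000] "GET /files/guide.pdf?src=mail HTTP/1.1" 200 90211 "-" "Mozilla/5.0"',
    '203.0.113.2 - - [10/Oct/2023:09:07:40 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.2 - - [10/Oct/2023:09:08:02 +0000] "GET /audio/intro.mp3 HTTP/1.1" 200 3311022 "-" "Mozilla/5.0"',
]
print(download_counts(log_lines).most_common())
# [('/files/guide.pdf', 2), ('/audio/intro.mp3', 1)]
```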

Unfortunately, there are a number of drawbacks:

Very little can be determined about the user’s browsing device. User agent strings offer minimal information and can be faked (Opera used to pretend to be IE to ensure sites did not block the browser). You cannot normally assess the user’s screen resolution or whether they have JavaScript and Flash enabled.
Large organizations often pass all internet requests through a single gateway. User identification becomes difficult when two or more users are sharing the same IP address and user agent string.
Server logs cannot record requests for cached files. Caching is essential and the Internet would grind to a halt without it. Your browser caches files so that, when you return to a page, it can display the copies it downloaded previously without requesting them from the server again.
In addition, many ISPs cache popular website files on proxy servers. When you enter a web address, you may see files returned from that proxy rather than the originating website. As your site increases in popularity, you could even see a drop in logged file requests as more proxy servers cache your site.
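The shared-gateway problem is easy to see in code. This sketch assumes an analyzer that treats each distinct IP address and user agent pair as one visitor, because it has nothing better to go on. The addresses and agent strings, including "CorpBrowser", are made up.

```python
def estimate_visitors(requests):
    """Count distinct (IP address, user agent) pairs, a common
    log-based stand-in for 'unique visitors'."""
    return len({(r["ip"], r["agent"]) for r in requests})

corporate_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) CorpBrowser/1.0"
requests = [
    # Three different employees behind one corporate gateway,
    # all running the same standard browser build.
    {"ip": "192.0.2.10", "agent": corporate_ua},
    {"ip": "192.0.2.10", "agent": corporate_ua},
    {"ip": "192.0.2.10", "agent": corporate_ua},
    # One home user.
    {"ip": "198.51.100.42", "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_0)"},
]
print(estimate_visitors(requests))  # 2, although four people made these requests
```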

Applications such as AWstats can analyze server log files to produce meaningful figures such as the number of unique users or visits. However, these applications must make assumptions when they interpret the data.

For example, the application could define a single “visitor session” as requests from the same IP address and user agent separated by no more than 15 minutes of inactivity. A user who visits a page then waits 16 minutes before clicking a link elsewhere would be recorded as two separate visitor sessions. But an application that assumed a 20-minute period of inactivity would record only one visitor session.
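The sketch below shows how the chosen timeout alone changes the count. It assumes the rule "start a new session when the same IP address and user agent pair has been idle for longer than the timeout". Real analyzers such as AWstats may apply different rules, and the two requests are invented to match the 16-minute example.

```python
from datetime import datetime, timedelta

def count_sessions(requests, timeout_minutes):
    """Count visitor sessions: a new session starts whenever the same
    (IP, user agent) pair has been inactive for longer than the timeout."""
    timeout = timedelta(minutes=timeout_minutes)
    last_seen = {}
    sessions = 0
    for r in sorted(requests, key=lambda r: r["time"]):
        key = (r["ip"], r["agent"])
        previous = last_seen.get(key)
        if previous is None or r["time"] - previous > timeout:
            sessions += 1
        last_seen[key] = r["time"]
    return sessions

t0 = datetime(2023, 10, 10, 9, 0)
visit = [
    {"ip": "203.0.113.7", "agent": "Mozilla/5.0", "time": t0},
    {"ip": "203.0.113.7", "agent": "Mozilla/5.0", "time": t0 + timedelta(minutes=16)},
]
print(count_sessions(visit, 15))  # 2
print(count_sessions(visit, 20))  # 1
```

The log data is identical in both calls; only the analyzer's assumption differs, which is exactly why two tools reading the same logs can disagree.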

If server-side data collection and analysis is flawed, can client-side methods help us? View part 2 now…

Frequently Asked Questions about Website Statistics and Reports

Why are my website statistics and reports showing incorrect data?

There could be several reasons why your website statistics and reports are showing incorrect data. One of the most common reasons is the improper implementation of tracking codes. If the tracking codes are not correctly installed on all pages of your website, it can lead to inaccurate data collection. Another reason could be the use of different analytics tools which may use different methodologies for data collection and analysis, leading to discrepancies in the data. Also, issues like bot traffic, spam, and internal traffic can distort your data. It’s crucial to regularly audit your website analytics setup to ensure accurate data collection and reporting.

How can I identify misleading statistics in my website reports?

Identifying misleading statistics requires a keen understanding of your website’s data and the factors that can influence it. Look for sudden, unexplained spikes or drops in your data, which could indicate a problem. Also, consider the source of your traffic. If a significant portion of your traffic is coming from a single source or location, it could be a sign of spam or bot traffic. Additionally, compare your website data with industry benchmarks to identify any discrepancies.
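The advice to look for sudden spikes or drops can be partly automated. The sketch below flags any day whose count is more than `threshold` standard deviations from the average of the preceding `window` days. The 7-day window, the threshold of 3 and the daily counts are arbitrary assumptions. A flagged day is a prompt to investigate, not proof of bot or spam traffic.

```python
from statistics import mean, stdev

def flag_anomalies(daily_counts, window=7, threshold=3.0):
    """Return indexes of days whose count differs from the mean of the
    previous `window` days by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is unusual.
            if daily_counts[i] != mu:
                flagged.append(i)
        elif abs(daily_counts[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged

visits = [120, 131, 118, 125, 129, 122, 127, 124, 980, 126]
print(flag_anomalies(visits))  # [8]
```

One limitation of this simple approach is that a large spike inflates the baseline for the following days, so a second, smaller anomaly right after it may go unflagged.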

What are some common misuses of website statistics?

Some common misuses of website statistics include cherry-picking data, ignoring the margin of error, not considering the sample size, and confusing correlation with causation. Cherry-picking data involves selecting only the data that supports your hypothesis while ignoring the rest. Ignoring the margin of error can lead to inaccurate conclusions because it treats figures as exact when they are subject to variability. Confusing correlation with causation can lead to false conclusions about the relationship between different variables.

How can I improve the accuracy of my website statistics?

Improving the accuracy of your website statistics involves several steps. First, ensure that your tracking codes are correctly installed on all pages of your website. Use a single, reliable analytics tool for data collection and analysis. Regularly audit your website analytics setup to identify and fix any issues. Filter out bot traffic, spam, and internal traffic from your data. Also, use a large enough sample size for your data analysis to reduce the margin of error.
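Filtering bot and internal traffic can be sketched along these lines, assuming requests have already been parsed into IP address and user agent fields. The internal network ranges and the "bot", "crawler" and "spider" markers are hypothetical settings. Substring matching is crude: it misses bots that fake a browser user agent and may catch legitimate strings that happen to contain "bot", so treat it as a first pass.

```python
import ipaddress

# Hypothetical settings: replace with your own office/VPN ranges and markers.
INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.0.0/8"),
                     ipaddress.ip_network("192.168.0.0/16")]
BOT_MARKERS = ("bot", "crawler", "spider")

def is_internal(ip):
    """True if the address falls inside one of the internal networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_NETWORKS)

def is_probable_bot(agent):
    """True if the user agent contains a common crawler marker."""
    agent = agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)

def filter_requests(requests):
    """Keep only requests that are neither internal nor obvious bots."""
    return [r for r in requests
            if not is_internal(r["ip"]) and not is_probable_bot(r["agent"])]

reqs = [
    {"ip": "10.1.2.3", "agent": "Mozilla/5.0"},  # staff member
    {"ip": "198.51.100.23", "agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"},
    {"ip": "203.0.113.7", "agent": "Mozilla/5.0 (Windows NT 10.0)"},
]
kept = filter_requests(reqs)
print([r["ip"] for r in kept])  # ['203.0.113.7']
```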

Why are my website statistics different from my competitors’?

Your website statistics could be different from your competitors’ due to several factors. These include differences in website design, content, SEO strategies, target audience, and marketing efforts. Also, if you and your competitors are using different analytics tools, it could lead to discrepancies in the data due to different methodologies used by these tools. It’s important to understand these factors and consider them when comparing your website statistics with your competitors’.

How can I use website statistics to improve my website’s performance?

Website statistics provide valuable insights into your website’s performance and user behavior. You can use these insights to identify areas of your website that need improvement. For example, if your website has a high bounce rate, it could indicate that users are not finding what they’re looking for on your website. In this case, you could improve your website’s content and navigation to better meet your users’ needs. Also, you can use website statistics to measure the effectiveness of your SEO and marketing efforts and make necessary adjustments.
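Bounce rate is commonly calculated as the share of sessions in which only one page was viewed. The sketch below assumes that definition and takes each session as a list of viewed pages; the sessions are invented. Tools differ here (Google Analytics 4, for example, defines bounce rate in terms of "engaged sessions"). The result also depends on how sessions were split in the first place.

```python
def bounce_rate(sessions):
    """Share of sessions that viewed exactly one page (0.0 if no sessions)."""
    if not sessions:
        return 0.0
    bounces = sum(1 for pages in sessions if len(pages) == 1)
    return bounces / len(sessions)

sessions = [
    ["/"],
    ["/", "/products", "/checkout"],
    ["/blog/post-1"],
    ["/about", "/contact"],
]
print(f"{bounce_rate(sessions):.0%}")  # 50%
```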

What are some reliable tools for website statistics and reporting?

There are several reliable tools for website statistics and reporting. Google Analytics is one of the most popular tools, offering a wide range of features for tracking and analyzing website data. Other reliable tools include SEMrush, SimilarWeb, and Moz. These tools provide comprehensive data on website traffic, user behavior, SEO performance, and more. Choose a tool that best fits your needs and budget.

How can I protect my website statistics from spam and bot traffic?

Protecting your website statistics from spam and bot traffic involves several steps. First, use a reliable analytics tool that has features for filtering out spam and bot traffic. Regularly update your website’s security measures to prevent spam and bot attacks. Also, regularly audit your website data to identify any unusual activity that could indicate spam or bot traffic.

How can I interpret my website statistics?

Interpreting your website statistics involves understanding the different metrics and what they indicate about your website’s performance. For example, the number of visitors indicates the reach of your website, while the bounce rate indicates user engagement. Also, consider the context of your data. For example, a high bounce rate could be a bad sign if you’re an e-commerce website, but it could be normal if you’re a blog. Use your website statistics in conjunction with your business goals to interpret them accurately.

How often should I check my website statistics?

The frequency of checking your website statistics depends on your business needs and goals. If you’re actively running marketing campaigns, you might need to check your statistics daily to monitor the performance of your campaigns. However, for a general overview of your website’s performance, checking your statistics weekly or monthly should be sufficient. Regularly checking your website statistics helps you stay informed about your website’s performance and make necessary adjustments.