Why Your Website Statistics Reports Are Wrong, Part 2

In my first post, we examined the advantages and disadvantages of server-side data collection and analysis. Although the generated reports can be useful, they have their flaws. Today, we look at the alternative…

Client-side Data Collection and Analysis

Client-side data collection requires an image or JavaScript code to be inserted on every page you want to analyze. The popularity of image counters has waned because they can only provide basic page hit information. JavaScript code can achieve more and is used by Google Analytics, one of the most popular statistical reporting systems.

JavaScript runs whenever the page loads in your browser so it can overcome many of the caching issues experienced by server-side data collection. The code can also collect more detailed client information, e.g. the time spent on a page, mouse activity, clicked links, the screen resolution, color depth, browser window size, installed plugins, etc. Cookies may also be used to identify unique users and reveal navigation paths. Ultimately, the JavaScript code sends data to a back-end server for processing.
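
To make that mechanism concrete, here is a minimal sketch of how a tracker could serialize collected values into an image-beacon URL. The `/collect.gif` endpoint and field names are invented for illustration — this is not any real analytics library's API:

```javascript
// Hypothetical sketch of a tracking beacon. The /collect.gif endpoint
// and field names are invented for illustration.
function buildBeaconUrl(endpoint, data) {
  var pairs = [];
  for (var key in data) {
    if (Object.prototype.hasOwnProperty.call(data, key)) {
      pairs.push(encodeURIComponent(key) + '=' + encodeURIComponent(data[key]));
    }
  }
  return endpoint + '?' + pairs.join('&');
}

// In a browser the values would come from objects such as screen,
// navigator and document.referrer; they are hard-coded here for clarity.
var beacon = buildBeaconUrl('/collect.gif', {
  page: '/article',
  res: '1280x720'
});
// The tracker would then request the URL, e.g. new Image().src = beacon;
```

Requesting the URL as an image is what lets the back-end server log the hit even though no real image data matters.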

Unfortunately, there are several drawbacks:

Blocking
JavaScript and cookies can be disabled or blocked. In general, you can expect around 5% of visitors not to run JavaScript, although the proportion will differ from site to site. Similarly, search engine bots do not run JavaScript, so the reports can never tell you if or when your site was indexed. The report may make an allowance for the missing users, but it’s only a guess.
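
One rough way to bound the missing share — an assumption of mine, not a method from the article — is to compare server-log pageviews against the client-side tracker's count for the same period:

```javascript
// Rough upper bound on the untracked share of traffic: compare
// server-log pageviews with the client-side tracker's count for the
// same period. Server logs also include bots and other non-human
// requests, so treat the result as a ceiling, not a measurement.
function estimateUntrackedShare(serverHits, trackedHits) {
  if (serverHits <= 0) return 0;
  return (serverHits - trackedHits) / serverHits;
}
```

For example, 10,000 logged pageviews against 9,280 tracked ones suggests that at most 7.2% of requests never ran the script.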

Dodgy code
JavaScript is fragile: if another script on your page causes a fatal error, it could prevent data collation. Perhaps worse, you could have a script which fails in just one browser; if your report indicates there are no Internet Explorer users, is it because they were never recorded?
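
There is no defence against another script's fatal error halting yours, but tracker authors can at least make their own code fail silently. A generic sketch of that defensive pattern (not any specific library's implementation):

```javascript
// Run collection inside try/catch so an error in the tracking code
// loses one hit rather than breaking the page. Note that the reverse
// problem described above -- a fatal error elsewhere stopping the
// tracker -- cannot be guarded against from inside the tracker.
function safeTrack(collect, send) {
  try {
    send(collect());
    return true;   // hit recorded
  } catch (e) {
    return false;  // hit silently lost
  }
}
```

Either way, a lost hit leaves no trace in the report — which is exactly why a suspiciously quiet browser segment deserves a second look.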

Page problems
JavaScript can analyze almost any aspect of the user’s page interaction, but the code could cause other scripts to break or run slowly. Most systems therefore take a conservative approach and only record basic details shortly after the page has loaded.

HTML pages only
It is only possible to analyze web pages which return HTML — the systems cannot record CSS, image, MP3 or PDF file access. Google Analytics allows you to add on-click handlers to file download links, but there is no guarantee users will click them (they could receive the file URL via email).
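
The on-click workaround can be sketched as follows. The `_gaq` command queue shown is the classic asynchronous Google Analytics interface of the period; the queue is passed in explicitly so the helper stays testable, and the exact call names should be treated as illustrative:

```javascript
// Record a file download as a "virtual pageview" by pushing a command
// onto the analytics queue (e.g. window._gaq in classic Google
// Analytics). Returns true when a queue was available to receive it.
function trackDownload(queue, path) {
  if (!queue || typeof queue.push !== 'function') return false;
  queue.push(['_trackPageview', path]);
  return true;
}

// Typical wiring on a download link:
// <a href="/files/report.pdf"
//    onclick="trackDownload(window._gaq, '/files/report.pdf')">Report</a>
```

Even with this in place, only clicks on the instrumented link are counted — direct requests for the file remain invisible to the client-side report.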

Human error
Script code must be manually added to every page you want to monitor. If you accidentally omit a page, it will never be recorded in your reports.

No historical data
Statistics can only be recorded from the point the script is added: it is not possible to generate historical reports.

Security risks
Linking to third-party JavaScript code is risky. It’s certainly inadvisable to use such systems on secure pages collecting private data.

Although many client-side analyzers produce great-looking and informative reports, the data collection process is inherently more volatile than server-side methods.

In the final post, we look at global statistics systems and summarize when, where, and how statistics can be useful.


  • http://www.tyssendesign.com.au Tyssen

    I think the headline of this series is a bit sensationalist and inaccurate. The reports aren’t wrong, just not 100% accurate – there’s a big difference. There’s still plenty of useful information that can be taken from these sorts of reports, even if the numbers aren’t 100% right.

  • aemciv

    Tyssen is correct. Nice write-up though. One of the best Firefox add-ons I have is the Google Analytics indicator. The icon lights up if GA is installed on the page. It’s a quick way to go through your site to make sure the code is inserted on every page.

    Matt

  • http://www.optimalworks.net/ Craig Buckler

    @Tyssen
    If they’re not 100% accurate then surely they’re wrong?! Also, how accurate are they? 99%? 95%? It’s impossible to quantify the degree of error.

    Semantics aside, the problem is that people can blindly use these figures without questioning the data collection processes. Most people know surveys and statistics can be swayed, but few people are aware that web usage reports are not as irrefutable as they appear.

  • http://www.ciudadesmayas.com elviajero

    Agree with Tyssen.

    The note can be improved. There are many things causing the reports to be inaccurate (additional to the ones already mentioned), like bots and general crawlers, but the note itself (head) promises more than the content has.

  • Nick

    I agree with Tyssen. If the criterion for whether something is wrong is whether it’s 100% (or even 98%) accurate or not then just about everything we quantify in life is wrong and the very word becomes meaningless.

    If someone asks me how far the next village is and I say three miles then I know that’s not likely to be very accurate. The sign post says three miles and it’s certainly less than four (unless you’re going to the far end) and more than two.

    The truth is I can’t be sure it’s within 20% (plus or minus!) of three miles, but that isn’t going to matter, and if the guy phoned me when he got there to tell me I was wrong because I was 15% out I’d just think he was a bit odd.

    London is 420 miles away. I’m pretty sure I’m right within maybe 5% or so either way. At least, it’s a big enough blob that there’s a good chance that if I drive 420 miles along the most appropriate motorways and A roads and without any detour I’ll be pretty much in the vicinity one way or another.

    Sure I’m almost certainly technically wrong when I say 420 miles (to about the centre, say) but …

    I’ve had enough.

    I want to know whether I have about 1, 2, 5, 10, 20, 40, 80, 150, 300, 600, 1200, 2500, 5000, 10,000, 20,000, etc. human visits a day, what the broad trend looks like and whether something unusual happens.

  • http://www.dangrossman.info Dan Grossman

    Web stats don’t have to be accurate, they just have to give you enough to see trends and to make comparisons.

    Even if the web stats are only hitting 50% of your visitors, assuming it’s not completely biased due to some installation/script error, that’s enough to see day to day changes. That’s enough to identify which pages have the highest bounce rate and are in need of improvement. That’s enough to see where users are entering the site to know what landing pages might need improving or what content has become recently popular and could use expanding.

    Web stats are not about counting numbers, not at all.

  • http://www.lunadesign.org awasson

    Strange article… I didn’t find any value in it.

    Out of the list of cons the only real issue is ‘script blocking’ and even then it’s doubtful that it will be an issue. All of the other cons are more about crappy developers who don’t check their work.

  • http://www.optimalworks.net/ Craig Buckler

    @elviajero
    Bots are covered in part 1.

    @everyone-else-who-hates-the-heading
    Would you have preferred the title “Why your website statistics reports are inaccurate but the degree of inaccuracy may be negligible depending on the circumstances and your perspective on the matter”?!!

    The fact remains that you can’t even assess the margin of error. Low traffic volumes are meaningless because a couple of non-JavaScript users can completely sway the results. At higher volumes, your figures could actually drop as traffic increases (again, see part 1).

    Website reports can be dangerous in the wrong hands, especially when people think they’re irrefutable or even broadly correct. Part 3 is coming later today — it contains some information you may find useful.

  • http://www.optimalworks.net/ Craig Buckler

    @awasson

    the only real issue is ‘script blocking’

    That’s the big one, but what about it being for HTML pages only and not being able to access historical data?

    As for developer mistakes, perhaps you’ve never made any, but I certainly have! Many sites also have a reliance on third-party JS libraries such as jQuery, mootools, YUI etc. For example, jQuery throws fatal errors in IE5/5.5 which could abort your statistics collation. Developers could blindly assume there’s no IE5.x users from their statistics.

  • http://www.lunadesign.org awasson

    Developers could blindly assume there’s no IE5.x users from their statistics.

    Yes… Except they would be right. IE 5 is next to dead. I checked the AWStats on a moderately busy site and it indicates that they received one or fewer visits from IE5/IE5.5 combined for the months of Oct – Dec. I also perused the stats from a number of other sites in my care and the story was the same: either zero visits from IE5x or a fraction of a percent of the visitors who come to the sites.

    In the case of IE5, it’s just noise… Analytics shows it as IE999.1 but it is noise and I don’t want to sound glib but you have to draw a line about what’s important and what’s not. When IE6 makes up less than a percent, I’ll gladly ignore it too.

    I read an article about 15 years ago that talked about information overload and how meaningful information was going to get buried under a glut of meaningless statistics. I think it was discussing this type of thing: worrying about a piece of information about a browser that makes up less than one percent of your site’s visitors and, by extension, an even smaller part of your traffic.

  • http://www.tyssendesign.com.au Tyssen

    @everyone-else-who-hates-the-heading
    Would you have preferred the title “Why your website statistics reports are inaccurate but the degree of inaccuracy may be negligible depending on the circumstances and your perspective on the matter”?!!

    Seems like you’re getting a bit uptight cos none of the comments on this post agree with your point of view. :/

  • http://www.optimalworks.net/ Craig Buckler

    @awasson
    IE5.x was an example (to highlight that it’s not always “crappy developers” at fault), but you’ve illustrated the point of this article precisely. You had to look at AWstats to discover whether anyone is using that browser. You couldn’t rely on Google Analytics or similar client-side data collection systems to obtain the figure (see part 3).

    @Tyssen
    I don’t think anyone’s disagreed with the subject matter, have they? Do you have any alternative suggestions for the title?

  • http://www.lunadesign.org awasson

    @Craig
    I think this illustrates another problem with statistics and a lot of information in general. With the glut of easy-to-access statistics about everything regarding anything, we run the risk of losing sight of what’s important.

    IE5x and less shows up on Google Analytics, but it shows up as IE999.1. On reviewing AWStats, it accounts for proportionally less than one percent of all the hits on the sites I looked at. Does it matter, or is it yet another bit of fluff that is inconsequential to the running of your business yet takes up valuable space?

    Perhaps the reason IE5x and less aren’t supported by Google Analytics is because someone at Google Labs decided that it didn’t matter and if it did, I’m sure it would be supported.

  • VodkaFish

    the only real issue is ‘script blocking’

    Is it? If someone’s blocking scripts, etc., any advertising-based site has no value in that user. Most ecommerce sites can do ok with no scripts, but it’s not easy.

    If you have zero business interest with your website and you just want to accumulate readers, ok, then that visitor has a value, otherwise, it’s not a number anyone’s missing.

  • http://www.optimalworks.net/ Craig Buckler

    @awasson
    I understand that IE5.x is of negligible importance to most people. But my point isn’t really about IE5 or any other browser – someone could easily misread their statistics because of a library they’re using or a decision by an Analytics developer.

    You were only able to find out that IE5 was unimportant on your site because you looked at AWstats. Google Analytics could not help you. But how many people would realise that?

  • http://www.lunadesign.org awasson

    @Craig
    Maybe you’re misunderstanding what I wrote… I can get IE5x and less stats in Analytics but they are under the heading IE999.1 They are marked wrong but they are still there. My main point is that as long as the important statistics are there, why waste time & resources on the junk stats that have no meaningful purpose.

    Or more to the point… Other than IE5 what’s missing on the client side for statistics gathering?

  • http://www.optimalworks.net/ Craig Buckler

    @awasson
    Statistics which are unimportant to you could be vital to others. For example, some Government agencies, large organisations and African/Asian countries still have a high number of IE5.x users.

    Of course it’s not just IE5 – any new or rare browser may not be identified. But let’s move on from browser detection.

    What if your client asks how many people have JavaScript disabled? Google Analytics won’t tell you and even server-side parsers can only make an educated guess. A novice developer could easily “prove” that everyone has JavaScript enabled and dismiss concepts such as accessibility.

    The problem for all client-side analysers is that a result of zero does not necessarily mean zero.

  • http://www.lunadesign.org awasson

    @Craig

    Statistics which are unimportant to you could be vital to others. For example, some Government agencies, large organisations and African/Asian countries still have a high number of IE5.x users.

    There is a logic disconnect to this argument. If I were looking at the statistics of a less technologically developed audience, I would see a greater number of IE5 users and the statistics would then be relevant. I would see a large number of IE999.1 users which would indicate that further analysis was required. At that point I would likely use AWStats or another server side log reader but until that time, it’s not necessary.

    That said, it would be interesting as an intellectual curiosity to see how many real people use IE5. It seems to have dropped literally off the statistics charts of browser use worldwide half way through 2008 according to anything I can find by searching. It was showing 1% – 2% in 2006 and then it just disappeared. It’s not a valuable statistic at any stretch but interesting none the less.

    The Javascript argument is another argument so finite that it really doesn’t matter to 99.99% of cases and if it does you’ll use a different tool. Javascript, I’m not so concerned about because I don’t typically incorporate Javascript into my sites in such a way that the lack of it will break it. It’s not used for navigation and I try not to use it for eye candy unless the client demands it (and sometimes they do).

    The most important stats to my clientele are:
    Number of visitors
    Top content
    Pay-per-click landing page visits
    Traffic sources
    Time on site

    The most important stats for me are:
    Screen resolution
    Browsers
    Connection speed
    Traffic Sources

    Although your article doesn’t resonate with me perhaps in the way it was intended to, it has proven to be thought provoking and prompted me to have a closer look at my stats so that’s something.

  • tactics

    Where did you get the figure that 5% of visitors have Javascript disabled? I’ve seen this number thrown around a lot, but never seen anything that backs it up. I worked as a developer for a large internet retailer, and when we weeded out all of the toolbars, site rippers, bots with spoofed user agents, etc. we came up with a figure of 0.72%, and most of those were Blackberrys.


  • http://www.optimalworks.net/ Craig Buckler

    The 5% disabled JavaScript figure is one that’s been thrown around since the dawn of the web (actually, it used to be 10%, but it’s still just a guess). The fact remains that it’s difficult to calculate a figure and it will differ from site to site.

    Some browsers don’t offer JS (older mobiles, screen readers, Lynx etc). Many companies block JS for security reasons. Also, disabling JS isn’t the whole story: advert blockers and plugins such as Greasemonkey can do the same thing (intentionally or unintentionally).

    The problem for Google Analytics and similar systems is that those users are completely missing from the statistics. The margin of error may be tiny. But it may be huge … and there’s absolutely no way of knowing!