One of the most important aspects of Website management is traffic analysis. If you don’t know where your visitors are coming from — and in what numbers — you can’t effectively promote your site, or gauge the effects of any current promotion efforts.
Checking the stats for your site(s) should be a daily activity, and if you’re not doing it already, now’s the time to start!
There is some confusion as to the different terms used to describe Website traffic. Misuse of these terms often causes miscommunication, so it’s important that you know the correct words and concepts. The most common terms you’ll find include:
Impression: A page view. An impression occurs when someone views one of your HTML pages. If you use frames, you should only count impressions on your main content pages, not those on the pages you use for your menu or header frames. Another way to look at this is to only count impressions on pages that display advertising.
Unique: A page view by a unique person within a 24-hour period. Uniques are usually measured by identifying the IP address of each visitor to your site. However, some services, notably AOL, send all their members through proxy servers, so thousands or millions of people can share the same IP address. This usually means that if you count uniques by unique IP address, your actual number will be slightly higher than what is reported in the logs. A better way to measure uniques would be a composite value composed of IP address, browser or user agent, and operating system.
Referrer: A page that links to your site. This doesn’t have to be an actual page: it could, for instance, be the result set of a search engine. Looking at your referrers will tell you who’s linked to your site.
User agent: The software used to access your site. Sometimes known as a "browser" or "client", the term user agent can describe a PHP script, a browser like Internet Explorer, or a search engine spider like GoogleBot. If you can identify what software is being used to access your site, you’ll be able to tell if users are abusing it, and when the search engines last crawled your pages.
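To make the difference between IP-only uniques and the composite value concrete, here is a minimal Python sketch. The `Hit` record, its field names, and the sample data are assumptions made for illustration, and a calendar day stands in for the 24-hour period; your own tracking software may store this information quite differently.

```python
from collections import namedtuple

# Hypothetical record: one row per page view, as a logging script might store it.
Hit = namedtuple("Hit", ["ip", "user_agent", "os", "day"])

def count_uniques(hits):
    """Return (uniques counted by IP alone, uniques counted by composite key)."""
    by_ip = set()
    by_composite = set()
    for hit in hits:
        by_ip.add((hit.day, hit.ip))
        by_composite.add((hit.day, hit.ip, hit.user_agent, hit.os))
    return len(by_ip), len(by_composite)

hits = [
    Hit("10.0.0.1", "MSIE 6.0", "Windows", "2004-03-01"),
    Hit("10.0.0.1", "MSIE 6.0", "Windows", "2004-03-01"),   # same visitor, second page
    Hit("10.0.0.1", "Netscape 7", "Mac OS", "2004-03-01"),  # another person behind the same proxy
    Hit("10.0.0.2", "MSIE 6.0", "Windows", "2004-03-01"),
]
ip_only, composite = count_uniques(hits)
print(ip_only, composite)  # 2 3
```

The composite count is still only an estimate: two people behind one proxy with identical browsers and operating systems will be counted as one.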
Counters and Trackers
Early in the life of the Web, counters were fairly popular. A counter is a simple script that records the number of visitors to a site in a text file or database and then displays the total, either textually or graphically, on the Website. You still find them on some amateur pages, but for the most part, their use has died out — primarily because site owners wanted more complex information about their traffic, but also because these counters have come to be seen as unprofessional.
Now most professional or commercial sites use tracking software. Tracking software tells you more than just the number of visitors — it can break visitor statistics down by date, time, browser, page viewed, referrer, and countless other values. Trackers are so named because they can more or less detail for you the path a visitor takes through your Website, so they do more than just count your traffic: they track it. You can choose from three main types of tracking software — let’s look at your options.
The Three Flavors of Tracking Software
1. Remote Tracking Services
So try to avoid using these services unless you don’t have the ability or expertise to execute tracking scripts of any kind on your own server.
2. Logging Programs
This is my preferred method of traffic analysis. Logging programs are scripts that you install on your server, which then generate both log files (either in flat files or a database), and reports. I prefer this type of program over a log analysis system (discussed below) because logging programs afford the site owner more control — you decide what is logged and what isn’t, and only track those pages you want to track.
The downside to doing this is that you must maintain your log files, and if your site is popular, they can grow rather large. On one of my sites (which logs over a million impressions a month), the log file grows by about 15 MB a day, so I usually rotate it every 3 days. If you use a log analysis program you’ll still be battling large log files; however, these are your server’s log files, so they are automatically rotated and maintained for you.
Another feature of this type of program is that you can sometimes use it to track links from your site as well, so you can identify exactly how much traffic you send away in a link exchange.
3. Log Analysis Programs
These are programs that analyze your server logs and then create traffic reports accordingly. Some may include advanced filters, which allow you to specify what exactly you want reported, but most will simply report everything in the log files — usually covering total hits, impressions, and uniques. Of course, the quality of the reports generated will depend on what software you actually use.
Some log analyzers are free and come preinstalled on many hosting accounts, while others can cost a good deal of money.
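As a rough illustration of what a log analyzer does, the Python sketch below reads lines in Apache’s "combined" log format and reports total hits, impressions, and unique IP addresses. It assumes your server writes that format, and it assumes that an impression is a successful request for a path ending in .html, .htm, or a slash; both assumptions, like the sample lines, are mine and would need adjusting for a real site.

```python
import re

# Pattern for Apache's combined log format (assumed; check your server's configuration).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed definition of a "page": adjust to match how your pages are served.
PAGE_SUFFIXES = (".html", ".htm", "/")

def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

def summarize(lines):
    """Count total hits, page impressions, and unique IP addresses."""
    hits = 0
    impressions = 0
    ips = set()
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines that do not match the assumed format
        hits += 1
        ips.add(entry["ip"])
        path = entry["path"].split("?", 1)[0]  # ignore any query string
        if entry["status"] == "200" and path.endswith(PAGE_SUFFIXES):
            impressions += 1
    return {"hits": hits, "impressions": impressions, "unique_ips": len(ips)}

SAMPLE = [
    '192.0.2.1 - - [10/Oct/2004:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 '
    '"http://www.example.com/links.html" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '192.0.2.1 - - [10/Oct/2004:13:55:37 -0700] "GET /logo.gif HTTP/1.0" 200 512 '
    '"http://www.example.org/index.html" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '192.0.2.7 - - [10/Oct/2004:14:02:10 -0700] "GET /articles/ HTTP/1.0" 200 8100 '
    '"-" "Googlebot/2.1"',
]
print(summarize(SAMPLE))  # {'hits': 3, 'impressions': 2, 'unique_ips': 2}
```

Note how the image request counts as a hit but not as an impression, which is why the two totals can differ so widely in real reports.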
What to Do Every Day
Once you have your tracking software set up, you can start using it. But what should you actually look for? There are several things you should check every day, as follows.
The first thing you should check daily is your referrers. I know from personal experience that if you have a popular site, your referrers can number in the thousands, so reading through that list every day can be a chore, but it’s a must!
When you’re looking at your referrers, look for two things:
- where in the search engines visitors find your listings, and
- on what other Web pages visitors have located links to your site.
Specifically this will help you check to see whether you’re maintaining your search engine position, and it will also help you identify new sites that link to you. When I find that a new site has linked to one of mine, I submit their site to Google so that it can spider their site, see the link, and increase my link popularity rating. Some people would advise against doing this on the grounds that it is unethical to submit another person’s site to a search engine, but I disagree.
In the past, some search engines would ban sites that were oversubmitted; however, Google has never done this, and still does not. If a page has already been submitted, your request will simply be ignored. As nothing bad will ever come of submitting someone else’s page, I don’t see this practice as unethical, especially since many of the people who own these pages may not know how to submit their site. Of course, you should make your own decision on this issue.
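If your referrer list runs into the thousands, a small script can do the first pass of the daily referrer check for you. The following Python sketch separates search engine referrers from ordinary links and picks out linking hosts you have not seen before. The `SEARCH_ENGINES` table, the function names, and the sample URLs are illustrative assumptions, not a complete or authoritative list.

```python
from urllib.parse import urlparse, parse_qs

# Assumed mapping of search engine host fragments to the query-string
# parameter that carries the search phrase; extend it for the engines you track.
SEARCH_ENGINES = {"google.": "q", "search.yahoo.": "p"}

def classify_referrer(url):
    """Return ("search", host, phrase) or ("link", host, None)."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    for fragment, param in SEARCH_ENGINES.items():
        if fragment in host:
            phrase = parse_qs(parts.query).get(param, [""])[0]
            return ("search", host, phrase)
    return ("link", host, None)

def new_linking_hosts(referrers, known_hosts):
    """Return hosts that link to the site but are not in the known set."""
    found = set()
    for url in referrers:
        kind, host, _ = classify_referrer(url)
        if kind == "link" and host and host not in known_hosts:
            found.add(host)
    return found

referrers = [
    "http://www.google.com/search?q=traffic+analysis",
    "http://www.example.org/links.html",
    "http://blog.example.net/post",
]
print(classify_referrer(referrers[0]))  # ('search', 'www.google.com', 'traffic analysis')
print(new_linking_hosts(referrers, {"www.example.org"}))  # {'blog.example.net'}
```

Keeping the set of known hosts in a file and updating it each day would turn this into a simple "what’s new" report.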
IP Addresses and User Agents
The second thing you need to check is the IP addresses and user agents of your visitors. This information will tell you two things:
- When a search engine spiders your site.
- If someone is abusing your site.
The first point is important because, unless you know when your site was spidered, you cannot effectively troubleshoot your search engine listings (for instance, if they appear outdated, or fail to appear at all). Many people will remember when they submitted to the search engines, but if you ask them when they were spidered, they don’t have a clue. Knowing when a search engine spiders your site, and when it updates its listings, will allow you to predict when your listings will change.
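One way to answer the "when was I last spidered?" question is to scan your logs for crawler user agents and keep the latest timestamp for each. The Python sketch below assumes timestamps in Apache’s log style and uses a small, hypothetical `SPIDER_MARKERS` table; real spiders identify themselves in many different ways, so treat the markers as placeholders. Month names are parsed with `%b`, which assumes an English locale.

```python
from datetime import datetime

# Assumed substrings that identify crawler user agents; real spiders vary.
SPIDER_MARKERS = {"Googlebot": "Google", "Slurp": "Yahoo"}

def last_crawl_times(entries):
    """entries: iterable of (timestamp string, user agent) pairs, in any order.

    Returns a dict mapping engine name to the most recent visit seen.
    """
    latest = {}
    for stamp, agent in entries:
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        for marker, engine in SPIDER_MARKERS.items():
            if marker.lower() in agent.lower():
                if engine not in latest or when > latest[engine]:
                    latest[engine] = when
    return latest

entries = [
    ("10/Oct/2004:13:55:36 -0700", "Googlebot/2.1 (+http://www.google.com/bot.html)"),
    ("12/Oct/2004:02:10:00 -0700", "Googlebot/2.1 (+http://www.google.com/bot.html)"),
    ("11/Oct/2004:09:00:00 -0700", "Mozilla/4.0 (compatible; MSIE 6.0)"),
]
latest = last_crawl_times(entries)
print(latest["Google"].strftime("%Y-%m-%d"))  # 2004-10-12
```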
The second point is important because there are a lot of people out there with little to do, and there are many ways they can abuse a Website. One way is to write a script that rips content off a Website to display on your own.
For instance, there are scripts that rip news headlines off sites like CNN.com. Then the site owner displays the headlines on their own site, along with a link back to CNN. While technically it is wrong to copy their headlines, it is easily forgiven by bigger players, as the site owners are using the headlines to link to them (effectively driving traffic back to their site).
However, it is just as easy to write a script that steals articles from a site and displays them on another. If you are the victim of either of these malpractices, you can usually tell through your logs. There will usually be a large number of requests from the offender’s IP address (which should resolve to a Web server), as well as excessive hit counts from a user agent called "PHP," "Perl," or another scripting language. Sometimes people will download your entire site and then republish it on their server; however, they sometimes forget to recode some links, resulting in hits from their version of your site to your original site. One SitePoint Forum advisor recently discovered this exact thing happening by closely monitoring his referrers.
On the topic of downloading an entire site, there are also site rippers out there. Often benignly named "offline browsers," much in the way some Trojans are named "remote administration tools," these are programs that can be used to download your entire site, which not only steals your site (design, content, etc.) but can also crash or severely slow down your server. Depending on the size of your site, these programs can be detected by looking at IP addresses: if you see hundreds or thousands of impressions from one address, chances are it’s one of these programs. You can also look for their user agents. Some of the more popular ones are Wget, Teleport, HTTrack, and Web Reaper. I should mention that Wget is a valid program used on Unix servers to download files, such as patches or drivers. However, unless you provide such downloads on your site, anyone using this agent on your site is probably stealing.
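Both checks described above, heavy traffic from a single IP address and known ripper or script user agents, are easy to automate. Here is a minimal Python sketch. The agent names come from this article, but the threshold of 500 requests, the function name, and the sample data are assumptions; matching on short substrings such as "php" can also flag innocent agents, so treat the output as a list to review rather than a list to ban.

```python
from collections import Counter

# Agent names mentioned in this article, lowercased for case-insensitive matching.
SUSPECT_AGENTS = ("wget", "teleport", "httrack", "web reaper", "webreaper", "php", "perl")

def flag_suspects(requests, max_per_ip=500):
    """requests: iterable of (ip, user_agent) pairs.

    Returns (IPs exceeding max_per_ip with their counts,
             suspicious user agents with their counts).
    """
    requests = list(requests)
    per_ip = Counter(ip for ip, _ in requests)
    busy = {ip: count for ip, count in per_ip.items() if count > max_per_ip}
    agents = Counter(
        agent for _, agent in requests
        if any(name in agent.lower() for name in SUSPECT_AGENTS)
    )
    return busy, dict(agents)

requests = (
    [("198.51.100.9", "Wget/1.9")] * 3
    + [("192.0.2.1", "Mozilla/4.0 (compatible; MSIE 6.0)")] * 2
)
busy, agents = flag_suspects(requests, max_per_ip=2)
print(busy)    # {'198.51.100.9': 3}
print(agents)  # {'Wget/1.9': 3}
```

A sensible threshold depends on the size of your site: a visitor who reads every page of a 20-page site is not a ripper.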
Yet another form of site abuse is harvesting email addresses from a site. This is especially important if you run a community site, where users often post their email addresses. As with site rippers, you can often identify email harvesters via their user agent.
The final method of site abuse is to block a site’s advertisements. Some consider this a right of the surfer; however, I feel that it is stealing. A Webmaster places advertisements expecting that users will view them in conjunction with the content they view for free. If visitors block the advertisements, then ethically I don’t think they should visit the site at all. Some Webmasters will redirect people using ad blocking programs to a page that asks them to pay for site access, and that approach reflects how many Webmasters feel: you either pay with your wallet, or with your eyeballs. Like the aforementioned examples, this can be detected by monitoring user agents.
Once you identify the IP addresses or user agents of those abusing your site you can ban them (using .htaccess if you run Apache), but a full explanation of this is obviously beyond the scope of this article.
Other Statistical Information
There is much information you can gather from your statistics in addition to that which has been mentioned so far. This information is usually useful when you attempt to sell advertising, or reassess your promotional efforts.
Your server stats can provide limited demographic information that’s helpful for both designing your site, and attracting advertisers. For instance, by researching the stats on operating systems or user agents, you can tell whether your visitors use a PC or a Mac, Internet Explorer or Netscape. Some software can also give you geographic statistics by resolving the IP address of your visitors. While these statistics are not the most accurate (it isn’t always possible to accurately identify a user’s country of origin), this information can still be valuable in the presentation of packages to potential advertisers, or even when you’re deciding whether to make regional changes to your site — add content in a second language, for example.
Search Engine Statistics
In addition to glancing over your referrers to ensure that you’re maintaining your search engine positions, you can occasionally do a more detailed analysis, to compare the amount of traffic you get from various search engines. This can help you identify whether there’s a particular engine that’s performing poorly for you. You can then identify which referrers you need to work on — to increase the amount of traffic they send you (though you should keep in mind that perceived ‘lower traffic levels’ could be the result of a search engine being less popular than the others you track).
You can occasionally analyze visitor behavior as well. For instance, a quick review of the stats may indicate the pages visitors use to enter and leave your site, which, in turn, can tell you which portions of your site are the most popular, and which sections need work.
If your site spans multiple topics, this analysis might also help you identify the topics that interest your users the most. For instance, if you review Mac and PC hardware and most of your visitors read the Mac reviews, then you might consider focusing more on the Mac section and phasing out the PC information (or developing it into a separate site). But this information’s not only handy for your reference — it’s also useful in dealings with potential advertisers. Furthermore, if you run a community-based site, these details can indicate how many lurkers you may have, and if you run an article-based site, the stats can indicate which articles or authors are the most popular.
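The entry and exit page review mentioned above can be sketched in a few lines of Python. This version assumes you already have page views as (visitor, timestamp, path) records and treats everything a visitor does as a single visit; real tracking software usually splits visits using an inactivity timeout. The function name and sample data are hypothetical.

```python
from collections import Counter, defaultdict

def entry_and_exit_pages(page_views):
    """page_views: iterable of (visitor_id, timestamp, path), in any order.

    Returns two Counters: how often each path was a visitor's first page,
    and how often it was their last.
    """
    by_visitor = defaultdict(list)
    for visitor, stamp, path in page_views:
        by_visitor[visitor].append((stamp, path))
    entries, exits = Counter(), Counter()
    for views in by_visitor.values():
        views.sort()  # order each visitor's views by timestamp
        entries[views[0][1]] += 1
        exits[views[-1][1]] += 1
    return entries, exits

views = [
    ("a", 1, "/index.html"),
    ("a", 2, "/mac/review1.html"),
    ("b", 1, "/mac/review1.html"),
    ("b", 5, "/mac/review2.html"),
    ("c", 3, "/index.html"),
]
entries, exits = entry_and_exit_pages(views)
print(entries.most_common(1))  # [('/index.html', 2)]
```

A page that appears often as an exit but rarely as an entry is a reasonable first candidate when deciding which sections need work.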
Traffic Patterns Over Time
Other good statistics to keep an eye on are those that measure traffic patterns over time. These can indicate not just the times at which your site receives the most traffic, but can also provide real insights into your audience: a clear picture of your visitors’ usage over time can suggest the reasons why they visit your site.
For instance, I noticed that traffic to my educational site is closely tied to the school year, so on weekends, holidays, and in the summer, my traffic levels drop. This information suggested that my key users were students, which has allowed me to target my advertising accordingly. Another key benefit of knowing when your site receives the most traffic is that you can then schedule downtime (for upgrades and maintenance) around the hours when usage is at its lowest.
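To find a maintenance window or spot a weekday pattern like the one described above, you can bucket your page views by hour and by day of the week. The Python sketch below assumes timestamps in ISO format (for example 2004-03-01T04:15:00) that are already in the time zone you care about; the function names and the generated sample data are illustrative only.

```python
from collections import Counter
from datetime import datetime

def quietest_hour(timestamps):
    """Return the hour of day (0-23) with the fewest page views.

    Ties go to the earliest hour.
    """
    per_hour = Counter(datetime.fromisoformat(stamp).hour for stamp in timestamps)
    return min(range(24), key=lambda hour: (per_hour[hour], hour))

def views_per_weekday(timestamps):
    """Return a list of seven counts, Monday first."""
    counts = [0] * 7
    for stamp in timestamps:
        counts[datetime.fromisoformat(stamp).weekday()] += 1
    return counts

# Made-up sample: ten views in every hour of one Monday, except a lull at 4 a.m.
sample = []
for hour in range(24):
    views = 1 if hour == 4 else 10
    sample += [f"2004-03-01T{hour:02d}:15:00"] * views

print(quietest_hour(sample))  # 4
```

Run over several weeks of data, the weekday counts would show the kind of weekend and holiday dips that point to a student audience.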
This article was intended as a primer for Web traffic analysis, but for more information on some of the topics that were mentioned here, visit the links below:
List of search engine user agents:
How to ban bad people/robots: