Often we find entries in our web logs that we would like to suppress because they may be skewing our true statistics.
Most web log analysis software offers exclusion filters to block numerous types of entries. However, this can also be done natively in Apache.
For example, perhaps we would like to exclude our own IP address as well as requests for the favicon.ico from the logs.
(You will need to modify the IP address to a real one for this to work - i.e. either your machine IP address if using static IP or that of your proxy server/router if proxying Internet access from your local network.)
# Prevent entries from my host address
SetEnvIf Remote_Addr "^10\.0\.0\.1$" dontlog
# Prevent entries for the favicon.ico file
SetEnvIf Request_URI "^/favicon\.ico$" dontlog
# Log what remains
CustomLog logs/web.log combined env=!dontlog
You could also prevent requests for the robots.txt file from being logged.
# Prevent entries for robots.txt
SetEnvIf Request_URI "^/robots\.txt$" dontlog
NOTE: Remember to change the log format to the one you prefer; for example, I use the combined log format instead of common. See your httpd.conf file for the log format currently in use.
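For orientation, here is a minimal sketch of how the pieces above might sit together in httpd.conf. The LogFormat line is the usual stock definition of the combined nickname, and the IP address and log path are placeholders, as above; adjust all of them to your own setup.

```apache
# Define the "combined" nickname (most default httpd.conf files already do this)
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

# Tag requests that should stay out of the access log
SetEnvIf Remote_Addr "^10\.0\.0\.1$" dontlog
SetEnvIf Request_URI "^/favicon\.ico$" dontlog
SetEnvIf Request_URI "^/robots\.txt$" dontlog

# Write every request that was not tagged
CustomLog logs/web.log combined env=!dontlog
```

SetEnvIf comes from mod_setenvif and CustomLog from mod_log_config, so both modules need to be loaded. Running apachectl configtest before restarting is a cheap way to catch syntax slips.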






February 18th, 2004 at 5:56 pm
Whilst on the topic of apache, this may come in handy for diagnostics.
With the Module mod_status enabled.
LoadModule status_module modules/mod_status.so
And preferably with extended status on:
ExtendedStatus On
Place this inside a .htaccess file and point your browser to http://example.com/info (replace the domain with yours ;) after uploading the file).
.htaccess
There are a few additional arguments you can pass to the server with a query…
http://example.com/info?notable
http://example.com/info?refresh
http://example.com/info?refresh=
http://example.com/info?auto
Anyway, the function of these seems apparent.
February 20th, 2004 at 10:35 am
Do we make these changes in the httpd.conf file?
February 20th, 2004 at 10:39 am
Sorry for the dual post, but a few more questions occurred to me after I posted.
1. Does it matter where in the file that these commands are placed?
2. Is there a way to prevent logging in the access log, but keep the logging in the error file? I pretty much require the error log to log when I access the site, for testing purposes.
February 20th, 2004 at 1:18 pm
Yes - the changes go in the httpd.conf file. I usually make the entries after the logging section of the file where the log types are defined.
You can prevent an access log by simply commenting out the directive to establish an access log. The access and error logs are separate entries - so your scenario is not a problem. I would recommend an access log though - as you at the very least have an audit trail of traffic even if not used for statistical analysis.
February 24th, 2004 at 2:37 pm
Well… I think I wasn’t really clear in my post above. What I would like to do is use the conditional logging that this blog entry mentioned to prevent logging anything that comes from my company IP in my access log. However, I would want my error log to log anything coming from my company IP for my own error checking uses. So I would still want an access log, I just don’t want it to log access from my IP.
February 25th, 2004 at 9:20 am
Because you are applying conditional logging to the access log only, requests from your company IP will be kept out of it, while the error log will still record errors related to the site. Note that the environment variable is referenced on the access log definition line, so the condition affects only the access log.
Should work for you.
Thanks
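A minimal sketch of that arrangement, assuming a placeholder address in place of the company IP and the default log paths:

```apache
# Placeholder address standing in for the company IP
SetEnvIf Remote_Addr "^192\.0\.2\.10$" dontlog

# Access log: skips any request tagged with dontlog
CustomLog logs/access_log combined env=!dontlog

# Error log: takes no env= condition, so errors from every client are still recorded
ErrorLog logs/error_log
```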
March 3rd, 2004 at 4:41 pm
Sorry to bother you once again, but I’ve run into one more small problem. I don’t use the CustomLog directive, but instead I use LogFormat, then TransferLog. I tried putting the ” env=!dontlog” at the end of the TransferLog line, but it wasn’t good. :) Where should I put that snippet?
March 18th, 2004 at 10:11 am
Can this also be done through .htaccess?
If so, any difference in the code that you noted in the above post?
March 19th, 2004 at 11:42 am
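Yes, you can use the SetEnvIf lines in .htaccess. The CustomLog line itself still has to go in the server configuration (httpd.conf or a virtual host), since Apache does not allow it in .htaccess.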
March 19th, 2004 at 11:43 am
[QUOTE=craig34]Sorry to bother you once again, but I’ve run into one more small problem. I don’t use the CustomLog directive, but instead I use LogFormat, then TransferLog. I tried putting the ” env=!dontlog” at the end of the TransferLog line, but it wasn’t good. :) Where should I put that snippet?[/QUOTE]
Can you post an example (without disclosing any sensitive info on paths on your server) of the code you're using?
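For reference, TransferLog accepts only a file or pipe argument and has no env= clause, so the condition most likely has to move to a CustomLog line. A sketch, assuming a LogFormat/TransferLog pair like the one described (the format string and path here are illustrative, not taken from craig34's server):

```apache
# Before (nowhere to attach a condition):
#   LogFormat "%h %l %u %t \"%r\" %>s %b"
#   TransferLog logs/access_log

# After: give the format a nickname and log through CustomLog instead
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog logs/access_log common env=!dontlog
```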
August 14th, 2004 at 3:26 pm
craig34: Check to see if you have another CustomLog line. I had a line that said:
CustomLog logs/access_log combined env=!VLOG
under Log Configuration section in my apache2.conf, after I replaced that line with:
CustomLog logs/access_log combined env=!dontlog
everything I defined in the SetEnvIf section worked out!
Now I have a question for Blane Warrene:
What is the consequence of deleting the word !VLOG? Is there a better workaround?
Thanks,
August 16th, 2004 at 10:25 am
Sanrou - it only affects your logging if you were assigning something of importance to that env. I.e. if you need to log something related to the use of !VLOG - you may want to fit it in - otherwise - you should not have any issues.
July 11th, 2006 at 3:25 am
Lately my logs have been spammed by 'openfos.com'. The entries look something like 'openfos.com/supply/' with the name of your site at the end of the string. There would be dozens of hits a day with the same referral URL and multiple user agents. They usually use the following IPs to spam your logs:
218.153.70.244
221.148.31.116
I don't mind a hit or two a day, but sometimes there are 30 hits a day! (The IPs are also blocked in .htaccess.)
The site 'openfos.com' is registered to a Korean company, and e-mails to the network abuse contact for those IPs go unanswered.
I found Blane’s conditional logging to be the trick to get the openfos spam out of my logs. It works!
Thanks!
Jeff
December 27th, 2006 at 5:31 am
I’m going to do something with conditional logging with two companies. I have blocked them with .htaccess but the logs are full of hits from these people and abuse does not respond to problems. One company is called jaja-jak-globusy.com that just pounds the logs and I don’t know why the company in Houston, Texas allows this. I used the .htaccess line:
SetEnvIfNoCase Referer "globusy" bad_ref
It makes them go 403, but then they fill up the logs.
The other company hits the site using a regular Mozilla browser, but it requests robots.txt and then scrapes the site, and it comes from insightbb.com. Perhaps Insight Broadband has no insight into the abuse coming from their IPs? All of Insightbb is blocked and they get a 403.
Here is my question: can you write something for IP ranges that will prevent logging of requests from a company like insightbb when they get a 403? I guess what I'm asking is, how would you do it for ranges of IPs? Would it be the same as in .htaccess?
Thanks in advance!
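One possible approach, sketched with documentation-range addresses rather than real ones: SetEnvIf matches Remote_Addr against a regular expression, so a range can be written as an anchored prefix. The addresses and log path below are placeholders; the Referer pattern is the one from the .htaccess line above.

```apache
# Every address in 192.0.2.0/24 (placeholder range)
SetEnvIf Remote_Addr "^192\.0\.2\." dontlog

# 198.51.100.0 through 198.51.100.127 (placeholder range)
SetEnvIf Remote_Addr "^198\.51\.100\.([0-9]{1,2}|1[01][0-9]|12[0-7])$" dontlog

# Requests whose Referer contains "globusy", case-insensitively
SetEnvIfNoCase Referer "globusy" dontlog

# Log only what was not tagged
CustomLog logs/access_log combined env=!dontlog
```

The SetEnvIf lines are accepted in .htaccess, but CustomLog is only valid in the server configuration or a virtual host, so that last line belongs in httpd.conf.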
September 13th, 2007 at 4:03 am
I have used .htaccess to block ia_archiver, run by Alexa Internet, and Twiceler, a bot run by someone from Stanford University. At first they would hit once a week or once a month, but now it is a case of robots gone mad. I am especially concerned about Twiceler, since it has been linked to being a U.S. Government bot that does not respect robots.txt. For months their site at cuill.com has said that their search engine is "coming soon", but month after month there is no search engine. It looks like a storefront to me. In any case, both ia_archiver and Twiceler have gone nuts, hitting multiple times a day and filling up my logs with junk. After using the conditional logging I was able to get rid of both of these pests. Nice clean logs. I would highly recommend conditional logging if the number of hits still bugs you after sending them a 403. ;-)
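For anyone wanting to do the same, a rough sketch of what this could look like, assuming the bots identify themselves with these strings in the User-Agent header and that the access log uses the combined format:

```apache
# Tag requests by User-Agent substring, case-insensitively
SetEnvIfNoCase User-Agent "ia_archiver" dontlog
SetEnvIfNoCase User-Agent "Twiceler" dontlog

# Log only what was not tagged
CustomLog logs/access_log combined env=!dontlog
```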
March 10th, 2008 at 4:08 am
A few lines up in my conf file I had my log set to common, which was blocking my ip blocking. I marked it out and BAM!!! It works now!
Thanks!!!