Conditional Logging in Apache

Often we find entries in our web logs that we would like to suppress because they skew our statistics.

Most web log analysis software offers exclusion filters to block numerous types of entries. However, this can also be done natively in Apache.

For example, perhaps we would like to exclude our own IP address as well as requests for the favicon.ico from the logs.

(You will need to modify the IP address to a real one for this to work – i.e. either your machine IP address if using static IP or that of your proxy server/router if proxying Internet access from your local network.)

# Prevent entries from my host address
SetEnvIf Remote_Addr "^10\.0\.0\.1$" dontlog
# Prevent entries for the favicon.ico file
SetEnvIf Request_URI "^/favicon\.ico$" dontlog
# Log what remains
CustomLog logs/web.log combined env=!dontlog

You could also prevent requests for the robots.txt file from being logged.

# Prevent entries for robots.txt
SetEnvIf Request_URI "^/robots\.txt$" dontlog

NOTE: Remember to change the log format to the one you prefer; for example, I use the combined log format instead of common. See your httpd.conf file for your current log format.
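
For reference, the nickname at the end of the CustomLog line has to match a LogFormat definition. The sketch below assumes the format strings found in a stock httpd.conf; yours may differ, so treat these lines as an illustration rather than a copy of your configuration.

```
# Typical definitions from a default httpd.conf (check your own file)
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common

# The same conditional logging, written in the common format instead
CustomLog logs/web.log common env=!dontlog
```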

  • http://www.ajohnstone.com Andrew-J2000

    Whilst on the topic of apache, this may come in handy for diagnostics.

    With the module mod_status enabled:


    LoadModule status_module modules/mod_status.so

    And preferably with ExtendedStatus on:


    ExtendedStatus On

    Place this inside a .htaccess file and, after uploading the file, point your browser to http://example.com/info (replace the domain with yours ;) ).
    .htaccess



    order allow,deny
    allow from all

    #...

    SetHandler server-status
    #SetHandler server-info

    There are a few additional arguments you can pass to the server with a query…
    http://example.com/info?notable

    http://example.com/info?refresh

    http://example.com/info?refresh=

    http://example.com/info?auto

    Anyway, the function of these seems apparent.
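
    If you would rather keep the status page in the main server configuration, a minimal sketch might look like the following. It assumes Apache 2.4 and reuses the /info path from the URLs above; unlike the "allow from all" rule shown earlier, it only admits requests from the server itself, which is my own assumption about what is wanted.

    ```
    # Server or virtual host configuration, not .htaccess
    <Location "/info">
        SetHandler server-status
        # Assumption: restrict the page to requests made from this machine
        Require local
    </Location>
    ```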

  • craig34

    Do we make these changes in the httpd.conf file?

  • craig34

    Sorry for the dual post, but a few more questions occurred to me after I posted.

    1. Does it matter where in the file these directives are placed?
    2. Is there a way to prevent logging in the access log, but keep the logging in the error file? I pretty much require the error log to log when I access the site, for testing purposes.

  • http://www.practicalapplications.net bwarrene

    Yes – the changes go in the httpd.conf file. I usually make the entries after the logging section of the file where the log types are defined.

    You can prevent an access log by simply commenting out the directive to establish an access log. The access and error logs are separate entries – so your scenario is not a problem. I would recommend an access log though – as you at the very least have an audit trail of traffic even if not used for statistical analysis.

  • craig34

    Well… I think I wasn’t really clear in my post above. What I would like to do is use the conditional logging that this blog entry mentioned to prevent logging anything that comes from my company IP in my access log. However, I would want my error log to log anything coming from my company IP for my own error checking uses. So I would still want an access log, I just don’t want it to log access from my IP.

  • http://www.practicalapplications.net bwarrene

    As you are conditional logging on the access log – your company ip will be blocked – however the error log will still log errors related to the site. Note that you are calling an environment variable on the log definition line – so the impact of the condition is limited to the access log.

    Should work for you.

    Thanks
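
    A minimal sketch of that arrangement, assuming the 10.0.0.1 address from the article stands in for the company IP and that the log paths are the usual defaults:

    ```
    # The error log takes no env= condition, so errors from every client are kept
    ErrorLog logs/error_log

    # Only the access log is filtered by the dontlog variable
    SetEnvIf Remote_Addr "^10\.0\.0\.1$" dontlog
    CustomLog logs/access_log combined env=!dontlog
    ```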

  • craig34

    Sorry to bother you once again, but I’ve run into one more small problem. I don’t use the CustomLog directive, but instead I use LogFormat, then TransferLog. I tried putting “env=!dontlog” at the end of the TransferLog line, but it wasn’t good. :) Where should I put that snippet?

  • Doug

    Can this also be done through .htaccess?

    If so, any difference in the code that you noted in the above post?

  • http://www.practicalapplications.net bwarrene

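    Yes, you can use the SetEnvIf lines in .htaccess. The CustomLog directive itself has to stay in the server or virtual host configuration, though.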

  • http://www.practicalapplications.net bwarrene

    [QUOTE=craig34]Sorry to bother you once again, but I’ve run into one more small problem. I don’t use the CustomLog directive, but instead I use LogFormat, then TransferLog. I tried putting “env=!dontlog” at the end of the TransferLog line, but it wasn’t good. :) Where should I put that snippet?[/QUOTE]
    Can you post an example (without disclosing any sensitive info on paths on your server) of the code you’re using…?
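
    In general, TransferLog accepts only a file or pipe and has no env= argument, so the condition needs a CustomLog line instead. A rough sketch, assuming the common format string as a placeholder for whatever LogFormat is actually in use:

    ```
    # Before (no way to attach a condition):
    #   LogFormat "%h %l %u %t \"%r\" %>s %b"
    #   TransferLog logs/access_log

    # After: give the format a nickname and log through CustomLog
    LogFormat "%h %l %u %t \"%r\" %>s %b" common
    CustomLog logs/access_log common env=!dontlog
    ```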

  • sanrou

    craig34: Check to see if you have another CustomLog line. I had a line that said:
    CustomLog logs/access_log combined env=!VLOG
    under Log Configuration section in my apache2.conf, after I replaced that line with:
    CustomLog logs/access_log combined env=!dontlog
    everything I defined in the SetEnvIf section worked out!

    Now I have a question for Blane Warrene:
    What is the consequence of deleting the word !VLOG? Is there a better workaround?

    Thanks,

  • http://www.practicalapplications.net bwarrene

    Sanrou – it only affects your logging if you were assigning something of importance to that env. I.e. if you need to log something related to the use of !VLOG – you may want to fit it in – otherwise – you should not have any issues.

  • Jeff in Texas

    Lately my logs have been spammed by ‘openfos.com’. The entries look something like ‘openfos.com/supply/’ with the name of your site or name at the end of the string. There are dozens of hits a day with the same referral URL and multiple user agents. They usually use the following IPs to spam your logs:

    218.153.70.244

    221.148.31.116

    I don’t mind a hit or two a day, but sometimes there are 30 hits a day! (The IPs are also blocked in .htaccess.)

    The site ‘openfos.com’ is registered to a Korean company, and e-mails to the network abuse contact for those IPs go unanswered.

    I found Blane’s conditional logging to be the trick to get the openfos spam out of my logs. It works!

    Thanks!
    Jeff
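
    For anyone wanting to do the same, a sketch of what this could look like, assuming the referrer string and the two addresses listed above and a combined-format access log:

    ```
    # Flag requests whose Referer header mentions the spamming site
    SetEnvIfNoCase Referer "openfos\.com" dontlog
    # Flag the two source addresses as well
    SetEnvIf Remote_Addr "^218\.153\.70\.244$" dontlog
    SetEnvIf Remote_Addr "^221\.148\.31\.116$" dontlog
    # Log everything that was not flagged
    CustomLog logs/access_log combined env=!dontlog
    ```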

  • Dave

    I’m going to do something with conditional logging with two companies. I have blocked them with .htaccess but the logs are full of hits from these people and abuse does not respond to problems. One company is called jaja-jak-globusy.com that just pounds the logs and I don’t know why the company in Houston, Texas allows this. I used the .htaccess line:

    SetEnvIfNoCase Referer "globusy" bad_ref

    It makes them go 403, but then they fill up the logs.

    The other company hits using a regular Mozilla browser, but it hits robots.txt, then scrapes the site, and it comes from insightbb.com. Perhaps Insight Broadband has no insight into the abuse coming from their IPs? All of insightbb is blocked and they go 403.

    Here is my question: can you write something for IP ranges that will prevent logging from a company like insightbb when they go 403? I guess what I’m asking is, how would you do it for ranges of IPs? Would it be the same as in .htaccess?

    Thanks in advance!
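
    One possible approach for ranges: SetEnvIf treats its pattern as a regular expression, so a range can be matched by its leading octets. The sketch below uses the documentation ranges 192.0.2.0/24 and 198.51.100.0/24 as placeholders, since the real insightbb ranges are not given here, and it reuses the bad_ref variable from the line above.

    ```
    # Match every address in 192.0.2.0/24 (placeholder range)
    SetEnvIf Remote_Addr "^192\.0\.2\." dontlog
    # Match every address in 198.51.100.0/24 (placeholder range)
    SetEnvIf Remote_Addr "^198\.51\.100\." dontlog
    # Skip the referrer spam too; a later SetEnvIf can test an earlier variable
    SetEnvIfNoCase Referer "globusy" bad_ref
    SetEnvIf bad_ref "." dontlog
    # Server or virtual host configuration
    CustomLog logs/access_log combined env=!dontlog
    ```

    The variable is set before the 403 is issued, so the denied requests are left out of the access log as well.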

  • Mike

    I have used .htaccess to block ia_archiver, run by Alexa Internet, and Twiceler, a bot run by someone from Stanford University. At first they would hit once a week or month, but now it is a case of robots gone mad. I am especially concerned about Twiceler, since they have been linked to being a U.S. Government bot that does not respect robots.txt. For months their site at cuill.com has said that their search engine is “coming soon”, but month after month there is no search engine. It looks like a storefront to me. In any case, both ia_archiver and Twiceler have gone nuts, hitting multiple times a day and filling up my logs with junk. After using the conditional logging I was able to get rid of both of these pests. Nice clean logs. I would highly recommend conditional logging if the number of hits still bugs you after sending them 403. ;-)
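
    A sketch of how bots like these might be kept out of the log, assuming they identify themselves with those names in the User-Agent header and that the access log uses the combined format:

    ```
    # Flag requests by user agent, ignoring case
    SetEnvIfNoCase User-Agent "ia_archiver" dontlog
    SetEnvIfNoCase User-Agent "Twiceler" dontlog
    # Log everything that was not flagged
    CustomLog logs/access_log combined env=!dontlog
    ```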

  • mgwalk

    A few lines up in my conf file I had my log set to common, which was blocking my ip blocking. I marked it out and BAM!!! It works now!

    Thanks!!!
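
    This happens because every CustomLog line writes its own entries, so an earlier unconditional line keeps logging requests that the conditional one skips. A sketch of the fix, with the file names assumed:

    ```
    # Unconditional line: logs every request, so comment it out or remove it
    #CustomLog logs/access_log common

    # Conditional line: the only CustomLog left for this file
    CustomLog logs/access_log combined env=!dontlog
    ```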