Blackhole trap for bad bots

[Off Topic]

$ipaddress = $_SERVER['REMOTE_ADDR'];

I had a quick look at the Blackhole site, downloaded the script and noticed that it uses the line above. This is OK if the IP address does not change. Unfortunately my Internet Service Provider does not guarantee a static IP address, and mine changes frequently. The result could be that I am banned today and fine tomorrow, while someone else who is allocated the banned IP address would not have access.

@John_Betong:

You (or any human) are unlikely to be caught in a Blackhole trap. The link is hidden, so you wouldn’t click it by accident, and there are clear warnings given:

<a href="http://www.domain.co.uk/blackhole/" title="Do not follow this link or you will be banned from the site." rel="nofollow"><img src="images/link.gif" alt="Do not follow this link" /></a>

If you do activate the link inadvertently, it takes you to a page which tells you that you have been banned, and provides an e-mail address for you to contact the site owner. My traps have caught around 10,000 bots, and we’ve only ever had one person tell us they’ve been banned from the site and don’t know why. (That’s not counting the time I ran a link check on one of the sites and accidentally banned myself.)

Bots generally come from bot farms with fixed IP addresses, so it’s unlikely they will affect genuine users. I also clear the blocked IP addresses from time to time, in case they have changed ownership.


[ot]Many thanks for the detailed BlackHole reply. My confidence has been restored and I will give it a go.
[/ot]


That Blackhole looks interesting. Could it be used to ban other kinds of nasties on the web? I’m thinking of attempted hackers and those that repeatedly fail captcha.
I have already created some code for these cases, after spotting some tampering with URL variables in Analytics. But it is not very strict; it only bans for the session at present.
Could I simply alter my script to redirect offenders to the blackhole script?

I imagine that would work. You’d need to be careful that legitimate users couldn’t get accidentally caught, though. (Although, as I say, there is the “failsafe” message which provides an e-mail address for anybody who does get caught.)

If I understand it correctly, the ban is activated by a user visiting blackhole/index.php so a redirect should work.

Thinking about it, I probably wouldn’t use it on the captcha failure, as it can be prone to natural human error. I currently have a 3 strikes and you’re out system, but the ban is just for the session, so a visitor can come back and try again later. If a bot fails captcha 3 times, the fact it has failed is a victory on my part, so job done!
As for the attempted hackers, I feel less lenient. Although they have yet to cause any harm, I don’t like that they are persistently trying, and I would like to stop them with a permanent ban. Normal users should not be tampering with the URL variables.
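For anyone wanting to try the same thing, here is a minimal sketch of how a tampering check might feed the trap. It is not the poster's actual code: the function name `url_vars_tampered`, the example rules and the `/blackhole/` path are all assumptions for illustration. It only validates variables you list, so unrelated parameters (tracking tags and the like) do not get a legitimate visitor banned.

```php
<?php
// Hypothetical helper: the name, the rules and the trap path are assumptions.

/**
 * Return true if a known query variable holds a value it should never hold.
 *
 * @param array $query Usually $_GET.
 * @param array $rules Map of variable name => regex its value must match.
 */
function url_vars_tampered(array $query, array $rules): bool
{
    foreach ($rules as $name => $pattern) {
        if (!array_key_exists($name, $query)) {
            continue; // variable not supplied, nothing to check
        }
        $value = $query[$name];
        // A query variable can arrive as an array (?id[]=1); treat that as tampering.
        if (!is_string($value) || preg_match($pattern, $value) !== 1) {
            return true;
        }
    }
    return false;
}

// Example rules for a page that accepts ?id=123&sort=asc
$rules = [
    'id'   => '/^\d{1,6}\z/',
    'sort' => '/^(asc|desc)\z/',
];

// Usage at the top of a page (left as a comment so the sketch has no side effects):
// if (url_vars_tampered($_GET, $rules)) {
//     header('Location: /blackhole/');
//     exit;
// }
```

Because a redirect to the trap is a permanent ban, it is worth keeping the patterns loose enough that an ordinary typo in a bookmarked URL cannot trigger it.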


Isn’t this technique the same as a honeypot?

I downloaded, installed and found a bug with blackhole.php (which should be included at the start of every page.) :smile:

The die(); command on the last line should be inside the if ($badbot > 0) block, before its closing brace; otherwise the script stops for every visitor and the main page will not render.

<?php 
/*
Title: Blackhole for Bad Bots
Description: Automatically trap and block bots that don't obey robots.txt rules
Project URL: http://perishablepress.com/blackhole-bad-bots/
Author: Jeff Starr (aka Perishable)
Version: 2.0 - License: GPLv2 or later
*/
$badbot = 0;
$filename = 'blackhole.dat';
$ipaddress = $_SERVER['REMOTE_ADDR'];

$fp = fopen($filename, 'r') or die('<p>Error opening file.</p>');
while ($line = fgets($fp)) {
	if (!preg_match("/(googlebot|slurp|msnbot|teoma|yandex)/i", $line)) {
		$u = explode(' ', $line);
		if ($u[0] == $ipaddress) ++$badbot;
	}
}
fclose($fp);

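if ($badbot > 0) {
	echo '<h1>You have been banned from this domain</h1>';
	echo '<p>If you think there has been a mistake, <a href="/contact/">contact the administrator</a> via proxy server.</p>';
	// die() belongs inside this block so that only banned visitors are stopped.
	// In the downloaded file it sat after the closing brace, which halted every page.
	die();
}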

Unfortunately Blackhole does not work with my cache system, because all PHP files are cached and retrieved as HTML files, which is the only way I could achieve a 100% score using http://tools.pingdom.com/fpt

I’ve been trying since yesterday to get the current version of the trap working on a new site, but without success (even after fixing the above-mentioned bug). So I gave up and tried the earlier version, which is what I have on my other sites, and it worked first time.

I’ll attach that version here, in case it’s of use to anybody else.

Blackhole-v01.2.zip (5.0 KB)

I got the Blackhole (new version) installed and working yesterday, no problems. I changed a few things to be more secure, moving the blackhole.php include and dat file to “safe” places. I also renamed the directory to something more enticing to bad bots. I don’t know how smart they are, but if I was creating one, I would probably tell it to avoid anything called blackhole, honeypot, bot-trap or suchlike.
I’m thinking of making further modifications, as mentioned above, to trap other baddies. It would work as is by just forwarding them to the blackhole directory, but I may want to record how and why they got there. As it is, it assumes they are all naughty crawlers, but I want to trap hackers and spammers too.
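One way to record the reason might look like the sketch below. The only thing taken from the script shown earlier is that it splits each line of blackhole.dat on spaces and compares the first field with the visitor's IP address; everything after that first field (timestamp, reason code, user agent), and the function name `format_trap_entry`, is an assumption made for illustration.

```php
<?php
// Hypothetical helper: builds one line for blackhole.dat with the IP first,
// so a check that reads explode(' ', $line)[0] still finds the address.
function format_trap_entry(string $ip, string $reason, string $userAgent, int $timestamp): string
{
    // Keep the reason to a single token so the line stays space-delimited.
    $reason = preg_replace('/[^a-z0-9_-]/i', '', $reason);
    // Strip line breaks so a crafted user agent cannot forge extra entries.
    $userAgent = str_replace(["\r", "\n"], ' ', $userAgent);

    return sprintf(
        "%s %s %s \"%s\"\n",
        $ip,
        gmdate('Y-m-d\TH:i:s\Z', $timestamp),
        $reason,
        $userAgent
    );
}

// Usage (left as a comment so the sketch writes nothing):
// $entry = format_trap_entry($_SERVER['REMOTE_ADDR'], 'url-tamper',
//                            $_SERVER['HTTP_USER_AGENT'] ?? '', time());
// file_put_contents('blackhole.dat', $entry, FILE_APPEND | LOCK_EX);
```

Short reason codes such as `crawler`, `url-tamper` or `spam` would then let you tell the different kinds of visitor apart when reviewing the file.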

That’s the bit I’m unsure about. How do you handle that bit? I could simply have <a href="mailto:myaddress@mydomain.com">Email Me</a> But I’m not a fan of revealing Email addresses on-line, particularly where I expect 99% of visitors to be the dregs of the web.
I suppose it would require a secure contact form, if I’m referring spammers to it.

[quote=“SamA74, post:11, topic:206292”]
But I’m not a fan of revealing Email addresses on-line, particularly where I expect 99% of visitors to be the dregs of the web.
[/quote]
IIRC, the new version of Blackhole suggests linking to your contact page there, whereas the old one just had a mailto link.

I set up an e-mail address specifically for the trap, encoded it out of force of habit (it used to help; not sure if it still does) and then set that e-mail to redirect to another. In all the years I’ve had the traps operational, I have never had any Spam, or any mail at all beyond the notification e-mails of trapped bots and the one single person who somehow found themselves blocked.
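The post does not say how the address was encoded. A common approach is to write every character as a numeric HTML entity, and the sketch below assumes that is what is meant. The function name `encode_email` is made up for the example, and it only handles plain ASCII addresses.

```php
<?php
// Hypothetical helper: turns each byte of an ASCII address into a numeric
// HTML entity, e.g. "@" becomes "&#64;". Browsers render it normally.
function encode_email(string $email): string
{
    $out = '';
    for ($i = 0, $n = strlen($email); $i < $n; $i++) {
        $out .= '&#' . ord($email[$i]) . ';';
    }
    return $out;
}

// Usage in a template (left as a comment):
// $addr = encode_email('trap@example.com');
// echo '<a href="mailto:' . $addr . '">' . $addr . '</a>';
```

This only deters harvesters that scan the raw HTML for text that looks like an address; anything that decodes entities first will still read it.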

That works well for bots; if you’re trying to trap hackers and Spammers too, simply directing them to the contact page (already in the public domain) might be better.

That’s another thing that confused me. I see in the template script <a href="/contact/">contact the administrator</a> which suggests linking to your Contact page, which I did. But, the ban has already been imposed by the script adding them to the blacklist. If the contact page has the blackhole include, they can’t view it.
I notice that the initial message says “contact the administrator”, but the second-visit message and the include message say “contact the administrator via proxy server”.
This suggests that the first time around, they should be able to visit the contact page, though no means to do so has been added to the script, unless you omit the include from your contact page.
This could just be something the creator failed to consider when altering it from the Email method.
I could maybe add some code to enable access to the form for first time trapees. Or am I missing something?

Good point. I’d missed that one, and only noticed the “contact the administrator via proxy server” instructions. Although I wasn’t paying much attention, because when I tried the new script, I just replaced the “contact” links with mailtos, as for my existing sites.

You could try contacting the script author. It does say:

Comments are closed. Contact the author with questions or further information.

So it sounds as if he’s amenable to support questions.

I’m working on a way around it, but I’m torn between granting temporary access to the usual contact form on the first catch, or making a special form just for the blackhole.

If I wanted to use the code you have for the blackhole trap on my site, what information is changed in that code?

The home page for the trap gives guidelines on how to customise.

Further customization: The previous five steps will get the Blackhole working, but there are some details that you’ll want to customize:

index.php (lines #27/28): Edit to/from email addresses with your own
index.php (lines #149/161): Check/replace path to your contact form
index.php (line #113): Line to whitelist user-agents (optional)
blackhole.php (line #30): Line to whitelist user-agents (optional)
blackhole.php (line #39): Check/replace path to your contact form

These are the recommended changes, but the PHP is clean and generates valid HTML5, so feel free to modify the source code as needed. Note that beyond these three items, no other edits need made.

The earlier version of the code is very similar; all you need to change is the contact details (but the line numbers may be different).

And note the error that John mentioned.
The die(); should come before the closing bracket } of the if block.

To update on my progress, I got mine working with the contact form last night. They get one shot at it, but can’t access any other pages. The form knows if they came from the blackhole, so is extra careful what it does with the data.
Today I have altered it to trap those pesky folks tampering with URL variables. Within minutes of putting the new system live after some tests, I caught one.
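The post does not include the code, so the following is only a guess at how a "one shot at the contact form" rule could be expressed. The function name `blackhole_allows`, the `/contact/` path and the idea of tracking a "form already used" flag per IP address are all assumptions.

```php
<?php
// Hypothetical decision function for the include at the top of every page.
function blackhole_allows(bool $isBanned, string $path, bool $formUsed, string $contactPath = '/contact/'): bool
{
    if (!$isBanned) {
        return true;   // normal visitors see everything
    }
    if ($path !== $contactPath) {
        return false;  // banned: every other page stays blocked
    }
    return !$formUsed; // banned: the contact form is open until it has been used once
}

// Usage inside the include (is_banned_ip() and form_already_used() are
// hypothetical lookups against your own ban list):
// $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
// if (!blackhole_allows(is_banned_ip($ip), $path, form_already_used($ip))) {
//     echo '<h1>You have been banned from this domain</h1>';
//     die();
// }
```

Keeping the decision in one small function makes it easy to check each combination before putting it live, which matters when a mistake locks out real visitors.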

