How can I prevent member logins to my website via a software bot?

Hi,

I have built a simple website where users ‘register’ and then post their jokes/quotes and rate the jokes/quotes of other members.

For the last 2 days, I have noticed a huge number of fake ratings in the database. It has come to my notice that a hacker has built a bot (software) that lets users ‘log in’ and then click a button to submit automatic ratings for every single joke/quote record in the system. This way, users can add fake ratings to 1000+ records in just a few seconds.

The site is coded in PHP but I have now taken it down temporarily.

Now, how can I prevent logins to my website from a software bot running on a user’s PC? I think the hacker is using session/cookie hijacking to create logins (I am not sure!). I have also used a captcha for login, but the hacker has now integrated the captcha into his bot.

So what should I do now? How can I make sure that a user can only log in using my website and not through any bot?

Please help.

Can you identify the intruder’s IP address? You should be able to see it in your log files. If your hosting company supports a stats program like AWStats, that should also show the IP address.

You can then write PHP code to disallow the IP address at log-in time. Alternatively, you can place a “deny from” entry in your htaccess file, which would block all access to your server from the address in question. (If you’re not sure how to do that, just ask.)
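A minimal sketch of the PHP-side check, assuming you keep a hand-maintained list of banned addresses. The function name `is_ip_blocked` and the example addresses are hypothetical, and it only does exact matches, not ranges:

```php
<?php
// Hypothetical helper: true if the client IP appears on a hand-maintained blocklist.
// Exact string matching only; CIDR ranges are not handled.
function is_ip_blocked(string $ip, array $blocklist): bool
{
    return in_array(trim($ip), $blocklist, true);
}

// Usage at log-in time (203.0.113.7 is a documentation-only example address):
// if (is_ip_blocked($_SERVER['REMOTE_ADDR'] ?? '', ['203.0.113.7'])) {
//     http_response_code(403);
//     exit('Access denied');
// }
```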

Mike

Dear Mike, thanks for your reply. Actually, this hacker is promoting the bot on Fiverr and forums, and I have noticed 8-9 different members using it to post fake ratings. It will not be possible for me to track and ban different IP addresses. I need a solution that can prevent logins to my website via this bot.

You should apply a Turing test to make sure the user is not a bot.

Ask a simple maths question, or just use reCAPTCHA.
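A rough sketch of the maths-question idea, assuming the expected answer is kept server-side in the session. The helper names are hypothetical, and `$store` stands in for `$_SESSION` so the logic can be tried outside a web request:

```php
<?php
// Hypothetical helpers for a simple arithmetic challenge.
// $store stands in for $_SESSION (pass $_SESSION itself in a real page).
function make_math_challenge(array &$store): string
{
    $a = random_int(1, 9);
    $b = random_int(1, 9);
    $store['math_answer'] = $a + $b;   // expected answer never leaves the server
    return "What is $a plus $b?";
}

function check_math_challenge(array &$store, string $answer): bool
{
    if (!isset($store['math_answer'])) {
        return false;                  // no challenge was issued
    }
    $expected = $store['math_answer'];
    unset($store['math_answer']);      // single use: replaying an answer fails
    $answer = trim($answer);
    return preg_match('/^\d+$/', $answer) === 1 && (int) $answer === $expected;
}
```

In a real form you would call `session_start()` and pass `$_SESSION` as `$store`. A bot that simply relays the question to its human user, as described further down the thread, will still get through.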

Hi Cups, thanks for your reply. I have used a captcha service (not Google’s reCAPTCHA), but that guy modified his bot to also include the captcha image that is displayed on my site’s login form.

Do you think using Google’s reCAPTCHA (instead of my own simple script) will work?

Should I use SSL to stop him from tracking sessions? What should I do? Please help.

But you should be using a different image every time. That’s the whole point.

The same thing applies: it’ll work if you use a different image each time.

But be aware that Recaptcha is very user-unfriendly. I know people who refuse to have anything to do with sites that use it (and I don’t blame them).

Mike

Mike, thanks for your reply. I am using different images every time. But the software copies the captcha image from my site and displays it to users, so they log in from the software bot instead of my website. I wonder if there is a way I can prevent or track logins that come from a software bot instead of my website?

Sounds rough. This may help, depending on the bot and how this is being done. I usually add an extra text input field asking something like gender, race, etc. that I don’t actually collect, and place it in a hidden div so it doesn’t appear on the form. Normal people won’t fill it in, since it isn’t displayed to them, but a bot looking for fields to fill in will. Therefore, if the field has a value in it, the script doesn’t actually submit the data.
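The hidden-field (honeypot) check might look something like this. It is a sketch only: the field name `gender` is borrowed from the post and the function name is made up:

```php
<?php
// Hypothetical honeypot check. The form contains a field humans never see, e.g.:
//   <div style="display: none">
//     <input type="text" name="gender" value="" autocomplete="off" tabindex="-1">
//   </div>
function honeypot_tripped(array $post, string $field = 'gender'): bool
{
    if (!isset($post[$field])) {
        return false;                    // field absent: nothing to judge
    }
    $value = $post[$field];
    // Real users leave the hidden field empty; a form-filling bot usually does not.
    // A non-string value (e.g. gender[] sent as an array) is treated as suspicious too.
    return !is_string($value) || trim($value) !== '';
}

// Usage in the form handler:
// if (honeypot_tripped($_POST)) {
//     exit; // silently drop the submission
// }
```

The `autocomplete="off"` and `tabindex="-1"` attributes are there so browser autofill and keyboard users are less likely to fill the field by accident.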

If you do an advanced search on just this forum for “Turing”, you will turn up lots of discussions.

These are merely the last 3, but there are lots of methods.

http://www.sitepoint.com/forums/showthread.php?868439-Adding-anti-spam-to-contact-form&highlight=turing

Oh, I see. You mentioned that point earlier, but I didn’t pick it up. Sorry.

If that’s the case, then using Recaptcha rather than Captcha won’t make any difference. Nor will using other Turing-style tests, like a simple quiz or puzzle.

Using a hidden field might be a better idea. I’ve used it to stop spam in my contact form, but I’m not sure whether that would work in your particular scenario. If you’re interested, I wrote a blog post on the subject: A simple way of preventing contact form spam.

As you say, the real answer is to track logins that come specifically from a bot, but I can’t see any obvious way of doing that.

Mike

What you’re describing as the ‘bot’ is:
  1. The user provides their login details to the bot.
  2. The bot logs in using the VALID login credentials.
  3. The bot rates every joke/whatever.

Every person using the ‘bot’ has their own IP and their own login. The bot isn’t actually performing any ‘illegal’ action as far as your site is concerned; it’s just automatically performing the same action a user could.

The captcha/turing test things can work - but put it on the RATING form, not the login form. The bot would have to query the user for every single rating they wanted to add, which would probably tick them off enough to not do it.

A lot of spam prevention is not “stop it from ever happening”, but is “make it so annoying for the spammer that they go find someone else to pick on”

What else could you do?

You could identify bot-added ratings based on time between ratings (say, less than 5 seconds = bot), but then the bot creator can add a 5-second delay.
You could identify bot-added ratings based on ratings per day, but that could give false positives (real users who just leave a lot of ratings).
Limit the number of ratings a person can give in a day. It won’t stop the bot, but it’ll at least slow it down. Of course, that also limits real users.
Identify-and-kill. Again, identifying bot traffic would be difficult as long as the developer actively works against you, but it’d stem the flow.
Kill by IP. Blacklist IP addresses that use multiple accounts which fail the bot detector. It’ll at least shut down the bot user until he moves to one of the countless free Wi-Fi hotspots around the world today.
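The first two detection ideas could be combined into one flag along these lines. The 5-second gap comes from the post; the 200-per-day cap and the function name are assumptions. As noted above, a bot author can simply add a delay, and busy real users may trip the daily cap:

```php
<?php
// Hypothetical heuristic combining "time between ratings" and "ratings per day".
// $timestamps: Unix times of one user's ratings, oldest first, as a plain list.
// The 200-per-day default is an assumed value, not a recommendation.
function ratings_look_automated(
    array $timestamps,
    int $minGapSeconds = 5,
    int $maxPerDay = 200,
    ?int $now = null
): bool {
    $now = $now ?? time();
    $lastDay = array_filter($timestamps, fn($t) => $t > $now - 86400);
    if (count($lastDay) > $maxPerDay) {
        return true;                          // ratings-per-day heuristic
    }
    for ($i = 1, $n = count($timestamps); $i < $n; $i++) {
        if ($timestamps[$i] - $timestamps[$i - 1] < $minGapSeconds) {
            return true;                      // time-between-ratings heuristic
        }
    }
    return false;
}
```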

StarLion said most of what I was going to suggest. Here are a few more ideas though.

Require email-authentication of new user accounts so people can’t make multiple accounts to spam with. In conjunction with that, only allow each user to vote/rate each joke once.
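For the one-rating-per-joke rule, a minimal in-memory sketch with hypothetical names. In a database you would more likely enforce the same rule with a UNIQUE constraint on the user and joke columns:

```php
<?php
// Hypothetical ratings store keyed by "user:joke", so each pair can exist only once.
// In SQL the equivalent rule would be UNIQUE (user_id, joke_id) on the ratings table.
function add_rating(array &$ratings, int $userId, int $jokeId, int $score): bool
{
    $key = $userId . ':' . $jokeId;
    if (isset($ratings[$key])) {
        return false;          // this user has already rated this joke
    }
    $ratings[$key] = $score;
    return true;
}
```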

If the bot has a specific user-agent you could filter on that, but user-agent strings are easy to spoof.

You could also only allow X many ratings in a certain timeframe … say 3 ratings every 15 minutes, then make them wait another 20 minutes before they can rate again if they break that threshold. That will slow down the bot enough that it wouldn’t be useful for them to use a bot.
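A sketch of the "3 ratings every 15 minutes, then a 20-minute wait" rule, assuming per-user state is kept somewhere persistent (session, database row or cache). The function name and state layout are hypothetical:

```php
<?php
// Hypothetical sliding-window limiter: at most $max ratings per $window seconds;
// breaking the threshold locks the user out for $penalty seconds.
function try_rate(array &$state, int $now, int $max = 3, int $window = 900, int $penalty = 1200): bool
{
    $state += ['times' => [], 'locked_until' => 0];
    if ($now < $state['locked_until']) {
        return false;                                 // still serving the penalty
    }
    // Forget ratings that have slid out of the 15-minute window.
    $state['times'] = array_values(array_filter(
        $state['times'],
        fn($t) => $t > $now - $window
    ));
    if (count($state['times']) >= $max) {
        $state['locked_until'] = $now + $penalty;     // threshold broken: start the wait
        return false;
    }
    $state['times'][] = $now;                         // record this rating
    return true;
}
```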

You could also make some fake/hidden jokes/whatever (<a href="fakejoke.php" style="display: none">blah</a>), and if someone visits that page, you know they’re a bot and you can block them automatically. If you take that route, I’d come up with a pseudo-random way to generate the link so that it’s hard for the bot creator to filter out those specific links.
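One possible take on the pseudo-random trap links, assuming a fresh random token is generated per page view and remembered in the session, so there is no fixed list of trap URLs for the bot author to filter out. The `joke.php?id=` URL shape and the helper names are made up, and a bot that skips anything styled `display: none` will still avoid the trap:

```php
<?php
// Hypothetical trap-link helpers. $store stands in for $_SESSION.
function make_trap_link(array &$store): string
{
    $token = bin2hex(random_bytes(8));           // 16 unpredictable hex characters
    $store['trap_tokens'][$token] = true;        // remember which ids are traps
    return '<a href="joke.php?id=' . $token . '" style="display: none">more jokes</a>';
}

// Call this when joke.php is requested: a hit means the client followed an invisible link.
function is_trap_hit(array $store, string $id): bool
{
    return isset($store['trap_tokens'][$id]);
}
```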

But as StarLion said, there is no iron-clad way to prevent it. All you can do is make it as inconvenient for them as possible so that it’s not worth the trouble. Using a combination of all the techniques mentioned is your best bet. Just make sure you balance security with user-friendliness. You don’t want it to be THAT much of a pain for legitimate users to use the site.

Here are some other strategies I’ve used in the past in game coding, for high-score/competition protection where bots were in use and protection was critical. These are somewhat complex and, as StarLion points out, they only make things more difficult for a bot; they are not foolproof. First of all, I’d buy the bot from Fiverr to investigate the coder’s competence and strategy.

Detection via user interaction history: store unique, single-use IDs in the session/cookie for each page, and combine them with AJAX calls from in-page actions. Analyse these individually, or as a hash, for duplication (in case the bot works by recording and playing back a legitimate user session) and for time/route disparity (bots tend to progress in a linear, predictable way rather than with the fairly random progression of real activity).
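The "linear and predictable progression" signal can be approximated by checking how much the gaps between a user's requests vary. This is a hypothetical heuristic (the function name and the 10% threshold are assumptions); a bot that adds random jitter will pass it, so treat it as one signal among several:

```php
<?php
// Hypothetical regularity check: if the gaps between requests barely vary,
// the session looks scripted. Timestamps in seconds, oldest first.
function timing_looks_scripted(array $timestamps, float $maxRelativeSpread = 0.1, int $minSamples = 5): bool
{
    $gaps = [];
    for ($i = 1, $n = count($timestamps); $i < $n; $i++) {
        $gaps[] = $timestamps[$i] - $timestamps[$i - 1];
    }
    if (count($gaps) < $minSamples) {
        return false;                    // not enough data to judge
    }
    $mean = array_sum($gaps) / count($gaps);
    if ($mean <= 0) {
        return true;                     // all requests at the same instant
    }
    $variance = 0.0;
    foreach ($gaps as $g) {
        $variance += ($g - $mean) ** 2;
    }
    $stdDev = sqrt($variance / count($gaps));
    // Coefficient of variation: humans are irregular, simple bots are not.
    return ($stdDev / $mean) < $maxRelativeSpread;
}
```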

Dynamically randomised HTML/URL structure: if the bot loads pages and looks for set elements in the HTML to ‘follow’ or navigate through, you can randomise your HTML element IDs on every page load to throw it off.
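A sketch of per-page-load randomised field names, assuming the mapping is stored in the session between rendering the form and handling the submission. The logical field names (`joke_id`, `score`) and the helpers are hypothetical:

```php
<?php
// Hypothetical helpers. $store stands in for $_SESSION.
function randomise_fields(array &$store, array $logicalNames): array
{
    $map = [];
    foreach ($logicalNames as $name) {
        $map[$name] = 'f' . bin2hex(random_bytes(6));   // starts with a letter, valid as an id
    }
    $store['field_map'] = $map;                        // remember it for the submission
    return $map;                                       // render inputs with these names/ids
}

function read_randomised_post(array &$store, array $post): array
{
    $map = $store['field_map'] ?? [];
    unset($store['field_map']);                        // each mapping is valid for one submit
    $values = [];
    foreach ($map as $logical => $random) {
        $values[$logical] = $post[$random] ?? null;    // missing field => null
    }
    return $values;
}
```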

Socket communication: if you are happy to use Flash within a page, you can carry certain parts of the page-to-server communication over RTMP or even your own socket protocol, which a bot can’t easily mimic or monitor. You can do this to a lesser extent using HTML5, but browser support is very limited.

Sandbox bad users: rather than showing absolute results (e.g. this joke has 472 thumbs up, 132 votes down), show a percentage or star ratio, which makes it harder for the hacker to quantify results. Discard bad votes from the overall score, but on submission show a bad user a ‘success’ message so they aren’t warned that their efforts have been detected. Allow logged-in bad users to view results that include their manipulations.
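A sketch of sandboxed scoring, assuming each vote is stored as an array with `user` and `up` keys and you already have a list of flagged user IDs. Everything here is hypothetical; it just shows flagged users' votes being hidden from everyone except the flagged user, with the result shown as a percentage:

```php
<?php
// Hypothetical sandbox scoring. $votes: list of ['user' => int, 'up' => bool].
// Flagged users' votes are dropped from the public score, but a flagged viewer
// still sees their own votes counted, so nothing looks different to them.
function sandboxed_percentage(array $votes, array $flaggedUsers, ?int $viewerId = null): ?int
{
    $up = 0;
    $total = 0;
    foreach ($votes as $vote) {
        $isFlagged = in_array($vote['user'], $flaggedUsers, true);
        if ($isFlagged && $vote['user'] !== $viewerId) {
            continue;                        // hide bad votes from everyone else
        }
        $total++;
        if ($vote['up']) {
            $up++;
        }
    }
    // A rounded percentage instead of raw counts; null when there are no votes.
    return $total === 0 ? null : (int) round(100 * $up / $total);
}
```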

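Block .NET bots like so (note that older Internet Explorer user-agent strings also contain ".NET CLR", so this check can catch real visitors too):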


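<?php
// Redirect clients whose User-Agent string contains ".NET" to a logging page.
// strpos() returns 0 for a match at the very start, so compare against false explicitly.
if (isset($_SERVER['HTTP_USER_AGENT']) && strpos($_SERVER['HTTP_USER_AGENT'], '.NET') !== false)
{
	header("Location: bot_logger.php");
	exit; // stop the rest of the page from running after the redirect header
}
?>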

Hi Guys,

Thank you all for your wonderful comments. I think I have got some good ideas from you to improve my application. kduv, can you please provide more details about the pseudo-random approach you mentioned?

I am assuming this bot is being run remotely? In any case, here is a list of things you should already be doing.

  1. Using CAPTCHAs for all form-based submissions.
  2. Using a reasonable timeout for each postable action (such as searching and rating items), e.g. 3 minutes.
  3. Checking HTTP_REFERER, either in htaccess or PHP code, to ensure all posts are from a local source.
  4. Validate input forms using a server generated anti-forgery token that is valid for that request only.
  5. Create a list of stop words / phrases that commonly show up in spam postings in order to block if present.
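For point 4, a minimal sketch of a single-use anti-forgery token, with `$store` standing in for `$_SESSION` and hypothetical helper names. It forces the bot to load the real form before every submission, but a bot that scrapes the token from the page will still pass, so it works best combined with throttling:

```php
<?php
// Hypothetical single-use form tokens. $store stands in for $_SESSION.
function issue_form_token(array &$store): string
{
    $token = bin2hex(random_bytes(32));      // 64 hex characters
    $store['form_tokens'][$token] = time();  // remember when it was issued
    return $token;                           // put this in a hidden input on the form
}

function consume_form_token(array &$store, string $token, int $maxAgeSeconds = 1800): bool
{
    if (!isset($store['form_tokens'][$token])) {
        return false;                        // unknown, forged or already used
    }
    $issuedAt = $store['form_tokens'][$token];
    unset($store['form_tokens'][$token]);    // valid for one submission only
    // Note: tokens that are never submitted pile up; prune old ones in real use.
    return time() - $issuedAt <= $maxAgeSeconds;
}
```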

There are only so many things one can do to prevent spamming, especially if a bot is using a valid login.

Thanks for the suggestions. I think I need to implement points 3 and 4 on my website. Are there any other things I can do to make this bot useless?

The problem you are facing is one of automation rather than hacking. This complicates things, as your script doesn’t know that the actions being performed are not desired. The only real way you have of stifling them is to throttle them (point 2 above). If it takes the average ‘live’ user 60 seconds to review and rate an item, and another 30 seconds to navigate to another one, then simply setting a timeout of 90 seconds between ratings should be sufficient. That way, the bot is throttled so that it spends exactly as much time doing things as the user would, and your server load drops. If the bot user is a legitimate user simply out to make their browsing experience quicker, do you really want to tell them they can’t rate things? Probably not. At the same time, though, you don’t want them to abuse the system.
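
The 90-second figure follows from the arithmetic above (60 s to review plus 30 s to move on). Here is a sketch that returns how long a user still has to wait, which you could show on the page instead of a flat refusal; the function name and parameters are hypothetical:

```php
<?php
// Hypothetical cooldown helper: seconds a user must still wait before rating again.
// $lastRatedAt is the Unix time of their previous rating, or null if there is none.
function seconds_until_next_rating(?int $lastRatedAt, int $now, int $reviewSeconds = 60, int $navigateSeconds = 30): int
{
    if ($lastRatedAt === null) {
        return 0;                                   // first rating: no wait
    }
    $cooldown = $reviewSeconds + $navigateSeconds;  // 60 + 30 = 90 seconds by default
    return max(0, $lastRatedAt + $cooldown - $now);
}
```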

Regarding the referer check, the header is sometimes blank, so you might want to block those requests as well. On the other hand, some legitimate browsers don’t report it correctly either. It’s a judgement call.
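
A sketch of the referer check from point 3, with the blank-header judgement call exposed as a flag. The function name is hypothetical. Keep in mind that the Referer header is sent by the client, so a bot can fake it easily:

```php
<?php
// Hypothetical referer check. $server is normally $_SERVER.
// $allowBlank encodes the judgement call: true lets through clients that omit the header.
function referer_is_local(array $server, string $expectedHost, bool $allowBlank = true): bool
{
    $referer = $server['HTTP_REFERER'] ?? '';
    if ($referer === '') {
        return $allowBlank;
    }
    $host = parse_url($referer, PHP_URL_HOST);
    // Compare whole host names; a substring test would accept "example.com.evil.net".
    return is_string($host) && strcasecmp($host, $expectedHost) === 0;
}
```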
