Case Insensitive Search Engine Bot Match--Regular Expressions

I did a search and did not find anything. But the Sitepoint forum is not working too well at the moment.

When I put advertisements on my site, Google drops my traffic. So I want to test the user agent to see if it is any of the search engine bots and if so, hide the advertisements from them. Since I am not hiding any real content, I don’t think this should be an issue of “cloaking” or anything.

Good documentation for regular expressions is difficult to find. It is pretty much a guessing game. I don’t know how to use regular expressions and just need a quick answer.

Will this work for a case insensitive match to detect a search engine bot?

if (preg_match("googlebot/i|slurp/i|msnbot/i"), $_SERVER['HTTP_USER_AGENT']))

Can anyone help? Thanks.

Thanks, Salathe, it works great.

// If anything other than Google, Yahoo, or MSNbot, do something.
if (!preg_match("/google|slurp|msnbot/i", $_SERVER['HTTP_USER_AGENT']))
{
    // Show my ad.
}
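
A minimal sketch of how that check could be wrapped in a reusable function. The name is_search_bot() is hypothetical, and the `?? ''` fallback is an added assumption to cover requests that send no User-Agent header (it needs PHP 7 or later).

```php
<?php
// Hypothetical helper: returns true when the user agent string contains
// one of the bot names, compared case-insensitively.
function is_search_bot(string $userAgent): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('/google|slurp|msnbot/i', $userAgent) === 1;
}

// Assumption: fall back to an empty string when the header is missing.
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';

if (!is_search_bot($userAgent)) {
    // Show my ad.
}
```

An empty user agent is treated as "not a bot" here, so the ad would still be shown; that choice is arbitrary and easy to flip.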

Yes, if by “work” you mean successfully match (case-insensitively) those words within the value of $_SERVER['HTTP_USER_AGENT'].

The s modifier is not necessary. It changes the dot metacharacter (.) so that it also matches newlines; since your regex does not contain a dot, the s is not doing anything.
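
A short sketch of what the s modifier does and does not change. The test string is made up for illustration.

```php
<?php
// Made-up sample text with a newline between "a" and "b".
$text = "a\nb";

// Without s, the dot does not match a newline.
$withoutS = preg_match('/a.b/', $text);   // 0

// With s, the dot matches a newline as well.
$withS = preg_match('/a.b/s', $text);     // 1

// A pattern with no dot gives the same result with or without s.
$plain  = preg_match('/google|slurp|msnbot/i', 'Googlebot/2.1');   // 1
$plainS = preg_match('/google|slurp|msnbot/si', 'Googlebot/2.1');  // 1
```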

Scanning the PHP docs should have given you understanding enough to get an answer in maybe 10 minutes of scan-reading. Anyway, keep those documentation sources in mind as they’ll come in useful when you develop a little tiny bit more patience.

Finally, building on what you’ve got so far and removing unnecessary bits, a pattern to match those words case-insensitively would look like:

/google|slurp|msnbot/i

OK, I found this example. Will this work:

if (preg_match("#(google|slurp|msnbot)#si", $_SERVER['HTTP_USER_AGENT']))

My understanding is:

# - is the delimiter that marks the start and end of the regex pattern. It is used instead of the typical / (forward slash).

si - indicates a case insensitive search. Does this work? I read that you have to use an “/i” to indicate a case insensitive search with preg_match. I’m not sure what the deal is and how much typical regex stuff works with preg_match.
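
To illustrate the delimiter and modifier questions: in PHP's preg functions the delimiter can be / or another punctuation character such as #, and the modifiers are the letters that follow the closing delimiter. The sketch below compares the two spellings; the user agent string is a made-up example.

```php
<?php
// Made-up user agent string, for illustration only.
$ua = 'Mozilla/5.0 (compatible; Yahoo! Slurp)';

// The same alternation written with / and with # as the delimiter.
$slash = preg_match('/google|slurp|msnbot/i', $ua);     // 1
$hash  = preg_match('#(google|slurp|msnbot)#si', $ua);  // 1

// Without the i modifier the match is case-sensitive, so "Slurp" is missed.
$caseSensitive = preg_match('/google|slurp|msnbot/', $ua);  // 0
```

The parentheses in the # version only add a capture group; they do not change whether the pattern matches.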

I have no clue. The documentation you provided isn’t going to give me any quick answer without hours of reading and trying to figure out stuff.

Otherwise, I’ll have to do three preg_matches like this:

if (preg_match('/googlebot/i', $_SERVER['HTTP_USER_AGENT']) || preg_match('/slurp/i', $_SERVER['HTTP_USER_AGENT']) || preg_match('/msnbot/i', $_SERVER['HTTP_USER_AGENT']))

Three preg_match calls are going to be inefficient. I’d like to do it in one.

No, it won’t work at all (your regular expression syntax is not valid).

Since you’re having trouble finding documentation, here are three must-read links: