Spider Catching in ASP

As many of you may realize, the debate over serving Web spiders unique content (aka "cloaking") has been a hot topic at SitePoint.com, and the forums have been crammed with posts. Some people support the practice, others shun it, and many warn of the damage it can do to your reputation with the search engines (banning is a common recourse against sites that "cloak").

I’m going to show you some advanced algorithms I’ve used to cloak my own pages in ASP, which will hopefully be of interest both to those who don’t believe in cloaking and to those who do.

Traditional Methods

Traditional cloaking methods involve reading the HTTP_USER_AGENT server variable and then serving an alternative page based on its value. You can obtain a list of common User Agents here. The page you serve would of course be rich in META Tags and the words that you wish to optimize your site for. The code for achieving this is as follows. It should appear at the top of your page:

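<%
Dim spidercheck
' Read the User Agent string sent by the requesting client
spidercheck = Request.ServerVariables("HTTP_USER_AGENT")
' Exact match only: the User Agent must be precisely "Googlebot"
If spidercheck = "Googlebot" Then
  Response.Redirect "spiderrichpage.asp"
End If
%>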

Those of you who are familiar with ASP will be able to extend this code to redirect when it encounters any of the known spider names. You can find a detailed list of names here.
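
One way to make that extension, as a minimal sketch, is a Select Case statement that compares the User Agent against several names at once. The names Scooter and Ultraseek are borrowed from the longer list later in this article purely as examples, and the variable name strAgent is hypothetical. The match is still exact and case-sensitive, so treat this as an illustration rather than finished code:

<%
' Sketch: redirect on an exact match with any of several spider names.
Dim strAgent
strAgent = Request.ServerVariables("HTTP_USER_AGENT")
Select Case strAgent
  Case "Googlebot", "Scooter", "Ultraseek"
    Response.Redirect "spiderrichpage.asp"
End Select
%>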

What do my Visitors See?

As all the code is executed on the server, your visitors won’t see any of it. The spiders are sent in one direction, while your visitors see the page that the code resides on. However, remember how we sent the spider to spiderrichpage.asp? This page contains meta-tags and content that relate to your site. It is the page that the spider will index and store in its database, so it is also the page that will appear in the search results.

As we don’t want your potential visitors to see an ugly META Tag page containing spider-centric content, we also have to put code on that page to send your visitors to the page you want them to see.

So, at the top of spiderrichpage.asp we insert:

<%
Dim spidercheck
spidercheck = Request.ServerVariables("HTTP_USER_AGENT")
' Anyone who is not the spider is sent on to the visitor page
If spidercheck <> "Googlebot" Then
  Response.Redirect "visitorpage.asp"
End If
%>

Unconventional Algorithm

Ok, so we’ve served up a page for a particular spider name. But what if the search engines are fighting back? Suppose they try to fool our scripts by using spider names which are nearly identical to the user agents used to identify browsers. Anyone who maintains Web logs will recognize the following user agents:

Mozilla/5.0(compatible; MSIE 6.0; Windows 2000)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; MSIECrawler)

Which Arachnid is Which?

Spot the spider? Yes that’s right! The one with MSIECrawler as its User Agent is the spider. Many spiders try to stay nice and close to the browser names so they don’t get spotted and redirected. How can you combat this? Well, ASP has a nice function already built into the language to help us out!

InStr stands for "in string", and with it we can detect whether words such as CRAWLER, BOT or SPIDER appear anywhere within the User Agent. Another problem with my original code is that it only detects one spider at a time (unless you extend it with If...Then...Else statements). I'm now going to attempt to kill two spiders with one can of fly spray! The code below keeps the spider names in a Dictionary object, which can easily be added to as more spider names appear in your site logs:

<%
 Sub AddViolation(objDict, strWord)
   'Adds a violation (a robot in this case)
   objDict.Add strWord, False
 End Sub

 Function CheckStringForViolations(strString, objDict)
   'Determines if the string strString has any violations
   Dim bolViolations
   bolViolations = False
   Dim strKey
   For Each strKey In objDict
     'Case-insensitive substring match against the User Agent
     If InStr(1, strString, strKey, vbTextCompare) > 0 Then
       bolViolations = True
       objDict(strKey) = True
     End If
   Next
   CheckStringForViolations = bolViolations
 End Function

 Dim objDictViolations
 Set objDictViolations = Server.CreateObject("Scripting.Dictionary")
 AddViolation objDictViolations, "Googlebot"
 AddViolation objDictViolations, "Lycos"
 AddViolation objDictViolations, "Ultraseek"
 AddViolation objDictViolations, "Sidewinder"
 AddViolation objDictViolations, "InfoSeek"
 AddViolation objDictViolations, "Scooter"
 AddViolation objDictViolations, "WebCrawler"
 AddViolation objDictViolations, "UTV"

 Dim strCheck
 strCheck = Request.ServerVariables("HTTP_USER_AGENT")
 If Len(strCheck) > 0 Then
   If CheckStringForViolations(strCheck, objDictViolations) Then
     Response.Redirect "spiderrichpage.asp"
   Else
     Response.Redirect "userpage.asp"
   End If
 End If
%>

Now we have a way to add spiders as they come along, and to serve them up a special page. You advanced coders out there will be able to send different pages for different spiders -- I'll leave that as a learning exercise.
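
For readers who would like a starting point for that exercise, here is one possible sketch. Instead of storing False against each spider name, the Dictionary stores the page to serve. The page names googlepage.asp and scooterpage.asp, and the variable names, are made up for illustration:

<%
' Sketch: map each spider name to its own page (page names are hypothetical).
Dim objSpiderPages, strAgent, strName, strTarget
Set objSpiderPages = Server.CreateObject("Scripting.Dictionary")
objSpiderPages.Add "Googlebot", "googlepage.asp"
objSpiderPages.Add "Scooter", "scooterpage.asp"

strAgent = Request.ServerVariables("HTTP_USER_AGENT")
strTarget = ""
For Each strName In objSpiderPages
  ' Stop at the first spider name found inside the User Agent
  If InStr(1, strAgent, strName, vbTextCompare) > 0 Then
    strTarget = objSpiderPages(strName)
    Exit For
  End If
Next

' Only redirect when a spider was recognized
If Len(strTarget) > 0 Then
  Response.Redirect strTarget
End If
%>

Visitors whose User Agent matches none of the names simply stay on the current page.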

Innovative Algorithm

This next algorithm is quite interesting. As all Web designers and developers know, META Tags still play a small role in some search engine placement systems. Well, what if we could change our meta-tags to match whatever people searched for to reach our site? Yes, this is possible. When a visitor clicks through from a results page, the HTTP_REFERER server variable holds the URL of that page, including the search query. We can extract the search terms from it and then write them into both our meta-tags and, if we wanted to, our main content.

<%
KeyURL = Request.ServerVariables("HTTP_REFERER")
' Example referer:
' KeyURL = "http://www.google.com/search?hl=en&ie=ISO-8859-1&q=spider+food+for+fun&btnG=Google%20Search"

' Only continue when the referer is a Google URL that carries a query
If InStr(KeyURL, "google") > 0 And InStr(KeyURL, "q=") > 0 Then

  ' Remove all up to q=
  KeyLen = Len(KeyURL)
  kStart = InStr(KeyURL, "q=")
  kStart = kStart + 1
  KeyRight = KeyLen - kStart
  Keyword = Right(KeyURL, KeyRight)

  ' Check for trailing query string and remove text
  If InStr(Keyword, "&") > 0 Then
    kEnd = InStr(Keyword, "&")
    kEnd = kEnd - 1
    Keyword = Left(Keyword, kEnd)
  End If

  ' Turn encoding into text phrase
  Keyword = Replace(Keyword, "+", " ")
  Keyword = Replace(Keyword, ",", ", ")
  Keyword = "," & Keyword
End If
%>

How do I use this?

You can now write this keyword into your content using Response.Write(Keyword). This could be in the space for your meta-tags, or it could be in an HTML layer hidden off the page.
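
As a rough sketch of the meta-tag option, the snippet below appends the keyword to a keywords meta-tag. It assumes the Keyword variable set by the script above. The fixed keyword list and the variable name strExtraKeywords are made-up placeholders. The call to Server.HTMLEncode is an extra precaution that the script above does not include: the referer is supplied by the visitor's browser, so it is safer not to write it into the page unescaped.

<%
' Keyword already starts with a comma, so it can follow the fixed list directly.
' Appending "" makes sure a string is passed even when Keyword was never set.
Dim strExtraKeywords
strExtraKeywords = Server.HTMLEncode(Keyword & "")
%>
<meta name="keywords" content="spiders, cloaking<%= strExtraKeywords %>">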

Tips and Tricks

The code provided in this article can be easily modified, and with a bit of creativity you could design some interesting algorithms. You could modify it to serve different pages for each spider, and maybe keep track of them in a database or design a statistics system for the Web community to display the frequency of the spider visits to your site.
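
As a very small starting point for such a statistics system, the sketch below counts visits per spider in the built-in Application object rather than in a database. The routine name RecordSpiderVisit and the "SpiderHits_" key prefix are hypothetical. Application values are lost whenever the Web application restarts, so a database would be the better home for long-term figures.

<%
' Sketch: keep a running visit count for each spider name.
Sub RecordSpiderVisit(strSpiderName)
  Dim strCounterKey, lngHits
  strCounterKey = "SpiderHits_" & strSpiderName
  ' Lock so that two simultaneous requests cannot overwrite each other's update
  Application.Lock
  lngHits = Application(strCounterKey)
  If IsEmpty(lngHits) Then lngHits = 0
  Application(strCounterKey) = lngHits + 1
  Application.Unlock
End Sub
%>

You could call RecordSpiderVisit "Googlebot" at the point where a spider name has been matched, and later display the total with Response.Write Application("SpiderHits_Googlebot").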

There are many great promotional tools for optimizing the pages you plan on serving up to the spiders. The links used in this article are listed below, along with a few extras that may be of interest to you in optimizing pages.
