Spider Catching in ASP
As many of you may realize, the debate over serving Web spiders unique content (aka "cloaking") has been a hot topic at SitePoint.com, and the forums have been crammed with posts. Some people support the practice, others shun it, and many warn of the damage it can do to your reputation with the search engines (banning is a common recourse against sites that "cloak").
I’m going to show you some advanced algorithms I’ve used to cloak my own pages in ASP, which will hopefully be of interest to both people who don’t believe in cloaking, and those who do.
Traditional methods of cloaking involve detecting the USER_AGENT and then serving an alternative page to the user based on this criterion. You can obtain a list of common User Agents here. The page you serve would of course be rich in META Tags and the words that you wish to optimize your site for. The code for achieving this is as follows. It should appear at the top of your page:
Spidercheck = Request.ServerVariables("HTTP_USER_AGENT")
If Spidercheck = "Googlebot" Then
  ' Send the spider to the keyword-rich page
  Response.Redirect("spiderrichpage.asp")
End If
Those of you who are familiar with ASP will be able to extend this code to redirect when it encounters any of the known spider names. You can find a detailed list of names here.
What do my Visitors See?
As all code is executed on the server side, your visitors won’t notice anything. The spiders will go in one direction and your visitors will see the page that the code resides on. However, remember how we sent the spider to spiderrichpage.asp? This page contains meta-tags and content that relate to your site. This is the page that the spider will index and store in its database; therefore it will also be the page that appears in the search results.
As we don’t want your potential visitors to see an ugly META Tag page containing spider-centric content, we also have to put code on that page to send your visitors to the page you want them to see.
So, at the top of spiderrichpage.asp we insert:
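Spidercheck = Request.ServerVariables("HTTP_USER_AGENT")
If Spidercheck <> "Googlebot" Then
  ' "index.asp" is a placeholder name: use the page your visitors should see
  Response.Redirect("index.asp")
End If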
Ok, so we’ve served up a page for a particular spider name. But what if the search engines are fighting back? Suppose they try to fool our scripts by using spider names which are nearly identical to the user agents used to identify browsers. Anyone who maintains Web logs will recognize the following user agents:
Mozilla/5.0(compatible; MSIE 6.0; Windows 2000)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; MSIECrawler)
Which Arachnid is Which?
Spot the spider? Yes, that’s right! The one with MSIECrawler in its User Agent is the spider. Many spiders try to stay nice and close to the browser names so they don’t get spotted and redirected. How can you combat this? Well, ASP has a nice function already built into the language to help us out: InStr (short for "in string"). With this we can detect whether words such as CRAWLER, BOT or SPIDER appear within the User Agent. Another problem with my original code is that it only detects one spider at a time (unless you extend it to use If...Then...Else statements). I'm now going to attempt to kill two spiders with one can of fly spray! The dictionary object can be easily added to as more spider names appear in your site logs:
Sub AddViolation(objDict, strWord)
  'Adds a violation (a robot in this case)
  objDict.Add strWord, False
End Sub

Function CheckStringForViolations(strString, objDict)
  'Determines if the string strString has any violations
  Dim bolViolations, strKey
  bolViolations = False
  For Each strKey in objDict
    'vbTextCompare makes the search case-insensitive
    If InStr(1, strString, strKey, vbTextCompare) > 0 Then
      bolViolations = True
      objDict(strKey) = True
    End If
  Next
  CheckStringForViolations = bolViolations
End Function

Dim objDictViolations
Set objDictViolations = Server.CreateObject("Scripting.Dictionary")
AddViolation objDictViolations, "Googlebot"
AddViolation objDictViolations, "Lycos"
AddViolation objDictViolations, "Ultraseek"
AddViolation objDictViolations, "Sidewinder"
AddViolation objDictViolations, "InfoSeek"
AddViolation objDictViolations, "Scooter"
AddViolation objDictViolations, "WebCrawler"
AddViolation objDictViolations, "UTV"

Dim strCheck
strCheck = Request.ServerVariables("HTTP_USER_AGENT")
If Len(strCheck) > 0 Then
  If CheckStringForViolations(strCheck, objDictViolations) Then
    'A known spider: send it to the keyword-rich page
    Response.Redirect("spiderrichpage.asp")
  End If
End If
Now we have a way to add spiders as they come along, and to serve them up a special page. You advanced coders out there will be able to send different pages for different spiders -- I'll leave that as a learning exercise.
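The generic keyword check mentioned earlier (CRAWLER, BOT, SPIDER) can be sketched in the same way. This is only a minimal sketch: the function name LooksLikeSpider and the keyword list are assumptions made for illustration, and a short word such as "bot" may also match a harmless browser string, so check it against your own logs before relying on it.

```vbscript
' Hypothetical helper (not from the original code): returns True when the
' User Agent contains one of a few generic spider keywords.
' The keyword list is an assumption and is not exhaustive.
Function LooksLikeSpider(strUserAgent)
  Dim arrWords, i
  arrWords = Array("crawler", "bot", "spider")
  LooksLikeSpider = False
  For i = 0 To UBound(arrWords)
    ' vbTextCompare makes the search case-insensitive
    If InStr(1, strUserAgent, arrWords(i), vbTextCompare) > 0 Then
      LooksLikeSpider = True
      Exit Function
    End If
  Next
End Function
```

In a page you would pass it the value read from Request.ServerVariables("HTTP_USER_AGENT") and redirect when it returns True. Against the two User Agents shown earlier, it returns False for the first and True for the MSIECrawler one.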
This next algorithm is quite interesting. As all Web designers and developers know, META Tags still play a small role in some search engine placement systems. Well, what if we could change our meta-tags to match whatever is being searched for? Yes, this is possible. We can use HTTP_REFERER to read the search phrase that led to our page, and then write it into both our meta-tags and, if we wanted to, our main content.
If InStr(Request.ServerVariables("HTTP_REFERER"), "google") > 0 Then
  KeyURL = Request.ServerVariables("HTTP_REFERER")
  'KeyURL = "http://www.google.com/search?hl=en&ie=
  ' Remove all up to q=
  KeyLen = Len(KeyURL)
  kStart = InStr(KeyURL, "q=")
  kStart = kStart + 1
  KeyRight = KeyLen - kStart
  Keyword = Right(KeyURL, KeyRight)
  ' Check for trailing query string and remove text
  If InStr(Keyword, "&") > 0 Then
    kEnd = InStr(Keyword, "&")
    kEnd = kEnd - 1
    Keyword = Left(Keyword, kEnd)
  End If
  ' Turn encoding into text phrase
  Keyword = Replace(Keyword, "+", " ")
  Keyword = Replace(Keyword, ",", ", ")
  Keyword = "," & Keyword
End If
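The steps above can also be wrapped in a small function, which makes them easier to try out on sample URLs. What follows is a hedged sketch rather than the original code: the name ExtractKeyword is hypothetical, and two behaviours are assumptions added here, namely returning an empty string when no q parameter is present and matching q only as a whole parameter name (so that something like "oq=" is not picked up by mistake).

```vbscript
' Hypothetical helper (not from the original code): returns the search
' phrase carried in a referrer URL, or "" when there is no q parameter.
Function ExtractKeyword(strReferer)
  Dim kStart, kEnd, strPhrase
  ExtractKeyword = ""
  ' Look for q as a whole parameter name: "?q=" or "&q="
  kStart = InStr(1, strReferer, "?q=", vbTextCompare)
  If kStart = 0 Then kStart = InStr(1, strReferer, "&q=", vbTextCompare)
  If kStart = 0 Then Exit Function
  ' Skip the three characters of "?q=" or "&q="
  strPhrase = Mid(strReferer, kStart + 3)
  ' Cut off any parameters that follow the phrase
  kEnd = InStr(strPhrase, "&")
  If kEnd > 0 Then strPhrase = Left(strPhrase, kEnd - 1)
  ' Turn the "+" encoding back into spaces
  ExtractKeyword = Replace(strPhrase, "+", " ")
End Function
```

For example, ExtractKeyword("http://www.google.com/search?hl=en&q=asp+spider+catching&btnG=Search") returns "asp spider catching". Percent-encoded characters such as %22 are left untouched in this sketch.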
How do I use this?
You can now write this keyword into your content using Response.Write(Keyword). This could be in the space for your meta-tags, or it could be in an HTML layer hidden off the page.
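Because the keyword comes straight from the referrer, it is worth escaping it before it goes into the page. The sketch below shows one way to build the keywords meta-tag. Both function names and the strBaseKeywords parameter (a fixed, site-wide keyword list) are assumptions made for illustration. Inside an ASP page the built-in Server.HTMLEncode method could take the place of the hand-written escaping.

```vbscript
' Hypothetical helper: escapes the characters that could break out of an
' HTML attribute value.
Function EscapeForAttribute(strText)
  Dim s
  s = Replace(strText, "&", "&amp;")
  s = Replace(s, "<", "&lt;")
  s = Replace(s, ">", "&gt;")
  s = Replace(s, """", "&quot;")
  EscapeForAttribute = s
End Function

' Hypothetical helper: builds a keywords meta-tag from a fixed list plus
' the extracted keyword (which already starts with a comma).
Function BuildKeywordsMeta(strBaseKeywords, strKeyword)
  BuildKeywordsMeta = "<meta name=""keywords"" content=""" & _
    EscapeForAttribute(strBaseKeywords & strKeyword) & """>"
End Function
```

In the page head you would then call Response.Write(BuildKeywordsMeta("asp, vbscript", Keyword)), where "asp, vbscript" stands in for your own keyword list.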
Tips and Tricks
The code provided in this article can be easily modified, and with a bit of creativity you could design some interesting algorithms. You could modify it to serve different pages for each spider, and maybe keep track of them in a database or design a statistics system for the Web community that displays the frequency of spider visits to your site.
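As a starting point for those ideas, here is a minimal sketch that maps each spider name to its own page and counts the hits. The page names are placeholders, and PageForSpider, objPages and objHits are hypothetical names rather than part of the code above. The counts only live for the current request, so storing them in a database is left out of the sketch.

```vbscript
' Hypothetical sketch: one page per spider, plus a simple hit counter.
' The page names below are placeholders, not real files.
Dim objPages, objHits
Set objPages = CreateObject("Scripting.Dictionary")
Set objHits = CreateObject("Scripting.Dictionary")
objPages.Add "Googlebot", "spider-googlebot.asp"
objPages.Add "Scooter", "spider-scooter.asp"

' Returns the page for the first spider name found in the User Agent,
' or "" for ordinary visitors, and records the hit in objHits.
Function PageForSpider(strUserAgent)
  Dim strName
  PageForSpider = ""
  For Each strName In objPages.Keys
    If InStr(1, strUserAgent, strName, vbTextCompare) > 0 Then
      ' A missing key starts out Empty, which counts as 0 here
      objHits(strName) = objHits(strName) + 1
      PageForSpider = objPages(strName)
      Exit Function
    End If
  Next
End Function
```

A page would call PageForSpider with the User Agent and redirect only when the result is not an empty string.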
There are many great promotional tools for optimizing the pages you plan on serving up to the spiders. The links used in this article are listed below, along with a few extras that may be of interest to you in optimizing pages.