Latest Search Engine Spam Techniques

Search engine rankings are extremely competitive and Website owners are under pressure to do all they can to gain visibility in search results.

Those pressures come from many quarters: there are branding restrictions, style guidelines, legal issues, navigation needs, sales conversion demands, site interaction demands and more.

The fact remains, though, that search engines were designed for information purposes. This presents hurdles to businesses that try to exploit the search engines in order to attract users who seek information, then try to sell them something. To overcome these hurdles, many businesses use increasingly ruthless tactics (tactics that lead them into dishonest territory) to gain those higher search rankings.

Exploiting the Engine

The exploitation of search engines today is a serious issue, but, like it or not, most businesses see it as something that must be done — an online business imperative. To exploit a search engine, however, most organizations must exploit a search engine optimization company. In these arrangements, exploitation, or the gaining of something for nothing, becomes the central theme for interaction between client and SEO provider.

Thousands of SEO providers are now in business, each promising rankings and each trying to appear better known than the next. For many of these service providers, quality is not an issue. What matters is making promises that beat the competition and win them the client. Faced with these enormous and often unreasonable pressures, ethical SEOs will withdraw from an optimization project. Unethical SEOs, however, will take on the project, saying, "No problem. I’ll take care of it."

"Taking care of" an impossible situation means spamming. The client’s demand for the impossible and expectation of something for nothing pushes the SEO or Webmaster down that sorry path of search engine spamming. This approach involves the study and nurturing of a growing list of tricky spam techniques.

The best way to defuse the issue is to bring these methods to light. If everyone knows about a spamming technique, it will cease to work. This is the way to defeat search engine spam, and it is the purpose of this article.

Who’s Responsible?

Search engines value popular, content-rich sites; however, many Website owners either can’t or don’t want to spend the money required to create that type of content and popularity. The needed resources, such as researchers, Web content developers, copywriters and skilled SEOs, aren’t available, or are beyond the financial resources of the company.

This is the something-for-nothing scenario that launches all spam projects.

Traffic Power is an SEO provider that became infamous when Google banned the company and its clients from its index. A Google rep was quoted as saying "I believe that one SEO had convinced clients either to put spammy JavaScript mouseover redirects, doorway pages that link to other sites, or both on their clients’ sites. That can lead to clients’ sites being flagged as spam in addition to the doorway domains that the SEO set up."

Now, it seems Traffic Power’s clients are suing the company, but the damage is done. We still have to wonder who the guilty parties are.

In reality, when a site utilizes spam tactics, it is the client who’s ultimately responsible, not the SEO provider. The client has control over a Website and its deployment. When spamming occurs, the Website owner is solely responsible.

The Lure of Top Listings

Some shady operators widely publicize the claim that rankings are cheap and easy to get. That lie, and the expectation it generates, forces some SEOs to offer a guarantee of top 5 rankings. This, in turn, puts pressure on all SEO providers to offer similar guarantees.

Besides angering search engine companies, such guarantees are misleading. Top rankings can’t be put on a schedule like an advertising buy. Organic search engine results are not for sale, and it is this element of honesty that ensures their continued popularity: that which cannot be bought is trustworthy.

When SEOs can’t achieve rankings on schedule, they are forced to refund perhaps thousands of dollars. Since many are barely able to pay their bills, they can’t afford to return that money. This sets the stage for SEO spamming.

There are spammers who don’t care one way or another — they don’t mind cheating as they have no sense of ethics. There are also large SEO companies that are tasked to create rankings for clients that just shouldn’t be attempted. They want to automate the SEO process in order to increase revenues. Search engines, in contrast, want to rid their indexes of automated material of any kind.

The Website owner’s greed, combined with the search engine spammer’s opportunism, sets the stage for an unholy union. Here’s just one example of a spamming site I’ve seen.

Spammingsite1.com used several types of spam to achieve strong results:

  • mouse-activated redirects
  • hidden table cells stuffed with keywords within <h1> tags
  • links from contrived Websites

The end users saw a different page than the search engine indexed. The search engine was tricked by these tactics, and, as is the case with all instances of spamming, lost control of the product it served to search users.

Spammingsite1 was a leader in the search results, but only because of spam. A check of the sites that linked to Spammingsite1 revealed a list of dubious sites with which no legitimate site owner would want to be associated. Among them was a growing number of Open Directory copies: sites that draw all their content from the Open Directory Project. Copies of Open Directory listings represent a huge problem for Google.

The Perils of New Content Types

As Google and Yahoo! venture into spidering new types of Web content, they run the risk of being tricked by the complexity of the code itself. Spammers succeed by staying ahead of the technical filtering capabilities of search engines.

Search engines apply content filters as they spider sites, and afterward, in what’s called post-processing. This sophisticated filtering is wonderful, but it’s also limited by the imagination, foresight and programming of the engineers. Spammers can trick the system by exploiting cracks in the filters.

Sometimes innocent sites are penalized because they appear to have some characteristics of spamming. Is your site one of them? Why might a legitimate link to your site not be recognized? It probably looks like a paid link to the search engine. This is another huge problem for search engines: their filters are so complex that they become almost uncontrollable, and innocent sites are incorrectly penalized.

Search engines can only see and know so much about any given Website and its owners. One SEO’s content and links are another’s spam, so it’s difficult to make statements about who the spammers are. The problem is further complicated by the fact that search engines have different listing and content assessment guidelines.

There are, of course, numerous tactics that are considered spam. Below are some of the most common spamming techniques being used right now — tactics that should be avoided.

  • Publishing empires
  • Wikis
  • Networked blogs
  • Forums
  • Domain spam
  • Duplicate domains
  • Links inside No Script tags
  • JavaScript redirects
  • Dynamic real-time page generation
  • HTML invisible table cells
  • DHTML layering and hidden text under layers
  • Humungous machine-generated web sites
  • Link stuffing
  • Invisible text
  • Link farms

Let’s discuss each of these in more detail.

Publishing Empires

When a publisher builds a vast array of interlinked Websites, it can generate high PageRank and subsequent rankings. This form of spam is difficult for a search engine to penalize, since the links are legitimate. Any single business entity has the right to interlink its own Websites. The company can create further overlap between the sites’ content themes so that the links are truly valued by search engines.

This kind of activity is exemplified by one of the Internet’s largest publishers. The business has 120+ Web properties, all of which are carefully linked to the others. Search for a topic covered by one of these sites, and you’re virtually guaranteed to see one of the company’s other Web properties in the search results.

Many of the most successfully ranked sites use this system; this form of spamming is extremely widespread. Perpetrators basically collect PageRank and link reputation within their network, then use it creatively to dominate the best keyword phrases. Search engines haven’t found a way to stop this technique, but they’ll have to. This form of spamming is a major threat to the quality of search results.

Wikis

Wikis are Web repositories to which anyone can post content. They can be a great way to present and edit ideas without close censorship, and have proven extremely successful for the creation, management, and maintenance of projects that require input from users around the globe.

However, despite their considerable advantages, the often unscrutinized nature of wikis makes them ripe for abuse. Like a link farm, a wiki’s links are free for all. Ironically, the very popularity that makes wikis valuable is what popularity-based search engines reward: some wikis boast a very high PageRank, which makes them an attractive place from which to gain a link to your site. But without close human control, users may simply add their links as a means to take advantage of the wiki’s PageRank. Until another user of the wiki removes the link, the linked site enjoys the benefits of this unscrupulous activity. The search engine spammers have control.

Networked Blogs

Blogs can be a source of precise, up-to-date and technically detailed information, presented by specialists and experts. Blogs are thus very valuable to info-hungry searchers, and are extremely popular.

However, some spammers start a blog and fill it with garbage content, such as comments on what they thought at 5:15, along with a link or two and some keyword-rich text. Keyword-rich musings offer no real value to the searchers they deceive. Worse still, blogs often operate with a free-for-all link structure that further validates the linked sites in search engine indexes.

Forums

Like blogs, forums can be a rich source of relevant information.

Unfortunately, some forum participants make comments in forums only in an effort to publish links back to their own sites. This may be acceptable if the user provides help or assistance to another forum member. Indeed, they should gain credit for that information, which they may have worked hard to discover.

However, when the posts become excessive and consist solely of glib or irrelevant comments, the value of the link, or indeed of the whole forum, comes into question. Some forum owners start forums only in the hope that they will raise search engine rankings.

Domain Spam

Probably the most popular spam technique today involves creating and hosting a number of Websites. These sites rarely have any intrinsic value other than providing ranking support for the owner’s main Website.

I’ve had several former clients who had used this technique and had been penalized for it. After I had them remove the duplicates completely, their rankings were repaired.

Duplicate Domains

Why can’t Google detect two exact duplicate Websites that only differ on domain names? Why would Google give these same sites first and second rank for the very same phrase? This happens all too frequently and is due to Google’s preoccupation with linking between topically related sites.
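Detecting exact duplicates is, at least in principle, simple. Here’s a minimal sketch, assuming the visible text of each site has already been fetched; the domain names and page text are hypothetical, and real duplicate detection would also have to cope with templates, ads and other boilerplate that differ between copies.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash page text after collapsing whitespace and lowercasing it."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def duplicate_groups(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of domains whose normalized text is identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for domain, text in pages.items():
        groups[fingerprint(text)].append(domain)
    return [sorted(domains) for domains in groups.values() if len(domains) > 1]


# Hypothetical sites: the first two differ only in domain name and formatting.
pages = {
    "widgets-example.com": "Quality widgets.  Order today!",
    "widgets-example.net": "quality widgets. order today!",
    "gadgets-example.org": "Gadgets for every budget.",
}
print(duplicate_groups(pages))  # [['widgets-example.com', 'widgets-example.net']]
```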

Domain spam is usually the result of a corporation’s attempt to have Web sites for each of its company departments or subsidiaries. Those with many subsidiaries get a big boost from these domains. Realizing this, spammers are increasingly encouraging clients to have sites hosted on different IP addresses and even in different geographical locations.

The link pattern detection used by Google has difficulty dealing with this practice, and is currently failing to cope with it. Google’s new emphasis on authority sites actually makes this matter worse, as the authority can gain trust it really doesn’t deserve.

Links Inside No Script Tags

One top publishing site I recently discovered secretly interlinked its sites using the noscript tag. Although I can’t name the site, I can show you how the technique worked.

Used legitimately, the noscript tag provides spiderable links when a user’s browser (or a search engine robot) has JavaScript turned off. Anything inside the noscript tags is not displayed on the Web page when JavaScript is enabled.

To be used legitimately, the noscript tag must contain links that replicate those generated by the JavaScript code in the actual page.
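One way to check that rule is to compare the links inside the noscript element with the links the rest of the page exposes. The sketch below is built on Python’s standard html.parser; it only examines inline script text, so links produced by externally loaded scripts would need to be fetched and checked separately. The class and function names are hypothetical, and a flagged link is a prompt for a closer look, not proof of spam.

```python
from html.parser import HTMLParser


class NoscriptAudit(HTMLParser):
    """Collects <noscript> links, ordinary links and inline script text."""

    def __init__(self) -> None:
        super().__init__()
        self.noscript_depth = 0
        self.in_script = False
        self.noscript_links: list[str] = []
        self.page_links: set[str] = set()
        self.script_text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self.noscript_depth += 1
        elif tag == "script":
            self.in_script = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if not href:
                return
            if self.noscript_depth:
                self.noscript_links.append(href)
            else:
                self.page_links.add(href)

    def handle_endtag(self, tag):
        if tag == "noscript" and self.noscript_depth:
            self.noscript_depth -= 1
        elif tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script bodies arrive here as raw text.
        if self.in_script:
            self.script_text.append(data)


def unmirrored_noscript_links(html: str) -> list[str]:
    """Return <noscript> links found neither as normal links nor in inline script code."""
    audit = NoscriptAudit()
    audit.feed(html)
    audit.close()
    script_blob = "".join(audit.script_text)
    return [href for href in audit.noscript_links
            if href not in audit.page_links and href not in script_blob]


sample = ('<script>var nav = "http://example.com/about";</script>'
          '<noscript><a href="http://example.com/about">About</a>'
          '<a href="http://other.example/">cheap widgets</a></noscript>')
print(unmirrored_noscript_links(sample))  # ['http://other.example/']
```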

But in this case, the links went to sites that strategically collected PageRank. They were essentially hidden, acting as an underground network of links to support the publisher’s rankings. This code appeared in almost all of the site’s many domains, and perhaps exists in the Websites of others who may not even know it’s there! Some of the pages used only a closing </NO SCRIPT> tag, which could also confuse search engines.

<SCRIPT LANGUAGE="javascript" SRC="http://www.spammersite1.com/counter.asp?ID=2667&NoLink=1" TYPE="text/javascript"></SCRIPT>
<NOSCRIPT>
  <!-- Displayed only to clients with JavaScript turned off, such as search engine robots -->
  <a href="http://www.spammersite3.com">new homes</a>
  <a href="http://www.spammersite3.com/popularkeywords.asp?Keyword=concrete+design">concrete design</a>
  <a href="http://www.spammersite3.com/popularkeywords.asp?Keyword=precast">precast</a>
  <a href="http://www.spammersite3.com/popularkeywords.asp?Keyword=mantel">mantel</a>
  <a href="http://www.spammersite4.net/">home decorating</a>
  <a href="http://www.spammersite5.biz/">home improvement world</a>
  <a href="http://www.spammersite6.com">luxury homes</a>
</NOSCRIPT>

The code above even passed keywords to ASP pages in its query strings (Keyword=concrete+design, for example). These keywords told the Web server at the spam target site which type of dynamically generated page to serve in response to the query. Such tactics are not approved of when they're used deliberately to manipulate search rankings. Visitors to this site were totally oblivious to the site owner's devious intent, and search engines were fooled as well.

JavaScript Redirects That Robots Can't Detect

The use of mouseover code like that shown below is quietly spreading across the Web:

<body onMouseOver="eval(unescape('%6C%6F%63%61%74%69%6F%6E%2E%686F%70%69%63%62%61%74%6F%6E%73%2E%6E%65%74%2F%27%3B'));"

There have been rumors that Google is taking action against this tactic. In the cases I discovered, the JavaScript code automatically redirected the visitor to another page as soon as the cursor moved over the page itself. It was almost impossible for the user to avoid setting this code off.
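Because the payload is percent-escaped, a quick way to see what such a handler does is to decode it. This minimal sketch assumes the eval(unescape('...')) pattern shown above, with a single-quoted payload inside a double-quoted mouse or load handler; the sample payload is made up, since the one above is incomplete. Obfuscators vary widely, so treat this as a starting point rather than a detector.

```python
import re
from urllib.parse import unquote

# Inline handlers that fire on mouse movement or page load (a heuristic list).
HANDLER = re.compile(r'\bon(mouseover|mousemove|load)\s*=\s*"([^"]*)"', re.IGNORECASE)
# eval(unescape('...')) with a single-quoted, percent-escaped payload.
EVAL_UNESCAPE = re.compile(r"eval\(\s*unescape\(\s*'([^']*)'\s*\)\s*\)", re.IGNORECASE)


def hidden_redirects(html: str) -> list[str]:
    """Decode escaped payloads in inline handlers and keep those that touch location."""
    found = []
    for handler in HANDLER.finditer(html):
        for payload in EVAL_UNESCAPE.finditer(handler.group(2)):
            # unquote() decodes %XX escapes; JavaScript's unescape() also
            # accepts %uXXXX, which this sketch does not handle.
            decoded = unquote(payload.group(1))
            if "location" in decoded:
                found.append(decoded)
    return found


sample = '<body onMouseOver="eval(unescape(\'%6C%6F%63%61%74%69%6F%6E%3D%27%2F%27%3B\'));">'
print(hidden_redirects(sample))  # ["location='/';"]
```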

I found the code on a site ranked number one on Google for its primary keyword phrase. As search engine robots don't use a mouse, they're blind to the spamming activity. In this case, the tactic was combined with a server-side redirect to another page, which was relevant only in some cases. The redirect may have been part of a bigger ploy to support another ranking strategy.

Dynamic Real-Time Page Generation

It is possible for a Web server to produce and serve different, optimized pages according to the referrer of any page request.

In theory, there is nothing wrong with serving a page that's customized to the circumstance in which it was requested -- indeed, many ad campaigns serve up different ads based on the type of banner that was clicked. Customized ads are seen as being far more effective and useful for users.
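One rough way to see whether a server varies its pages by referrer is to request the same URL twice with different Referer headers and compare the responses. This sketch assumes a simple text-similarity measure (difflib's ratio); the hosts are hypothetical, and the offline fake server stands in for a real one. A low score alone proves nothing, since customization can be legitimate.

```python
import difflib
from typing import Callable, Mapping
from urllib.request import Request, urlopen

# A fetcher takes a URL and request headers and returns the response body as text.
Fetcher = Callable[[str, Mapping[str, str]], str]


def urllib_fetch(url: str, headers: Mapping[str, str]) -> str:
    """Fetch a page with the given headers using only the standard library."""
    with urlopen(Request(url, headers=dict(headers)), timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def similarity(fetch: Fetcher, url: str,
               headers_a: Mapping[str, str], headers_b: Mapping[str, str]) -> float:
    """Return a 0..1 similarity ratio between two responses for the same URL."""
    body_a = fetch(url, headers_a)
    body_b = fetch(url, headers_b)
    return difflib.SequenceMatcher(None, body_a, body_b).ratio()


def fake_fetch(url: str, headers: Mapping[str, str]) -> str:
    """Hypothetical server that serves a keyword page to search-engine referrals."""
    if "search.example" in headers.get("Referer", ""):
        return "<h1>Cheap widgets, widget deals, widget prices</h1>"
    return "<h1>Welcome to our store</h1>"


score = similarity(fake_fetch, "http://shop.example/",
                   {"Referer": "http://search.example/?q=widgets"},
                   {"Referer": "http://news.example/"})
print(f"similarity: {score:.2f}")
```

Passing urllib_fetch instead of fake_fetch would run the same comparison against a live server.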

With dynamic page spam, however, the site is loaded with hundreds of these phantom pages (dynamic URLs) that act as affiliate links to some other site. Search engines don't want affiliate links. In the case I found, all the links were credited to the site's backlink count.

I don't think this is what search engines had in mind when they began to spider dynamic URLs; they certainly don't want to allow affiliate link spam.

Here's what the links typically look like:

www.spammersite7.com/perl/click.pl?id=2068&a=i

When the robot follows one of these links, it receives a meta refresh that redirects it to an error page called redirect.cfm. This page has links back to the home page, which are credited to the site's backlink count.

<meta http-equiv="REFRESH" content="2; URL=http://www.spammersite7.com/redirect.cfm?url=spammersite7.com">
</head>
<body onLoad="document.form1.submit();">
Please Wait...
<form name="form1" method="post" action="redirect.cfm">
<input type="hidden" name="url" value="spammersite7.com">
</form>
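A redirect chain like this can be spotted without running any JavaScript. The sketch below, built on Python's html.parser, records meta refresh targets and body onLoad handlers that submit a form; it reuses the sample markup above, and the class name and regular expression are hypothetical.

```python
import re
from html.parser import HTMLParser

# "2; URL=http://..." -> delay 2, target URL
REFRESH = re.compile(r"^\s*(\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?(.*))?$", re.IGNORECASE)


class RedirectAudit(HTMLParser):
    """Records meta refresh redirects and body onload handlers that submit a form."""

    def __init__(self) -> None:
        super().__init__()
        self.refreshes: list[tuple[int, str]] = []
        self.autosubmit = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names; attribute values may be None.
        a = {name: value or "" for name, value in attrs}
        if tag == "meta" and a.get("http-equiv", "").lower() == "refresh":
            match = REFRESH.match(a.get("content", ""))
            if match:
                self.refreshes.append((int(match.group(1)), (match.group(2) or "").strip()))
        elif tag == "body" and ".submit()" in a.get("onload", ""):
            self.autosubmit = True


page = """<meta http-equiv="REFRESH" content="2; URL=http://www.spammersite7.com/redirect.cfm?url=spammersite7.com">
</head>
<body onLoad="document.form1.submit();">
<form name="form1" method="post" action="redirect.cfm">
<input type="hidden" name="url" value="spammersite7.com">
</form>"""

audit = RedirectAudit()
audit.feed(page)
audit.close()
print(audit.refreshes, audit.autosubmit)
# [(2, 'http://www.spammersite7.com/redirect.cfm?url=spammersite7.com')] True
```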

DHTML Layering and Hidden Text

Using DHTML layering, spammers can hide layers of keywords beneath graphics. One layer covers the other visually, yet the text hidden on the lower layer is readable by the search engine robot: another technique the search engines explicitly prohibit.
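Working out whether one layer hides another requires knowing where each element ends up, which in practice means rendering the page. Assuming those positions are already known, the covering test itself is simple geometry. The Box type is hypothetical, the sketch assumes the upper layer is opaque, and the image layer's size is borrowed from the invisible-text example later in this article.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A positioned element's rectangle in pixels plus its stacking order."""
    left: int
    top: int
    width: int
    height: int
    z_index: int = 0

    def contains(self, other: "Box") -> bool:
        """True if this rectangle fully encloses the other one."""
        return (self.left <= other.left and self.top <= other.top
                and self.left + self.width >= other.left + other.width
                and self.top + self.height >= other.top + other.height)


def is_covered(text: Box, layers: list[Box]) -> bool:
    """True if any layer stacked above the text fully overlaps it (assumes opaque layers)."""
    return any(layer.z_index > text.z_index and layer.contains(text) for layer in layers)


keywords = Box(left=10, top=10, width=120, height=40)
image_layer = Box(left=5, top=8, width=200, height=115, z_index=1)
print(is_covered(keywords, [image_layer]))  # True
```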

HTML Hidden Table Cells

The combined powers of CSS, HTML and the loose DTD (<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">) allow the unscrupulous site owner to hide the content of table cells loaded with keywords and heading tags.

CSS permits the flexible positioning of Web page elements, and search engines do not fully understand it at present. In short, the search engine doesn't really know what's being displayed. This trickery can be specified in a separate style sheet (.css file), which a search engine may or may not index. That style sheet, however, does affect how content is displayed on the page.

In this example, the CSS rule for the body of the Web page sets its width to 97% and hides any content that overflows it:

body {font-family: Arial, Helvetica, sans-serif; width: 97%; font-size: 10pt; overflow: hidden; color: #000000; margin: 0px;}

Within the regular code, .gif files can be placed in the page at a width of 150%, ensuring that part of the page is not seen. That extra 50% provides plenty of room for keywords stuffed into <h1> tags.
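The arithmetic behind that trick is simple: with overflow hidden, anything wider than the container is clipped. This sketch parses a declaration block like the one above and reports how many pixels of a child element would be cut off. The 970 px container width is an assumption (97% of a hypothetical 1,000 px window), and real layout involves margins, padding and scrollbars that are ignored here.

```python
def parse_declarations(css: str) -> dict[str, str]:
    """Turn 'a: b; c: d', optionally wrapped in 'selector { ... }', into a dict."""
    body = css.split("{", 1)[-1].split("}", 1)[0]
    declarations = {}
    for item in body.split(";"):
        if ":" in item:
            name, value = item.split(":", 1)
            declarations[name.strip().lower()] = value.strip().lower()
    return declarations


def clipped_width(container_px: float, container_css: str, child_width_pct: float) -> float:
    """Pixels of a child element cut off by a container with overflow: hidden."""
    if parse_declarations(container_css).get("overflow", "visible") != "hidden":
        return 0.0  # visible, scroll or auto overflow is not silently hidden
    child_px = container_px * child_width_pct / 100
    return max(0.0, child_px - container_px)


body_css = ("body {font-family: Arial, Helvetica, sans-serif; width: 97%; "
            "font-size: 10pt; overflow: hidden; color: #000000; margin: 0px;}")
print(clipped_width(970, body_css, 150))  # 485.0
```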

Enormous Machine-Generated Websites

Webmasters who are not adept at HTML, DHTML or CSS tricks may try something simpler. When there's not enough content to go around, they often try to stretch a minimal amount of content across thousands of pages. The pages are built with templates, and the sentences within them are basically shuffled from one page to the next. Unique title tags are plugged into each page that's generated.

This technique basically sees the same page repeated hundreds to thousands of times. It can even be done using a computer program that systematically stuffs the text sentences, paragraphs and headings, including keywords, into pages.

This technique is most often used with ecommerce sites that have a limited range of products for sale. Often, the products are simply re-organized, or shuffled to create another page that appears to be unique. It's actually the same selection of products presented countless different ways.
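Because the sentences are merely reshuffled, most short word sequences survive from one page to the next. A common near-duplicate measure, used here for illustration rather than attributed to any search engine, compares sets of overlapping three-word "shingles" with Jaccard similarity. The sample pages are made up, and any cut-off for calling two pages duplicates would be an assumption.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical template pages: the same three sentences in a different order.
page_a = "Blue widgets ship free. Our widgets last for years. Order blue widgets today."
page_b = "Our widgets last for years. Order blue widgets today. Blue widgets ship free."
page_c = "Gadgets for every budget and every home."

print(round(jaccard(shingles(page_a), shingles(page_b)), 2))  # 0.69
print(jaccard(shingles(page_a), shingles(page_c)))            # 0.0
```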

Link Spam

To maximize PageRank distribution throughout a Website, some spammers will fill a page with links to the point where it is just a links page, and every page links to every other page.

Why do this? Well, by maximizing the number of links, the spammer more equally spreads PageRank throughout his or her site. When links from all those pages point to a single page on a keyword topic, the site can gain higher rankings for that phrase.
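The effect can be seen with a toy version of the original PageRank formula, computed by power iteration. The page names are hypothetical, and this is the textbook formulation, not any engine's production ranking. The sketch compares a small site where only the home page links to the keyword page with the same site where every page does.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Textbook PageRank by power iteration; dangling pages share rank evenly."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            targets = outlinks or pages  # a page with no links shares with every page
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page site: only the home page links to the keyword page...
plain = {
    "home": ["keyword-page", "about", "contact"],
    "about": ["home"],
    "contact": ["home"],
    "keyword-page": ["home"],
}
# ...versus the same site with every page also pointing at the keyword page.
stuffed = {
    "home": ["keyword-page", "about", "contact"],
    "about": ["home", "keyword-page"],
    "contact": ["home", "keyword-page"],
    "keyword-page": ["home"],
}
print(round(pagerank(plain)["keyword-page"], 3), round(pagerank(stuffed)["keyword-page"], 3))
```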

Link exchanges are also considered link spam. The links are fabricated -- not a real reflection of personal choice. Most link exchanges are now being filtered out of search results; however, some links in link exchanges are still being recognized.

This system allows the server to give the robot different content than that which is delivered to human visitors. And that means the search engine could be deceived.

Invisible Text

Invisible text is invisible because the font color is the same as the color of the background or background image.

In one example I saw, a site used the font color "snow" to make the text white on a white background. The author also used this font tag in a way that caused it to overlap another tag, thereby confusing the search engine robot further.
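Detecting same-color text means comparing the text color with whatever is behind it. As one possible measure, this sketch uses the WCAG contrast-ratio formula; the 1.2 threshold and the tiny named-color table are assumptions, and a real checker would also have to resolve inherited styles, external CSS and background images (such as the black .gif below), which this ignores.

```python
# A tiny subset of CSS named colors (an assumption for this sketch).
NAMED_COLORS = {"black": "#000000", "white": "#FFFFFF", "snow": "#FFFAFA"}


def luminance(color: str) -> float:
    """WCAG relative luminance of a #RRGGBB or known named color."""
    hex_digits = NAMED_COLORS.get(color.lower(), color).lstrip("#")
    channels = []
    for i in (0, 2, 4):
        c = int(hex_digits[i:i + 2], 16) / 255
        channels.append(c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4)
    red, green, blue = channels
    return 0.2126 * red + 0.7152 * green + 0.0722 * blue


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio: 1.0 for identical colors, 21.0 for black on white."""
    lighter, darker = sorted((luminance(foreground), luminance(background)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def looks_invisible(foreground: str, background: str, threshold: float = 1.2) -> bool:
    """Flag text that is nearly indistinguishable from its background."""
    return contrast_ratio(foreground, background) < threshold


print(looks_invisible("snow", "#FFFFFF"))   # True
print(looks_invisible("#000000", "white"))  # False
```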

The example below uses a black .gif as the background to hide black text. It also places a DHTML layer directly above the text to hide it further.

<body bgcolor="#000000">  
<table width="14%" border="0" cellpadding="6" cellspacing="0" bgcolor="#FFFFFF">  
 <tr>  
   <td background="black.gif"><font color="#000000">invisible text</font></td>  
 </tr>  
</table>  
<div id="Layer1" style="position:absolute; width:200px; height:115px; z-index:1; left: 5px; top: 8px; background-image: url(black.gif); layer-background-image: url(black.gif); border: 1px none #000000;"></div>  
</body>

A robot can't detect whether text in a DHTML layer is the same color as the background used in a layer below it. The layer can even be positioned off-screen so it is never visible to a person.
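Off-screen placement, at least, is easy to spot when the position is written inline. This sketch reads pixel values from an inline style and flags absolutely positioned elements that sit entirely beyond the left or top edge of the page. Placement off the right or bottom edge depends on the window size, which isn't known here, and styles set in external CSS are not seen at all. The function name and regular expressions are hypothetical.

```python
import re

POSITION_ABSOLUTE = re.compile(r"position\s*:\s*absolute", re.IGNORECASE)
# left/top/width/height in px; the look-behind skips names such as padding-left or max-width.
PX_VALUE = re.compile(r"(?<![\w-])(left|top|width|height)\s*:\s*(-?\d+)px", re.IGNORECASE)


def offscreen(style: str) -> bool:
    """True if an absolutely positioned box lies entirely left of or above the page."""
    if not POSITION_ABSOLUTE.search(style):
        return False
    box = {name.lower(): int(value) for name, value in PX_VALUE.findall(style)}
    if "width" not in box or "height" not in box:
        return False  # size unknown: can't decide from the inline style alone
    left, top = box.get("left", 0), box.get("top", 0)
    return left + box["width"] <= 0 or top + box["height"] <= 0


# Style from the layer in the example above, plus a hypothetical off-screen variant.
visible = "position:absolute; width:200px; height:115px; z-index:1; left: 5px; top: 8px;"
hidden = "position:absolute; width:200px; height:115px; z-index:1; left: -1000px; top: 8px;"
print(offscreen(visible), offscreen(hidden))  # False True
```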

Link Farms

Link farms are still prevalent on the Web, even though search engines can detect their presence through link pattern recognition. Since link spamming is being done at a macro level, search engines must be able to view a large, sophisticated network of links and delete those that are machine-generated rather than true, human-chosen links.
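One simple signal such pattern recognition could use is how densely a group of sites links back and forth. This sketch computes the share of site pairs in a group that link to each other in both directions; the site names are hypothetical, links to sites outside the group are ignored, and any threshold for calling a group a farm would be an assumption.

```python
from itertools import combinations


def reciprocal_ratio(links: dict[str, set[str]]) -> float:
    """Share of site pairs in the group that link to each other in both directions."""
    pairs = list(combinations(sorted(links), 2))
    if not pairs:
        return 0.0
    mutual = sum(1 for a, b in pairs if b in links[a] and a in links[b])
    return mutual / len(pairs)


# Hypothetical groups: four sites that all link to one another...
farm_sites = ["a-links.example", "b-links.example", "c-links.example", "d-links.example"]
farm = {site: {other for other in farm_sites if other != site} for site in farm_sites}
# ...and three sites whose links only point one way.
organic = {
    "blog.example": {"news.example"},
    "news.example": set(),
    "shop.example": {"news.example"},
}
print(reciprocal_ratio(farm), reciprocal_ratio(organic))  # 1.0 0.0
```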

The Hilltop algorithm is one filter that minimizes the advantage gained from hundreds of useless links.
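Hilltop, as described in its original paper, relies only on "expert" pages that are not affiliated with one another, treating hosts as affiliated when they share the first three octets of their IP address or the same rightmost non-generic hostname token. The sketch below is a loose simplification of that idea applied to counting inbound links; the tiny generic-token list, the hosts and the greedy counting rule are all assumptions. It also shows why spreading sites across different IP addresses, as described earlier, is meant to defeat exactly this kind of test.

```python
# Tiny illustrative list of generic hostname labels (an assumption, far from complete).
GENERIC = {"www", "com", "net", "org", "co", "uk"}

Host = tuple[str, str]  # (hostname, IPv4 address)


def ip_prefix(ip: str) -> str:
    """First three octets of an IPv4 address, e.g. '203.0.113'."""
    return ".".join(ip.split(".")[:3])


def name_token(hostname: str) -> str:
    """Rightmost hostname label that is not in the generic list."""
    for label in reversed(hostname.lower().split(".")):
        if label not in GENERIC:
            return label
    return hostname.lower()


def affiliated(a: Host, b: Host) -> bool:
    """Treat two hosts as affiliated if they share an IP prefix or a name token."""
    return ip_prefix(a[1]) == ip_prefix(b[1]) or name_token(a[0]) == name_token(b[0])


def independent_votes(target: Host, linkers: list[Host]) -> int:
    """Count linking hosts affiliated neither with the target nor with a host already counted."""
    counted: list[Host] = []
    for linker in linkers:
        if affiliated(linker, target) or any(affiliated(linker, c) for c in counted):
            continue
        counted.append(linker)
    return len(counted)


target = ("www.widgetco.com", "203.0.113.10")
linkers = [
    ("widgetco.net", "198.51.100.5"),       # same name token as the target
    ("reviews-hub.org", "203.0.113.77"),    # same first three octets as the target
    ("hobbyist-site.net", "192.0.2.8"),     # unrelated
    ("gardenforum.org", "198.51.100.200"),  # unrelated to the target and to hobbyist-site.net
]
print(independent_votes(target, linkers))  # 2
```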

Spamming Penalties

Each search engine has its own distinct prohibitions and related penalties. Each penalty reflects the degree of threat that a given spamming technique represents to the search engine.

Spammers may receive demerits, through which the ranking of their sites on a particular phrase might drop significantly. Alternatively, a zero PageRank penalty may be imposed on a particular page, or whole sites may be banned if the search engine so chooses.

Now that these techniques are widely known, I strongly advise you not to try them. The search engine engineers may be embarrassed that these tricks really do work, and will move swiftly to take action against spammers.

Oh What a Wicked Web We Weave

What's the final word on search engine spam? Well, that's between you and the search engines. Now that you know some of the popular spamming techniques in use, you'll at least know how to avoid using them. Once word gets out, the search engines will ban their usage.

To avoid the problems created by spamming, choose an SEO that can achieve legitimate results. Don't ask for top-ten guarantees; the search engines consider such guarantees wrong. Hire an SEO that offers the full package of content creation and development. You'll get your money's worth, the search engines will get rich, useful content, and your site will attract targeted, qualified users.
