How Google Really Wants You to Optimize Your Site

Does Google care about SEO? Yes, it does: from Google's SEO Starter Guide (PDF) to the help provided in the Google Webmaster Help Forum, the search engine is pretty transparent about how it prefers you to optimize your site for inclusion. We'll be discussing URL structure, TrustRank and duplicate content issues.

To start with a conclusion: if you do what they like, chances are your site will not only be included, but also ranked better. Now let's go "in depth" and see, once and for all, how Google prefers you to optimize your site for its search engine.

All questions in this article were asked by users on Google Moderator Beta, Ask a Google Engineer.

Does Google Really Care About SEO?

The answer is yes, and it comes from Google’s search evangelist Adam Lasnik:

Just like in any industry, there are outstanding SEOs and bad apples. We Googlers are delighted when people make their sites more accessible to users and Googlebot, whether they do the work themselves or hire ethical and effective professionals to help them out.

Note that Googlers are "delighted" when sites are optimized for search. The moral: know your SEO!

What Is the URL Structure Preferred by Google?

Google's Matt Cutts replied:

I would recommend
long-haired-dogs.html
long_haired_dogs.html
longhaireddogs.html
in that order. If your site is already live on the web, it’s probably not worth going back to change from one method to another, but if you’re just starting a new site, I’d probably choose the URLs in that order of preference. I can only speak for Google; you’ll need to run your own tests to see what works best with Microsoft, Yahoo, and Ask.
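
To put that preference into practice, here is a minimal sketch of a slug helper in Python. The function name, the hyphen-joining rule, and the ".html" suffix are only illustrations of the ordering Cutts describes, not anything Google prescribes:

    import re

    def make_slug(title):
        """Lowercase a title and join its words with hyphens."""
        words = re.findall(r"[a-z0-9]+", title.lower())
        return "-".join(words) + ".html"

    print(make_slug("Long Haired Dogs"))  # long-haired-dogs.html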

Is Google Using a "TrustRank" Algorithm?

Many SEOs suggest that if a site from a "bad neighborhood" links to yours, your site's "TrustRank" with Google will be lowered.

False. If this were true, any competitor could harm a site just for spite. Google doesn't use "TrustRank" to refer to anything, although the company did attempt to trademark "TrustRank" as the name of an anti-phishing filter; it abandoned that trademark in 2008. The only company that might use "TrustRank" is Yahoo!

Matt Cutts explains how hard Google tries to keep one competitor from hurting another:

We try very hard to make it hard for one competitor to hurt another competitor. (We don’t claim that it’s impossible, because for example someone could steal your domain, either by identity theft or by hacking into a domain, and then do bad things on the domain.) But we try hard to keep one competitor from hurting another competitor in our ranking.

Will Google Learn How to Identify Paid Links Without Making Webmasters Use "nofollow"?

The answer is yes. We know that one of the techniques is to ask users to "report paid links," but Google also depends on its algorithms. According to Matt Cutts:

We definitely have worked to improve our paid-link and junk link detection algorithms. In our most recent PageRank update (9/27/2008) for example, there are some differences in PageRank because we’ve improved how we treat links, for example.

The “nofollow” attribute on links is a granular way that site owners can provide more information to search engines about their links, but search engines absolutely continue to innovate on how we weigh links as well.
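
For reference, the "nofollow" attribute Cutts mentions sits directly on a link's anchor tag; the URL and anchor text below are illustrative:

    <a href="http://www.example.com/" rel="nofollow">sponsored link</a>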

My Site's Been Penalized Because of Duplicate Content!

Right, this is a tough one, and it wasn't answered by a Google engineer on Google Moderator Beta, but Adam Lasnik did provide a link to an answer by Google's webmaster trends analyst Susan Moskwa: Demystifying the "duplicate content penalty"

The answer to "does Google penalize duplicate content?" is yes and no. To understand how this works, we first need to define duplicate content in Google's terms.

Usually, duplicate content refers to blocks of content that either completely match other content or are appreciably similar. It comes in two flavors: duplicate content within your domain and duplicate content across domains.

Duplicate content within your domain

Google filters duplicate content in several ways. If your site has a regular and a print version of each page, and neither is blocked with robots.txt or a "noindex" meta tag, Google will simply choose one to list and drop the other.
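
If you'd rather choose the surviving version yourself instead of letting Google pick, the two mechanisms just mentioned look roughly like this (the /print/ path is a made-up example, not a convention Google requires):

    # robots.txt: keep crawlers out of the print versions
    User-agent: *
    Disallow: /print/

or, in the head of each print page:

    <meta name="robots" content="noindex">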

Ecommerce sites, usually managed with a CMS, sometimes store and (worse yet) link the same item under multiple distinct URLs. Google groups the duplicate URLs into clusters and displays the one it considers best in the search results.


The real negative effects of having duplicate content on multiple URLs are:

  • Diluted link popularity (instead of pointing at the intended display URL, links may be divided among the distinct URLs; see the sketch after this list)
  • Search results may display user-unfriendly URLs (long URLs with tracking IDs, etc.), which is bad for site branding
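
One commonly suggested remedy (my addition here, not from the Moderator answers) is to 301-redirect the stray variants to the URL you want displayed, so links consolidate there. A sketch for Apache's mod_rewrite, assuming a hypothetical "sessionid" tracking parameter:

    # .htaccess: strip tracking IDs with a permanent redirect
    RewriteEngine On
    RewriteCond %{QUERY_STRING} ^sessionid=
    RewriteRule ^(.*)$ /$1? [R=301,L]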

The "duplicate content penalty" applies when content is deliberately duplicated to manipulate SERPs or win more traffic; in such cases Google will adjust the indexing and ranking of the sites involved.

Duplicate content across domains

This is the nightmare of every serious web publisher: content scraping. Someone could be scraping your content to use on "made for AdSense" sites, for example, or a web proxy could end up indexing pages of your site that it accesses. Google says this is no reason to worry, although many publishers have experienced "penalties" when the scraper ranked higher than the original.

Google states that it is able to determine the original. Apparently, if the scraper ranks higher, the reason might lie in how the "victim" site is prepared for Google. Here are the possible remedies suggested by Sven Naumann, from Google's search quality team:

  • Check if your content is still accessible to our crawlers. You might unintentionally have blocked access to parts of your content in your robots.txt file.
  • You can look in your Sitemap file to see if you made changes for the particular content that has been scraped (a minimal Sitemap entry is sketched after this list).
  • Check if your site is in line with our webmaster guidelines.
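
For reference on that second point, a Sitemap entry is just an XML <url> record whose <lastmod> date you can compare against when the scrape happened; the URL and date below are illustrative:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/long-haired-dogs.html</loc>
        <lastmod>2008-09-27</lastmod>
      </url>
    </urlset>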

The most interesting statement comes from the same Google search expert. How would you "translate" the following?

To conclude, I’d like to point out that in the majority of cases, having duplicate content does not have negative effects on your site’s presence in the Google index. It simply gets filtered out.

Let me offer my version: if your site gets scraped by a site with higher popularity, your original content will simply get filtered out. If this is not a "duplicate content penalty," then what should we call it? Eradication, perhaps? (!)

Luckily Google does offer a solution: if your content is scraped and the scraper ranks higher, you can file a DMCA request to claim ownership of the content. Good luck with that!

Last but not least: when you syndicate content to other sites, if you want the original article to rank, you have to make sure those sites include a link back to it. We've seen cases where, despite the link, the original was given less weight and… uhm… eradicated?

For affiliate sites that need to display content duplicated from (or similar to) the "mother" site, the solution Google proposes is also simple: merely copying content will cause Google to "omit" your site from the results, but if you add extra value to your pages (more content), you have a good chance of ranking well in the SERPs. So how much more? One paragraph? Two? Go figure!

My second conclusion: what can you take from this article? Maybe even Google doesn't know what it wants (the algorithms are in total control), and it doesn't want humans to know what it's doing. (!) Seriously, though: Google likes optimized sites, and your best bet is to never duplicate content. Period.


  • http://www.thalesdotcom.com ThalesMM

    I can tell that clears up a few doubts I had about the way Google deals with websites. I'm now setting my WordPress permalinks to the post name, with no file extension after it. I wonder if that makes any difference…

  • Jay

    Please learn to spell ‘scraper’.

  • http://www.e-brighthorizons.com Saboma

    Just earlier today, I was reading about this stuffola in the PDF e-book you wrote a while back. Moreover, I'm looking forward to reading the updated version when it is completed.

    Thanks for sharing, Girly!

  • Deborah

    The scrapers are my biggest bane. I’ve gone to the extent of reporting the really bad ones for repeated offenses to Google for ‘site behaving badly’ ;-)

  • http://www.ewriting.pamil-visions.com/ Mihaela Lica

    @ThalesMM – yes, it certainly does.

    @Jay – thank you for pointing that out. This proves that you at least read the article in your hunt for spelling errors. ;) I actually do know how to spell it, it’s just that sometimes I confuse terms. What you spotted was not a spelling error, but a confusion of terms – scrapper vs. scraper. In my defense, I am not a native English speaker. Oh, I know, it’s no excuse – after all it is the only error I made, unlike native English speakers who never make a mistake!

    @Saboma – we are lucky to have so many useful plugins to deal with duplicate content issues in WP. As for URLs – this was logical. The same rule goes for file name optimization. :)

    @Deborah – that’s the price to pay for having a very popular blog! Sad, but true. SitePoint content is getting scraped too!

  • spheroid

    URL structure: I'm developing with CodeIgniter and wonder how Google suggests URLs should be formatted, such as:

    mysite.dev/forum/post/what-do-i-look-for-in-a-bank-20

    Should the URLs contain a .htm? Is there a proper order by which it sees it's a forum post vs. a blog, etc.?

  • http://www.arwebdesign.net samanime

    As far as the extension goes, if I had to guess, I'd say Google doesn't worry about it at all. I could be wrong, but it'd be kind of dumb to let it play a role.

  • http://www.ewriting.pamil-visions.com/ Mihaela Lica

    @spheroid – you don't need the .htm.

    @samanime: the answer that they do worry about the extensions comes directly from Matt Cutts, who is a Google engineer. The way they "worry": they PREFER the URL structure "long-haired-dogs.html" above all other types.

  • http://www.sitepoint.com/ mmj

    This is a good blog post. One thing though, when you say

    if your content is scraped and the scraper ranks higher, you can file a DMCA request to claim ownership of the content. Good luck with that!

    you seem to imply that you may not get any luck doing this. Actually, this is one of the things you can be most certain about. If you make a DMCA request to Google, they will almost certainly remove the allegedly offending material in response (if they challenged it, they could be held liable for the copyright offender’s actions). You don’t have to prove it in a court or anything, just fill out a DMCA request properly and mail or fax it. As far as getting the content back onto Google, the burden of proof is kind of put on the alleged copyright offender instead.

    It was a law passed to make it easier for music and movie companies to get offending material removed without needing to prove anything, and at the same time offer protection from prosecution to providers like Google. I’m not saying it’s fair. For example, anybody could falsely accuse YOU and your stuff would get taken down, and you’d need to defend yourself to get it put back up.

  • AndrewCooper

    This was a fantastic article on SEO overall and in specific to Google. I’m especially thankful for you posting the link to the e-book guide from Google, that was a huge help for me and I’ll keep that on my shelf now in a printed format! =]

    Also about the URL Structure part, that was also helpful too because sometimes you just don’t know what you should be following, eh?

    Thanks for the rare, great article on SEO!

    Andrew Cooper

  • Anonymously

    To conclude, I’d like to point out that in the majority of cases, having duplicate content does not have negative effects on your site’s presence in the Google index. It simply gets filtered out.

    Would you please expand on your understanding of this statement? It is my understanding that Google's PageRank system is a zero-sum game, in that an increase in the PR of one site is effectively offset by a tiny reduction in the PR of every other site.

  • http://www.ewriting.pamil-visions.com/ Mihaela Lica

    @mmj – by saying “good luck with this one” I just meant that the process is pretty complicated and time-consuming. :)

  • http://www.ewriting.pamil-visions.com/ Mihaela Lica

    @Andrew Cooper – I am glad you enjoyed the article and you found it informative, Andrew. :)

    @Anonymously – Google PR has nothing to do with a site's ranking in the SERPs or its presence in the index. JhonChow.com had a great PR for a long time without being present in the SERPs.

  • Anonymously

    You mean JohnChow.com, not JhonChow.com, right? Second, putting aside the term PageRank: if, for some strange reason, there were 10 pages for a given keyword, all on one site, and they duplicated every single page, are you telling me that the duplicates would not affect the existing rankings of the original 10 pages? Further, to make the example even clearer, the keyword is:

    “site:example.com”

    Where the domain example.com is the site in question, meaning that, before, all of the original top ten links fit on the first page of search results. Based on what you are saying, ALL 10 original pages would still be on the first page, and I find this very hard to believe, meaning that there would be one or more duplicates on the first page.

    Thank you VERY much for your time, and hope that what I am asking is clear and relevant. Thank YOU!

  • http://www.ewriting.pamil-visions.com/ Mihaela Lica

    I think the article was clear and relevant and what you are asking is already explained IN the article. Please note the beginning of the article:

    All questions in this article were asked by users on Google Moderator Beta, Ask a Google Engineer.

    – and ALL answers to the questions were given by Google engineers, obviously. I just put these answers together. If my authority means nothing to you, at least that of these experts should. Now, to reiterate what is already in the article: all duplicates are omitted from the search results. This is what Google says. This is what the article says.

  • Anonymous

    The truth is duplicate content is Google’s little thorn. They don’t know how to deal with it and never will. This is why scraper blogs will always be successful. Think about the fact that news sites which use content from AP.org process duplicate content all day every day. Does Google penalize these newspapers? Of course not. Duplicate content through syndication is too big, and Google will never be able to fix it unless they hurt tons of authority sites along the way.

  • http://www.tordonfx.com tordon

    Google has all these rules in their terms and conditions that steer you away from actually getting your site SEO-ready, but if you followed them the way they want, you would never get any traffic.

  • Anonymously

    @ Mihaela Lica

    Wow… umm, yeah — taking a step back, deep breath.

    (1st) I never asked if your “article was clear and relevant,” I said, and I quote “hope that what I am asking is clear and relevant.”

    (2nd) I did read everything, but I guess what I was asking was not clear. Dropping the question, since at this point you are wasting my time, not the other way around.

    (3rd) You did not answer my very first question: “You mean JohnChow.com, not JhonChow.com, right?”

  • http://www.ewriting.pamil-visions.com/ Mihaela Lica

    @Anonymously 3rd – yes, that’s what I meant.

  • http://www.ewriting.pamil-visions.com/ Mihaela Lica

    @Anonymous – for news, they actually found a way to deal with duplicates. They show clusters of related titles (3 or 4), then add a mention like "all 74 news articles" underneath (where 74 can be any number of articles). But in essence, I do agree with you: scrapers are a thorn, and not just for Google, but for publishers.

  • http://tentonweb.com/ tentonjim

    @mmj re:DMCA
    Here in Jacksonville, FL, I had 3… 3 companies that had blatantly copied my content and either placed it or hidden it in their site(s). The worst offender had taken my entire homepage, changed out my name, and hidden it in a display:none h1 tag in 13 of their pages… and this outfit is on the BBB here in Jax.

    I followed the procedure, faxed Google, and the reply from Google was pretty much "Sorry." Talk about frustrating. I ended up hand-delivering cease-and-desist notices to these people… which, in spite of the pain in my rear, I would have to say the looks on their faces were priceless. After some chit-chat I explained who I was and why I was there.

    I am just a guy doing side work and these are “legit” web design and/or SEO companies here in Jacksonville.

    So yeah… what he said… “good luck with that.” I would have to agree.

    Jim S.
    http://tentonweb.com/
    Jacksonville, FL

  • spVince

    On February 1, 2008, Matt Cutts from Google responded about duplicate content issues and guidelines:
    http://www.mattcutts.com/blog/duplicate-content-question/

  • http://www.ewriting.pamil-visions.com/ Mihaela Lica

    @spVince – that article covers only one side of the duplicate content issue: article publishing. Duplicate content covers much more than that, though. In this article I linked to more recent and more in-depth Google responses.

  • http://www.islandinfo.mu abhikerl

    Thanks for the article. Anyway, I don't trust Google that much, but we still need it for SEO stuff. My site had a PR3, it was decreased to PR0, and after one week I got the PR3 back. Why? I haven't figured it out yet.

  • http://charlessweeney.com Charles Sweeney

    Just some clarification about paid links. Paid links are only a problem if you are transferring PageRank. This is from Google:

    (http://www.google.com/support/webmasters/bin/answer.py?answer=66736)

    “Not all paid links violate our guidelines. Buying and selling links is a normal part of the economy of the web when done for advertising purposes”

    After all, Google AdWords are paid links!

    @Mihaela. Your English is FANTASTIC! I bet the spelling policeman can only speak one language!

  • http://www.tynt.com TyntTracer

    Great article! You covered a lot of ground efficiently. But I have to say PR, SERP, SEO mean nothing when your content gets lifted and your site gets no hits even though you did all the work. All content developers deserve THE credit for THEIR work. Your unique content, SEO optimization and link backs to your site are what it all comes down to.

    Trevor at Tynt.com

  • Asthma Health

    Hmmm, I took the time to go through it all and have to agree that if one really follows what Google says here, their site will do well. Of course there will be one or two places where they missed the point, but Google is da boss and we should listen to them and adhere to what they recommend. That’s the best way to go instead of trying to fight and trick and manipulate them all the time. Thanks for sharing this. It proved helpful, indeed.

  • johnyboy

    What I'd like to know is: what happens when there's duplicate content on the same domain, but that duplicate content has small variations? In particular, what happens when someone searches for phrases that are the small varying bits; will they find them? To illustrate what I mean, x's represent sentences/paragraphs of duplicate content, and a's and b's represent sentences of varying content.
    page1: xxxxxxaxxxxxxxxxx
    page2: xxxxxxxxxxxbxxxxx
    A person does a search for a, another for b. Will both people find a and b on page1 and page2, respectively?

  • http://kipram.com spt2009

    I once read on a blog that Google.com isn't concerned about duplicate content.