Keeping Control of Content

Is there a way to prevent search engines (e.g. Google) from caching my website content?

Or does that even make sense?!

Obviously I want a good SEO rating, but I also worry about people stealing my content.

There should be some way that I can reel in my content and get it off the Internet and not have it floating around until the end of time. (I believe this has become a major issue for sites like “The NY Times” in the past 4-5 years.)

Initially, I want my content out there and people to see it, but ideally I should always have control over it.

Is that possible??

If so, how do I do it?

Debbie

If you want to be listed in Google’s search results, then you have to let Google crawl your pages, and by default it will cache them as well.

If you remove your webpage, it’ll eventually drop out of Google’s index too.

If you don’t want Google to crawl (and therefore cache) a webpage at all, disallow it in your robots.txt file. Note that the bare URL can still show up in results if other sites link to it; robots.txt only stops Google from fetching the page’s content.
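For example, a minimal robots.txt in your site’s root might look like this (the page name is just a placeholder, not a real path on your site):

# hypothetical page — replace with your real URL path
User-agent: *
Disallow: /my-article.html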

You can use noindex, nofollow in a robots meta tag.

Or you could set up a robots.txt file and deny certain directories.
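A rough sketch of that, assuming the directories you want hidden are named something like the ones below (they’re hypothetical):

# example directory names — substitute your own
User-agent: *
Disallow: /drafts/
Disallow: /members/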

Unfortunately, there is no foolproof way to keep people from stealing content.

One way is to have people register to view your content. That way you know who is viewing what, and when.
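As a very rough sketch of gating a directory, assuming an Apache server with .htaccess enabled (a real registration system would also need signup pages and logging; the paths here are hypothetical):

# hypothetical paths — adjust for your server setup
AuthType Basic
AuthName "Registered readers only"
AuthUserFile /home/example/.htpasswd
Require valid-user

Visitors would then have to log in before the server will show them anything in that directory.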

But a lot of online content providers like the NY Times have had issues with this approach not working. (I believe they sued Google and other search engines because those engines were caching, or somehow indefinitely storing, content that the NY Times no longer wanted public.)

Obviously I am nowhere near that level, but small problems always find ways of becoming gigantic problems over time!!

Debbie

What do those meta tags do?

Would I need to put them in each web page?

Or you could set up a robots.txt file and deny certain directories.

How does that differ from your first suggestion?

Unfortunately, there is no foolproof way to keep people from stealing content.

One way is to have people register to view your content. That way you know who is viewing what, and when.

Well, that is definitely one way I am thinking of handling the issue.

Debbie

All good sites will have a robots.txt file (for several dozen reasons), and it is typically the easier approach when you want to exclude whole directories: http://en.wikipedia.org/wiki/Robots_exclusion_standard

With the META approach, you use the tags on a per-page basis to stop the ‘indexing’ of that page and the ‘following’ of its links.
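As a quick sketch (the page itself is hypothetical), the tag sits inside each page’s head section:

<head>
  <title>An Example Article</title>  <!-- hypothetical page -->
  <meta name="robots" content="noindex, nofollow">
</head>

So yes, it has to be added to every page you want excluded, which is why robots.txt is the easier option for whole directories.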

In the end, if you do not want your content to be in the hands of others, DO NOT put it online. That is all there is to it. Once it is out there, you cannot stop it, so it is better not to try, or not to make it available to begin with.

No.

The closest solution that would suit your problem, DoubleDee, would be:

<meta name="robots" content="noindex, follow">

This would allow the Googlebot to move through your website and follow your links, but specifying noindex stops the pages themselves from being indexed and shown in search results…

I wouldn’t worry about having your content copied… perhaps it’s a little presumptuous to assume that it is of a quality that makes it attractive to plagiarisers.