  1. #1
    SitePoint Member
    Join Date
    Oct 2011
    Posts
    12

    Block search engines indexing a few pages

    Hi guys,
    I use SEOMoz, which is alerting me to some 'no meta data found' errors on a few pages. It's actually just one page, but it displays different data depending on what's coming out of the DB,
    e.g. mydomain.com/mypage/var-value-1, mydomain.com/mypage/var-value-2, mydomain.com/mypage/var-value-3, etc.

    Can I use robots.txt to block all search engines from listing all pages after mydomain.com/mypage/, i.e. with some kind of wildcard command?

    I hope that makes sense. Please don't suggest I use a canonical tag. It's a long story, but because of the code that's on the page this is not an option.

    Thanks

  2. #2
    SitePoint Member
    Join Date
    Jun 2012
    Posts
    1
    A sitemap is very important for promoting a website well. You can use a sitemap to help allow or disallow search engines.
    Last edited by Stevie D; Sep 5, 2012 at 04:30. Reason: Fake signature deleted

  3. #3
    Stevie D
    Join Date
    Mar 2006
    Location
    Yorkshire, UK
    Posts
    5,892
    Originally posted by yjones:
        I use SEOMoz, which is alerting me to some 'no meta data found' errors on a few pages. It's actually just one page, but it displays different data depending on what's coming out of the DB,
        e.g. mydomain.com/mypage/var-value-1, mydomain.com/mypage/var-value-2, mydomain.com/mypage/var-value-3, etc.

        Can I use robots.txt to block all search engines from listing all pages after mydomain.com/mypage/, i.e. with some kind of wildcard command?

        I hope that makes sense. Please don't suggest I use a canonical tag. It's a long story, but because of the code that's on the page this is not an option.

    You can use robots.txt to block search engine spiders from accessing parts of your site*, if that's what you want to do:
    Code:
    user-agent: *
    disallow: /mypage/
    but that asks robots not to look at anything in the "mypage" folder, which would include the index page, so that probably wouldn't work for you.
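
    To see which URLs those two lines would cover, here is a minimal Python sketch using the standard library's `urllib.robotparser`. It is only an illustration: the helper names `build_parser` and `is_blocked` are made up for this example, and real crawlers may interpret robots.txt a little differently from Python's parser, which does plain prefix matching.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the post above, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mypage/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_blocked(parser: RobotFileParser, url: str, agent: str = "*") -> bool:
    """Return True if the rules ask the given user agent not to fetch url."""
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in (
        "http://mydomain.com/mypage/var-value-1",
        "http://mydomain.com/mypage/",
        "http://mydomain.com/mypage",
        "http://mydomain.com/about",
    ):
        state = "blocked" if is_blocked(parser, url) else "allowed"
        print(f"{state:8} {url}")
```

    With this parser, everything whose path starts with /mypage/ is blocked, including /mypage/ itself, which is the catch mentioned above.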

    Another option, which might or might not work for you, would be to create a rewrite regex to redirect mydomain.com/mypage/var-value-* to mydomain.com/mypage/ - it depends whether the DB needs that extra parameter to generate the page or if it's just an artefact.
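
    As a rough illustration of the kind of pattern such a rewrite rule might use, here is a small Python sketch. The regex and the function name `redirect_target` are assumptions made for this example; the real rule would live in your web server's rewrite configuration, and the exact pattern depends on what the variable part of your URLs looks like.

```python
import re
from typing import Optional

# Hypothetical pattern: exactly one non-empty path segment after /mypage/,
# with an optional trailing slash. /mypage/ itself does not match, so the
# redirect cannot loop back onto its own target.
VARIANT_PATH = re.compile(r"^/mypage/[^/]+/?$")


def redirect_target(path: str) -> Optional[str]:
    """Return '/mypage/' if path is one of the variant URLs, else None."""
    if VARIANT_PATH.match(path):
        return "/mypage/"
    return None


if __name__ == "__main__":
    for path in ("/mypage/var-value-1", "/mypage/", "/other/var-value-1"):
        print(path, "->", redirect_target(path))
```

    This only makes sense if the page can be generated without the extra value, as noted above.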

    * OK, technically it's "suggest that they might not want to look there", but most search robots are well-behaved and do what they're asked.

  4. #4
    SitePoint Member
    Join Date
    Oct 2011
    Posts
    12
    Thanks Stevie D - 2 possible solutions there. I'll probably go with the robots.txt one first.

  5. #5
    Non-Member
    Join Date
    Sep 2012
    Posts
    26
    If you're using WordPress, the Robots-META plugin will let you do this easily: you can set both the index and follow directives on a post-by-post basis.

    You can also selectively noindex posts if you're using the Thesis theme.

