  1. #1
    SitePoint Addict revlimiter's Avatar
    Join Date
    Sep 2005
    Location
    British Columbia, Canada
    Posts
    275
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Search Engine Indexing

    Hi everyone,
    Our website has 104,000 pages indexed in Google, but a good chunk of them are empty breadcrumb URLs with no content. These pages return no 404 error or redirect, just our HTML template with no visible content. Also, when we remove an item from our website, its page stays in Google's index and leads to an empty page. I believe this is the root of the problem. Should we be telling the search engines to de-index pages when we remove items from our website, and if so, how can this be done sooner rather than later?

    We are using a CMS custom-built from scratch. I am not a programmer, but I can pass the information on to our programmer, who can perhaps put some more work into the breadcrumbs or the robots.txt file if that's what is required to resolve this.

    With these empty pages removed from Google's index, we are hoping that our website's overall SEO will improve too.
    Thanks,
    "To make an apple pie from scratch,
    you must first create the universe."
    -Carl Sagan

  2. #2
    SitePoint Member
    Join Date
    Oct 2008
    Location
    Essex, United Kingdom
    Posts
    11
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I know that you can remove pages from Google using Google Webmaster Tools. You could also use a robots.txt file to stop search engines from crawling certain parts or pages of your site.

  3. #3
    SitePoint Addict revlimiter's Avatar
    Join Date
    Sep 2005
    Location
    British Columbia, Canada
    Posts
    275
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by HostingPlace View Post
    I know that you can remove pages from Google using Google Webmaster Tools. You could also use a robots.txt file to stop search engines from crawling certain parts or pages of your site.
    Is there a way to automate these Google submissions? I notice that some forums get indexed within minutes of a new thread being posted, and when a thread is removed it seems to disappear from Google pretty quickly too. Is that just because the forum is a high-traffic website and Google gives real-time indexing more priority there?

    I will use the manual Google Webmaster Tools method for now, but I'm hoping to automate it in the future for instances like these.
    "To make an apple pie from scratch,
    you must first create the universe."
    -Carl Sagan

  4. #4
    SitePoint Member
    Join Date
    Dec 2012
    Posts
    1
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by HostingPlace View Post
    I know that you can remove pages from Google using Google Webmaster Tools. You could also use a robots.txt file to stop search engines from crawling certain parts or pages of your site.
    What are those tools? I am not familiar with them. Could you please explain further?

  5. #5
    Mouse catcher silver trophy Stevie D's Avatar
    Join Date
    Mar 2006
    Location
    Yorkshire, UK
    Posts
    5,892
    Mentioned
    123 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by peter50 View Post
    What are those tools? I am not familiar with them. Could you please explain further?
    robots.txt
    Google Webmaster Tools
    The answers are pretty easy to find if you try!

  6. #6
    Mouse catcher silver trophy Stevie D's Avatar
    Join Date
    Mar 2006
    Location
    Yorkshire, UK
    Posts
    5,892
    Mentioned
    123 Post(s)
    Tagged
    1 Thread(s)
    The best plan is to reprogram the CMS so that requests for non-existent pages are met with a 404 error, and so that pages that have been removed either redirect to a suitable alternative or are marked as 'gone' (a 410 status). As long as they return 200 OK, Google will continue to index them.
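    Since the pages are generated through PHP, a front controller could set the status before rendering anything. A minimal sketch, assuming hypothetical lookup helpers item_exists() and item_was_removed() standing in for whatever queries the CMS already runs:

    ```php
    <?php
    // Sketch of a front controller: decide the HTTP status before output.
    // item_exists() and item_was_removed() are hypothetical stand-ins
    // for the CMS's own lookups.

    $slug = isset($_GET['item']) ? $_GET['item'] : '';

    if (item_was_removed($slug)) {
        // 410 Gone tells crawlers the page was removed deliberately
        // and is not coming back.
        http_response_code(410);
        include 'templates/gone.php';
        exit;
    }

    if (!item_exists($slug)) {
        // 404 for URLs that never matched anything,
        // such as the empty breadcrumb pages.
        http_response_code(404);
        include 'templates/not-found.php';
        exit;
    }

    // Normal 200 response for real content.
    include 'templates/item.php';
    ```

    The key point is that the status code is sent instead of the default 200, so crawlers stop treating the empty template as a live page.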

  7. #7
    SitePoint Addict revlimiter's Avatar
    Join Date
    Sep 2005
    Location
    British Columbia, Canada
    Posts
    275
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Stevie D View Post
    The best plan is to reprogram the CMS so that requests for non-existent pages are met with a 404 error, and so that pages that have been removed either redirect to a suitable alternative or are marked as 'gone' (a 410 status). As long as they return 200 OK, Google will continue to index them.
    Thanks -
    What is the most effective way to mark pages as 'gone'? Just a simple 404 page, a robots.txt file, something in the <head> tag, or another way? Actually, come to think of it, each page is generated through PHP, so there is no file directory that could hold multiple robots.txt files.
    "To make an apple pie from scratch,
    you must first create the universe."
    -Carl Sagan

  8. #8
    Community Advisor ULTiMATE's Avatar
    Join Date
    Aug 2003
    Location
    Bristol, United Kingdom
    Posts
    2,160
    Mentioned
    46 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by revlimiter View Post
    Thanks -
    What is the most effective way to mark pages as 'gone'? Just a simple 404 page, a robots.txt file, something in the <head> tag, or another way? Actually, come to think of it, each page is generated through PHP, so there is no file directory that could hold multiple robots.txt files.
    From a search engine perspective, they all recognise 404 errors, so you should make sure that your server is sending a 404 status code in the HTTP response headers.

    However, from a user's perspective you'll want something a bit more long-term, whether that's 301/302 redirects or a dynamic 404 page that suggests similar pages.
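    For the redirect case, a sketch of what that could look like in PHP (find_replacement_url() is a hypothetical helper; a real CMS might map a removed product's URL to its category page or a successor product):

    ```php
    <?php
    // Sketch: permanently redirect a removed page to its closest
    // replacement, or fall back to a helpful 404 page.
    // find_replacement_url() is a hypothetical lookup helper.

    $replacement = find_replacement_url(isset($_GET['item']) ? $_GET['item'] : '');

    if ($replacement !== null) {
        // 301 = moved permanently; search engines transfer the old
        // URL's standing to the new one.
        header('Location: ' . $replacement, true, 301);
        exit;
    }

    // No sensible replacement: serve a dynamic 404 page that can
    // list similar items for the visitor.
    http_response_code(404);
    include 'templates/not-found.php';
    ```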

  9. #9
    SitePoint Evangelist
    Join Date
    Jun 2011
    Location
    London UK
    Posts
    495
    Mentioned
    5 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by revlimiter View Post
    Hi everyone,
    Our website has 104,000 indexed pages on Google however a good chunk of these indexed pages are empty breadcrumb URL's with no content on the pages. There is no 404 error or redirect on these pages, just our HTML template without any visible content.....
    Google won't like that one little bit, and I wouldn't be surprised if sooner or later it leads to a really low SERP ranking.
    I would take a belt-and-braces approach: a block in your robots.txt file for all pages with little or no content, as well as redirects. Doing it on the double wouldn't be a bad idea either.
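    For the robots.txt side, the block might look something like this (the /breadcrumbs/ path is only an example; the real pattern depends on how the CMS builds those URLs). Bear in mind that robots.txt stops crawling rather than removing pages already in the index, so it works best combined with proper 404/410 responses:

    ```
    User-agent: *
    Disallow: /breadcrumbs/
    ```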

  10. #10
    SitePoint Member
    Join Date
    Dec 2012
    Posts
    2
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Set all of your CMS-generated pages to nofollow, then count the indexed pages; that would give you the original number of indexed pages for your site.

  11. #11
    SitePoint Member ARGUE thAt's Avatar
    Join Date
    Dec 2012
    Location
    USA
    Posts
    1
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Nowadays Google is moving towards manual link review, so if there are empty pages, it may also treat them as keyword stuffing. First remove the links to those pages, then remove the pages themselves, because bad redirects from them would also be harmful for you. I must say you have a very good website, since as you said lots of pages are indexed, so you should do this ASAP.

