  1. #1
    SitePoint Member

    Worried my site's not using redirects properly.

    I've been doing a lot of research on 301 (permanent) redirects and on the rel=canonical tag, and I'm a little concerned that my site isn't doing a good job and is therefore susceptible to duplicate-content problems.
    Check out this 301-redirect checker: http://www.ragepank.com/redirect-check/ and type in my site's URL: jobstr.com

    All 22 of those URLs are returning a 200 (OK) status, when (unless I'm mistaken) only ONE of them should, right? Is this a problem?
    What's also weird is that nearly all of those 22 URLs just lead to "Page Not Found" pages on Jobstr; see, for example: http://jobstr.com/index.htm

    If they're going to Page Not Found pages, why doesn't the redirect checker report a 404 code for them?
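    To make the expectation in this question concrete, here is a minimal Python sketch of how one might audit the output of a redirect check. It assumes you have already collected each tested URL's status code and `Location` header; the `audit_variants` function and the example URLs are hypothetical. The rule it encodes is the one described above: exactly one canonical URL answers 200, and every other variant either permanently redirects to it or returns a 404.

```python
def audit_variants(results, canonical):
    """Check redirect-test results against a single canonical URL.

    results maps each tested URL to (status_code, location_header_or_None).
    Hypothetical helper: the rules below are a simple sketch, not a standard.
    Returns a list of human-readable problems (empty if everything looks fine).
    """
    problems = []
    for url, (status, location) in results.items():
        if url == canonical:
            # The one URL that should actually serve the page.
            if status != 200:
                problems.append(f"{url}: canonical URL returned {status}, expected 200")
        elif status == 200:
            # A second URL serving content with 200 means duplicate content.
            problems.append(f"{url}: returned 200; expected a 301 to {canonical} or a 404")
        elif status in (301, 308) and location != canonical:
            # Permanent redirect, but not to the canonical URL.
            problems.append(f"{url}: permanent redirect points to {location}, not {canonical}")
        elif status in (302, 307):
            # Temporary redirects are usually not what you want for canonicalisation.
            problems.append(f"{url}: temporary redirect ({status}); a 301 is usually preferred")
        # Anything else (e.g. a 404 for a variant that never existed) is accepted.
    return problems
```

    With results like the ones described in the question (every variant answering 200), every non-canonical URL would be flagged.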

  2. #2
    TimIgoe (Twitter: @TimIgoe)
    It looks like the CMS you are using is returning a 200 and a 'page' for a 404, rather than returning the actual 404 HTTP status code.

    Try http://jobstr.com/timtest.html, for example, and use a tool like Firebug or Chrome's developer tools to see the traffic.
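    The same check done in the browser's developer tools can be scripted. This is a minimal sketch using only Python's standard library; `status_of` is a hypothetical helper name. It deliberately does not follow redirects, so a 301 is reported as a 301 rather than as the status of the page it points to. Against a CMS that serves its "Page Not Found" page with a 200, a made-up path such as `/timtest.html` would come back as 200 instead of 404.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to follow the redirect; the 3xx
        # response then surfaces as an HTTPError carrying the original code.
        return None


def status_of(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for url (redirects not followed).

    Hypothetical helper for illustration only.
    """
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Unfollowed 3xx, plus all 4xx and 5xx responses, arrive here.
        err.close()
        return err.code
```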

  3. #3
    dklynn (Certified Ethical Hacker)
    d66,

    The 404 response code is the server's response to a "file not found" error. However, if the CMS is redirecting to its own 404 script, then the server is finding that script (the CMS's index.php - surprise!) and serving it with the obligatory 200 response code. In other words, the CMS is handling the "file not found" error internally and the server is operating correctly.
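    The front-controller arrangement described here can be illustrated with a minimal Python WSGI sketch. The CMS in question presumably uses PHP (its index.php); WSGI is swapped in only because it runs standalone. Every request reaches the same script, so the web server has already "found a file" and will not send a 404 on its own; whether the not-found page goes out with a 404 is up to the script. `front_controller`, `PAGES` and the routes are hypothetical. In PHP, the equivalent step is calling `http_response_code(404)` before printing the not-found page.

```python
# Hypothetical route table standing in for the CMS's page database.
PAGES = {
    "/": "<h1>Home</h1>",
    "/jobs": "<h1>Jobs</h1>",
}

NOT_FOUND_PAGE = "<h1>Page Not Found</h1>"


def front_controller(environ, start_response):
    """Single entry point for every request, like a CMS's index.php (sketch only)."""
    path = environ.get("PATH_INFO", "/")
    if path in PAGES:
        status, body = "200 OK", PAGES[path]
    else:
        # The web server found this script, so nothing upstream sends a 404.
        # The script must set the status itself; otherwise the "Page Not Found"
        # body would go out with a 200 (a "soft 404").
        status, body = "404 Not Found", NOT_FOUND_PAGE
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(data)))])
    return [data]
```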

    Regards,

    DK
    David K. Lynn - Data Koncepts is a long-time WebHostingBuzz (US/UK) Client and (unpaid) WHB Ambassador
    mod_rewrite Tutorial Article (setup, config, test & write mod_rewrite regex w/sample code) and Code Generator

  4. #4
    ScallioXTX (Utopia, Inc.)
    Quote (originally posted by dklynn):
    In other words, the CMS is handling the "file not found" error internally and the server is operating correctly.
    I disagree. If there is no content of interest to humans or search engines at a given URI (i.e., a custom 404 page), the server must also return an HTTP 404 status to reflect this. The 404 status response is the main way to tell search engines that the content that may previously have been found at a URI is not, or is no longer, there. Without 404s, a search engine would just keep every page it ever found in its index, regardless of whether the content is still there (okay, there is a concept called a "soft 404", which does use HTTP status 200, but I don't want to get into that).
    The fact that there is a file to serve that content, and that the file was found, is completely irrelevant. So no, I don't think the server is working correctly; it must send a 404 Not Found status for those pages.

    Also, as stated in the RFC, 404 is simply "Not Found", not "File Not Found", which also suggests it's not about finding files but about finding resources.
    Rémon - Hosting Advisor

  5. #5
    Stevie D (Mouse catcher)
    Quote (originally posted by dklynn):
    The 404 response code is the server's response to a "file not found" error. However, if the CMS is redirecting to its own 404 script, then the server is finding that script (the CMS's index.php - surprise!) and serving it with the obligatory 200 response code. In other words, the CMS is handling the "file not found" error internally and the server is operating correctly.
    The server may be operating correctly according to the precise technical specification, but the system as a whole is not. If the CMS handles 404s internally, such that requesting a page that doesn't exist returns a 200 A-OK, the CMS is not operating correctly. Any page on a live domain that does not exist should return a 404 error. A key part of a search robot's role is identifying which pages are live, and if it gets a 200 A-OK for absolutely any old rubbish on a domain, it's harder to figure out which pages are still there and which are either dead or never existed in the first place. That's why robots sometimes request completely fictitious (and very unlikely) pages, to check whether they can trust your site to return a 404 error correctly.
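    The kind of probe described here can be sketched as follows. Search engines do not publish exactly how they do this, so this is only an illustration of the idea: request a random path that is vanishingly unlikely to exist and see whether the site answers 200. `probe_soft_404` is a hypothetical function, and `fetch_status` is an assumed callable that returns the status code for a URL (for instance, a small wrapper around `urllib.request`); passing it in keeps the sketch testable without a network.

```python
import secrets
from typing import Callable


def probe_soft_404(base_url: str, fetch_status: Callable[[str], int]) -> bool:
    """Return True if the site answers 200 for a path that should not exist.

    Hypothetical sketch: real crawlers' probing logic is not public.
    """
    # 32 random hex characters: practically guaranteed not to be a real page.
    junk_path = "/" + secrets.token_hex(16) + ".html"
    status = fetch_status(base_url.rstrip("/") + junk_path)
    # A correctly configured site should reply 404 (or 410) here, not 200.
    return status == 200
```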

  6. #6
    SitePoint Member
    Quote (originally posted by Stevie D):
    The server may be operating correctly according to the precise technical specification, but the system as a whole is not. If the CMS handles 404s internally, such that requesting a page that doesn't exist returns a 200 A-OK, the CMS is not operating correctly. Any page on a live domain that does not exist should return a 404 error. A key part of a search robot's role is identifying which pages are live, and if it gets a 200 A-OK for absolutely any old rubbish on a domain, it's harder to figure out which pages are still there and which are either dead or never existed in the first place. That's why robots sometimes request completely fictitious (and very unlikely) pages, to check whether they can trust your site to return a 404 error correctly.
    Thanks for the replies. The last part above (about robots requesting fictitious pages) is particularly concerning... I wasn't sure whether the problem I originally posted about was (a) just a weird server quirk or (b) materially detrimental to my site's search engine (SE) profile. I'd gotten vague suggestions that it could be hurting me in that respect, but nothing quantifiable... this suggestion that SEs will ping you with fictitious pages to check the integrity of your 404 handling... how big a deal is that?

  7. #7
    Stevie D (Mouse catcher)
    Quote (originally posted by domino66):
    ...this suggestion that SE's will ping you with fictitious pages to check the integrity of your 404 code...how big a deal is that?
    It isn't just a suggestion; it definitely does happen.

    The reason they do that is so that they know whether they can trust 200 A-OK responses to be genuine pages. If they can, that's all well and good: they know that any page giving a 200 A-OK is really there, and that any dead links or expired pages will come up as a 404. On the other hand, if your site returns a 200 A-OK for any URL, the search engine needs to put a bit more effort into making sure that those URLs are returning genuine pages and not pseudo-error pages.

    I don't know what effect that will have on your ranking, but it is unlikely to be good. The two possibilities are (i) that your pseudo-error page will make it into the search results (which of course wouldn't happen with a genuine 404), which I have definitely seen happen, and (ii) that the extra checking needed to make sure each page is genuine eats into the crawl time the search engine spends on your site, leaving less time for actually crawling and indexing your content.
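    One plausible way to spend that "extra effort" is to compare a page's content with what the site returned for a URL known not to exist; a near-identical body suggests a pseudo-error page. This is a hypothetical heuristic, not a documented search-engine algorithm. It uses `difflib.SequenceMatcher` from Python's standard library, and the 0.9 similarity threshold is an arbitrary assumption.

```python
import difflib


def looks_like_error_page(body: str, junk_body: str, threshold: float = 0.9) -> bool:
    """Flag body as a probable pseudo-error page.

    junk_body is what the site served for a deliberately fictitious URL.
    Hypothetical heuristic; the default threshold is an arbitrary assumption.
    """
    # ratio() is 1.0 for identical strings and approaches 0.0 as they diverge.
    ratio = difflib.SequenceMatcher(None, body, junk_body).ratio()
    return ratio >= threshold
```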
