  1. #1
    SitePoint Guru
    Join Date
    Sep 2003
    Location
    Northern California
    Posts
    605
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Can a non-linked web page be indexed?

    If a page on a website is not linked to or from any other page on the site, can it be indexed? Hopefully, the answer is no.

  2. #2
    He's No Good To Me Dead silver trophybronze trophy stymiee's Avatar
    Join Date
    Feb 2003
    Location
    Slave I
    Posts
    23,423
    Mentioned
    2 Post(s)
    Tagged
    1 Thread(s)
    Yes, if the search engines know it exists. If you don't want it indexed, either use robots.txt to block it or use HTTP headers to prevent it from being indexed.
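
    As a rough illustration of the robots.txt option, here is a minimal sketch that uses Python's standard urllib.robotparser to check whether a well-behaved crawler would be allowed to fetch a URL. The robots.txt content, the /private/ path, the example.com URLs and the helper name is_crawl_allowed are all assumptions made up for the example; nothing here comes from the original poster's site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /private/ path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example URLs are placeholders, not real pages.
print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/private/test.html"))
print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/index.html"))
```

    This only shows what a compliant crawler is permitted to fetch. It says nothing about whether a URL can still be listed in an index.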

  3. #3
    SitePoint Wizard bronze trophy bigalreturns's Avatar
    Join Date
    Mar 2006
    Location
    The Wirral, England
    Posts
    1,293
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Yep - Google have some mysterious ways of finding orphan pages! I've seen it when uploading single test pages. I can only assume it's because I have the toolbar installed, but there may be other ways I haven't thought of.
    "The proper function of man is to live - not to exist."
    Get a Free TomTom


  4. #4
    Non-Member
    Join Date
    Feb 2008
    Posts
    151
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by stymiee View Post
    Yes, if the search engines know it exists. If you don't want it indexed, either use robots.txt to block it or use HTTP headers to prevent it from being indexed.

    I agree. I think this is the best thing you can do.

  5. #5
    SitePoint Enthusiast Chapichupapa's Avatar
    Join Date
    Feb 2008
    Posts
    63
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Yes, your page could be indexed as long as it's already on the web. Spiders or robots can crawl your site even if you don't want it indexed, unless you add a robots.txt to stop robots from indexing your site.

  6. #6
    SitePoint Enthusiast
    Join Date
    Dec 2004
    Location
    India
    Posts
    52
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I don't agree with the other comments. I think your page must have some inbound link, or you should tell the Google spider to crawl your page (via a Google sitemap or by submitting it through the addurl page).

  7. #7
    SitePoint Addict learnerseo's Avatar
    Join Date
    Feb 2008
    Posts
    345
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    In my opinion, if there is no way of getting to the page, Google will not find it either, provided it is not a new page.

  8. #8
    Error 404: Life not found silver trophybronze trophy
    Join Date
    Dec 2007
    Location
    UK Nr Manchester
    Posts
    3,460
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by bigalreturns View Post
    Yep - Google have some mysterious ways of finding orphan pages! I've seen it when uploading single test pages. I can only assume it's because I have the toolbar installed, but there may be other ways I haven't thought of.
    I'd like to know what theories you've come up with? How does Google find a page that it doesn't know about and isn't linked to from anywhere?
    It's 530 people, but do you really get it?
    ImgWebDesign - Web design in Buxton, High Peak, Derbyshire UK.

  9. #9
    He's No Good To Me Dead silver trophybronze trophy stymiee's Avatar
    Join Date
    Feb 2003
    Location
    Slave I
    Posts
    23,423
    Mentioned
    2 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by JJMcClure View Post
    I'd like to know what theories you've come up with? How does Google find a page that it doesn't know about and isn't linked to from anywhere?
    In his post he mentioned that he suspected the Google toolbar may be the culprit.

  10. #10
    Error 404: Life not found silver trophybronze trophy
    Join Date
    Dec 2007
    Location
    UK Nr Manchester
    Posts
    3,460
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by stymiee View Post
    In his post he mentioned that he suspected the Google toolbar may be the culprit.
    Yes, I saw that, Stymiee. I'm not some dope from a non-English speaking country who doesn't read posts properly. If I'm asking for theories, you can assume that I mean above and beyond what was stated in the post.

    It would be nice if you gave me the benefit of the doubt every now and then.


  11. #11
    He's No Good To Me Dead silver trophybronze trophy stymiee's Avatar
    Join Date
    Feb 2003
    Location
    Slave I
    Posts
    23,423
    Mentioned
    2 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by JJMcClure View Post
    Yes, I saw that, Stymiee. I'm not some dope from a non-English speaking country who doesn't read posts properly. If I'm asking for theories, you can assume that I mean above and beyond what was stated in the post.

    It would be nice if you gave me the benefit of the doubt every now and then.

    I just assumed you overlooked it. It happens to everyone from time to time.

  12. #12
    Error 404: Life not found silver trophybronze trophy
    Join Date
    Dec 2007
    Location
    UK Nr Manchester
    Posts
    3,460
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Off Topic:

    Quote Originally Posted by stymiee View Post
    I just assumed you overlooked it. It happens to everyone from time to time.
    It's smack in the middle of a two-line post. I'd have to be pretty dumb and unconcerned about other people's views to have overlooked it, and it's that assumption that bothers me.

    Still, I appreciate the sentiment, so thanks.

  13. #13
    SitePoint Member
    Join Date
    Mar 2008
    Posts
    8
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If you don't want it indexed, either use robots.txt to block it or use HTTP headers to prevent it from being indexed.
    Unfortunately, some search engines (Google included) don't respect robots.txt. Many pages I blocked with it are still in the index, so don't trust robots.txt.


    If a page on a website is not linked to or from any other page on the site, can it be indexed?
    No! Only the index page can be found, unless an external or internal link points to the page or you submit a sitemap to Google. The answer is clear: the spider follows links, crawls everything and goes on to the next link... so no link, no index...

  14. #14
    SitePoint Wizard bronze trophy hooperman's Avatar
    Join Date
    Jan 2006
    Location
    Manchester, UK
    Posts
    4,301
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by mds View Post
    The answer is clear: the spider follows links, crawls everything and goes on to the next link... so no link, no index...
    Then how do you explain post 2?

  15. #15
    SitePoint Member
    Join Date
    Mar 2008
    Posts
    8
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Good question.

    Theory 1: he used index.html (or index.php) to test his projects.
    Theory 2: he reused a URL that had already been indexed in the past and overwrote it with a new page.
    Theory 3: Google has night server goggles and can see behind firewalls.
    Theory 4: Google Analytics code is embedded in the page.
    Theory 5: he sent the link to the client via Gmail, and Gmail is spyware...
    I don't have any more theories right now, but it would be interesting to hear what other people think.


    I test all client and personal pages on different servers and (high-PR) domains. Googlebot visits those sites all the time, but it has never grabbed any of my test pages. That's just my experience.

    I think the idea that the Google Toolbar sends spiders to the pages you visit is a myth. Imagine you are logged in to a password-protected page: if Google sent its spiders there and kept all the data on its servers, it would have millions of pages in the SERPs that are password protected and therefore useless to the user. Maybe somebody has another theory. I believe in theory 3...

  16. #16
    Error 404: Life not found silver trophybronze trophy
    Join Date
    Dec 2007
    Location
    UK Nr Manchester
    Posts
    3,460
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by mds View Post
    Good question.

    Theory 1: he used index.html (or index.php) to test his projects.
    Theory 2: he reused a URL that had already been indexed in the past and overwrote it with a new page.
    Theory 3: Google has night server goggles and can see behind firewalls.
    Theory 4: Google Analytics code is embedded in the page.
    Theory 5: he sent the link to the client via Gmail, and Gmail is spyware...
    I don't have any more theories right now, but it would be interesting to hear what other people think.


    I test all client and personal pages on different servers and (high-PR) domains. Googlebot visits those sites all the time, but it has never grabbed any of my test pages. That's just my experience.

    I think the idea that the Google Toolbar sends spiders to the pages you visit is a myth. Imagine you are logged in to a password-protected page: if Google sent its spiders there and kept all the data on its servers, it would have millions of pages in the SERPs that are password protected and therefore useless to the user. Maybe somebody has another theory. I believe in theory 3...
    Good stuff. That's the kind of thing I was after.

  17. #17
    SitePoint Wizard bronze trophy hooperman's Avatar
    Join Date
    Jan 2006
    Location
    Manchester, UK
    Posts
    4,301
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Interesting theories! I think some are possible (not sure about plausible). We'll have to wait and see whether bigal did any of the things in those points.

  18. #18
    Non-Member vickyseo's Avatar
    Join Date
    Feb 2008
    Location
    Noida
    Posts
    150
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Yes, it can be indexed, because spiders can crawl your site even if you don't want it indexed.

  19. #19
    O Rly?? JakeJeck's Avatar
    Join Date
    Nov 2000
    Location
    Milwaukee
    Posts
    571
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by mds View Post
    Good question.

    Theory 1: he used index.html (or index.php) to test his projects.
    Theory 2: he reused a URL that had already been indexed in the past and overwrote it with a new page.
    Theory 3: Google has night server goggles and can see behind firewalls.
    Theory 4: Google Analytics code is embedded in the page.
    Theory 5: he sent the link to the client via Gmail, and Gmail is spyware...
    I don't have any more theories right now, but it would be interesting to hear what other people think.


    I test all client and personal pages on different servers and (high-PR) domains. Googlebot visits those sites all the time, but it has never grabbed any of my test pages. That's just my experience.

    I think the idea that the Google Toolbar sends spiders to the pages you visit is a myth. Imagine you are logged in to a password-protected page: if Google sent its spiders there and kept all the data on its servers, it would have millions of pages in the SERPs that are password protected and therefore useless to the user. Maybe somebody has another theory. I believe in theory 3...
    If you have a page that is PW protected, google would get sent to a login page and never see the content.

  20. #20
    SitePoint Zealot phppoddotcom77's Avatar
    Join Date
    Feb 2008
    Posts
    155
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The web page cannot be indexed.

  21. #21
    He's No Good To Me Dead silver trophybronze trophy stymiee's Avatar
    Join Date
    Feb 2003
    Location
    Slave I
    Posts
    23,423
    Mentioned
    2 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by mds View Post
    Unfortunately, some search engines (Google included) don't respect robots.txt. Many pages I blocked with it are still in the index, so don't trust robots.txt.
    All of the major search engines respect robots.txt. If you claim they don't you need to offer us some solid proof of that.

    Quote Originally Posted by mds View Post
    No! Only the index page can be found, unless an external or internal link points to the page or you submit a sitemap to Google. The answer is clear: the spider follows links, crawls everything and goes on to the next link... so no link, no index...
    That's not true. A link isn't the only way they can find a page. It's just the most obvious.

  22. #22
    SitePoint Enthusiast
    Join Date
    Mar 2008
    Posts
    53
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Unfortunately, some search engines (Google included) don't respect robots.txt. Many pages I blocked with it are still in the index, so don't trust robots.txt.
    Robots.txt doesn't tell Google to keep pages out of the index; it only tells Googlebot not to crawl them. Disallowed URLs still accumulate PageRank, and if enough sites link to a disallowed URL, it will show up in search results. Even if you block a URL with META ROBOTS=noindex, the URL will still accumulate PageRank, even though it will not show up in search. However, if you disallow a META ROBOTS=noindex page in robots.txt, it can still show up in search results, because Google can't read the META tag on a URL it is not allowed to crawl.
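
    To make that interaction easier to follow, here is a deliberately simplified sketch of the reasoning in this post. It is a toy model of the behaviour described above, not Google's actual algorithm, and the function name and its three boolean parameters are assumptions invented for the illustration.

```python
def may_appear_in_results(known_to_engine: bool,
                          disallowed_in_robots_txt: bool,
                          has_noindex: bool) -> bool:
    """Toy model: can a URL be listed in search results?

    known_to_engine: the engine has learned the URL somehow (for example
        through inbound links). This is an assumption of the model.
    disallowed_in_robots_txt: robots.txt forbids crawling the URL.
    has_noindex: the page carries a robots noindex directive.
    """
    if not known_to_engine:
        # A URL the engine has never heard of cannot be listed.
        return False
    if disallowed_in_robots_txt:
        # The crawler never fetches the page, so it never sees noindex.
        # The bare URL can still be listed.
        return True
    # The page is crawlable, so a noindex directive is seen and honoured.
    return not has_noindex


# Crawlable page with noindex: kept out of results.
print(may_appear_in_results(True, False, True))
# Same page, but also disallowed in robots.txt: noindex is never read.
print(may_appear_in_results(True, True, True))
```

    The practical upshot, under this model, is that a page meant to stay out of the results should remain crawlable and carry noindex, either as a meta tag or as an X-Robots-Tag: noindex HTTP response header.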

  23. #23
    SitePoint Wizard bronze trophy bigalreturns's Avatar
    Join Date
    Mar 2006
    Location
    The Wirral, England
    Posts
    1,293
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by mds View Post
    Theory 1: he used index.html (or index.php) to test his projects.
    Theory 2: he reused a URL that had already been indexed in the past and overwrote it with a new page.
    Theory 3: Google has night server goggles and can see behind firewalls.
    Theory 4: Google Analytics code is embedded in the page.
    Theory 5: he sent the link to the client via Gmail, and Gmail is spyware...
    I don't have any more theories right now, but it would be interesting to hear what other people think.
    Quote Originally Posted by hooperman View Post
    Interesting theories! I think some are possible (not sure about plausible). We'll have to wait and see whether bigal did any of the things in those points.
    1) Yes, but it was on a freshly bought domain - it is possible it had been owned before and indexed then though.
    2) Nope
    3) Don't understand what you mean
    4) Nope
    5) Nope

    Another possible theory, which applies in one case, but not some others, is referrer data. One of my sites displays some Google search results in a framed page, so they probably crawl sites that send them visitors.
    I've also seen it happen with fresh domains - possibly Google use their registrar status so they can check out all newly registered domains.
    "The proper function of man is to live - not to exist."
    Get a Free TomTom


  24. #24
    O Rly?? JakeJeck's Avatar
    Join Date
    Nov 2000
    Location
    Milwaukee
    Posts
    571
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I tend to believe it's through registrar data. At work we have a private domain we use for testing and google started indexing the site after a week. Only 2 people know the URL but we both have the toolbar with pagerank active.

    However the reason I don't believe the toolbar is the culprit is because otherwise we'd see many "in progress" pages on domains show up in the search results. These are pages/sections that are being worked on that exist on an actively crawled domain but currently have no inbound links.

  25. #25
    SitePoint Enthusiast
    Join Date
    May 2007
    Posts
    61
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Yes, if you have installed any toolbar (Google, Yahoo, Alexa, etc.).

