Results 1 to 8 of 8
  1. #1
    Sploghm bronze trophy Victorinox's Avatar
    Join Date
    Nov 2008
    Posts
    749
    Mentioned
    4 Post(s)
    Tagged
    0 Thread(s)

    Only PDFs listed in search

    I've recently been asked to take over and redevelop a site. Curiously, the only pages Google has indexed of the old site are PDF versions of the content (apparently generated by a Joomla widget). Bing lists nothing at all.

    There's nothing in the robots.txt or any meta tags that would have caused this, and no analytics code in place. As the site is being started from scratch I have no access to anything in the former CMS or .htaccess.

    Some sample URLs, stripped of domain.

    Page: index.php?option=com_content&task=view&id=17&Itemid=31
    PDF: index2.php?option=com_content&do_pdf=1&id=17

    Any ideas what may have caused this, and whether it might affect indexing in future? The new site will have a different structure and friendlier URLs.
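
    For comparison, the two sample URLs can be pulled apart to show what they share. This is only a sketch using Python's standard `urllib.parse`; the helper name `describe` is made up for illustration, and it assumes nothing about Joomla beyond the query strings quoted above.

```python
from urllib.parse import urlsplit, parse_qs

def describe(url):
    """Split a Joomla-style URL into script name, content id and a PDF flag."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    ids = params.get("id")
    return {
        "script": parts.path,               # index.php or index2.php
        "id": ids[0] if ids else None,      # content item id, as a string
        "is_pdf": "do_pdf" in params,       # True only for the PDF variant
    }

page = describe("index.php?option=com_content&task=view&id=17&Itemid=31")
pdf = describe("index2.php?option=com_content&do_pdf=1&id=17")

# Both URLs point at the same content item; only the script and flags differ.
same_item = page["id"] == pdf["id"]
```

    Both URLs carry the same `id`, so the PDF is simply another rendering of the same content item, served through `index2.php` with `do_pdf=1`.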

  2. #2
    Mouse catcher silver trophy Stevie D's Avatar
    Join Date
    Mar 2006
    Location
    Yorkshire, UK
    Posts
    5,892
    Mentioned
    123 Post(s)
    Tagged
    1 Thread(s)
    I'm curious ... are the old pages still there and live ... and Google isn't finding any of them at all? I could understand if it was prioritising the PDFs over Joomla-based pages (heck, I would prioritise what I scrape off the bottom of my shoe over most Joomla-based pages, from the cruft:content ratio alone). But to index only the PDFs, which I assume are only/mostly linked from the HTML pages, and have no trace of the HTML pages themselves is distinctly odd. It sounds like there should be something in the robots.txt or <meta> tags that is blocking indexing, but you say that's not the case? What about canonical tags?
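
    One way to double-check the <meta> and canonical tags is to run a saved copy of a page through a small parser. The following is a rough sketch using only Python's standard `html.parser`; the names `IndexingHints` and `indexing_hints` are hypothetical, and it looks only at the HTML, not at robots.txt or HTTP response headers.

```python
from html.parser import HTMLParser

class IndexingHints(HTMLParser):
    """Collect robots meta directives and canonical links from a page."""

    def __init__(self):
        super().__init__()
        self.robots = []      # content of <meta name="robots"> / "googlebot"
        self.canonical = []   # href of every <link rel="canonical">

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta" and a.get("name", "").lower() in ("robots", "googlebot"):
            self.robots.append(a.get("content", "").lower())
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical.append(a.get("href", ""))

def indexing_hints(html):
    """Return the robots directives, canonical URLs and a noindex flag."""
    parser = IndexingHints()
    parser.feed(html)
    tokens = [t.strip() for r in parser.robots for t in r.split(",")]
    return {
        "robots": parser.robots,
        "canonical": parser.canonical,
        # "none" is shorthand for "noindex, nofollow"
        "noindex": "noindex" in tokens or "none" in tokens,
    }
```

    Feeding it the source of one of the HTML pages and getting `noindex` as `False` with an empty `canonical` list would back up the "nothing in the meta tags" reading, at least as far as the markup goes.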

  3. #3
    Sploghm bronze trophy Victorinox's Avatar
    Join Date
    Nov 2008
    Posts
    749
    Mentioned
    4 Post(s)
    Tagged
    0 Thread(s)
    Yes, the pages were live at time of searching (though not for much longer as I've just switched nameservers).

    No canonicals, but just noticed the pages all have a "verify-v1" meta. Might this, in the absence of any Google Analytics scripts, have something to do with it?

  4. #4
    Life is not a malfunction gold trophy silver trophy bronze trophy
    TechnoBear's Avatar
    Join Date
    Jun 2011
    Location
    Argyll, Scotland
    Posts
    6,408
    Mentioned
    273 Post(s)
    Tagged
    5 Thread(s)
    Just out of curiosity, what does a site:domain.com search produce? (Quickly now, before the nameserver change takes effect.)

  5. #5
    Sploghm bronze trophy Victorinox's Avatar
    Join Date
    Nov 2008
    Posts
    749
    Mentioned
    4 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by TechnoBear View Post
    Just out of curiosity, what does a site:domain.com search produce? (Quickly now, before the nameserver change takes effect.)
    My findings were based on a site:domain search.

    Phrase searches also return only the PDFs, and a phrase search for the Home page's long, unique title element, which isn't present in the body, returns nothing.

  6. #6
    SitePoint Evangelist
    Join Date
    Jun 2011
    Location
    London UK
    Posts
    495
    Mentioned
    5 Post(s)
    Tagged
    0 Thread(s)
    It could be sandboxed due to originality issues or something along those lines.
    Have you looked in Google Webmaster Tools/Analytics?
    Try a fetch and see if any problems come up.

  7. #7
    Sploghm bronze trophy Victorinox's Avatar
    Join Date
    Nov 2008
    Posts
    749
    Mentioned
    4 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by benbob View Post
    It could be sandboxed due to originality issues or something along those lines.
    Have you looked in Google Webmaster Tools/Analytics?
    Try a fetch and see if any problems come up.
    Despite the verify tag, the owner isn't sure whether Google Analytics stats were ever gathered and isn't keen on querying the former developer. He thinks some SEO was done, and the old site did have descriptions and keywords. The latter were a bit clumsy perhaps (50 words including phrase permutations, misspellings, terms not in the content, etc.) but not obviously overstuffed.

    Due to the former site having been based around an abandoned business model, the site currently has only an "under development" page.

    Are there any pros or cons in registering the domain with Webmaster Tools/GA before the new site is completed?

  8. #8
    SitePoint Evangelist
    Join Date
    Jun 2011
    Location
    London UK
    Posts
    495
    Mentioned
    5 Post(s)
    Tagged
    0 Thread(s)
    I'm not aware of any downsides to using WT/GA at any time.

