  1. #1
    Web Enthusiast
    Join Date
    Jul 2000
    Location
    Western Massachusetts, USA
    Posts
    1,383
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Web crawl errors

    I am getting a huge number of crawl errors from Google Webmaster Tools, for example,

    Furthermore, my favorite link checker, Link Sleuth, only crawls the first page.

    The site seems to have generated errors from the beginning.

    Here is my .htaccess file:

    Code:
    #suPHP_ConfigPath /home/berkshi1
    DirectoryIndex index.php
    
    #to redirect urls from old to new urls
    Redirect 301 /pages/services.htm http://berkshiredentist.com/patient-services/
    Redirect 301 /pages/home.htm http://berkshiredentist.com/
    Redirect 301 /pages/library.htm http://berkshiredentist.com/patient-library/
    Redirect 301 /pages/contact.htm http://berkshiredentist.com/contact-us/
    Redirect 301 /pages/about.htm http://berkshiredentist.com/about-us/
    Redirect 301 /pages/links.htm http://berkshiredentist.com/61-2/
    Redirect 301 /pages/gallery.htm http://berkshiredentist.com/photo-gallery/
    Redirect 301 /pages/forms.htm http://berkshiredentist.com/new-patient-forms/
    Redirect 301 /pages/patient_library/veneers.htm http://berkshiredentist.com/porcelain-veneers/
    Redirect 301 /pages/patient_library/gumdisease.htm http://berkshiredentist.com/gum-disease-serious-but-treatable/
    Redirect 301 /pages/patient_library/diagnodent.htm http://berkshiredentist.com/diagnodent-laser-reflection-finds-imperfection-aids-in-correction/
    Redirect 301 /pages/patient_library/airabrasion.htm http://berkshiredentist.com/patient-library/
    Redirect 301 /pages/patient_library/implants.htm http://berkshiredentist.com/implants-permanent-secure-tooth-replacements/
    Redirect 301 /pages/patient_library/digitalxray.htm http://berkshiredentist.com/digital-x-rays/
    Redirect 301 /pages/special.htm http://berkshiredentist.com/
    Redirect 301 /pages/services.htm http://berkshiredentist.com/
    Redirect 301 /pages/faqs.htm http://berkshiredentist.com/
    
    # BEGIN WordPress
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>
    # END WordPress
    What am I missing?
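
A side note on the file above: /pages/services.htm appears in two Redirect lines with different targets. As far as I know, mod_alias applies the first matching Redirect in file order, so the later line would never take effect. This is probably not the cause of the crawl errors, but it is easy to check for. A minimal Python sketch, assuming only the simple "Redirect <status> <source> <target>" form used here (the function name and the shortened sample are made up for illustration):

```python
from collections import defaultdict

# Shortened sample taken from the .htaccess posted above.
HTACCESS = """\
Redirect 301 /pages/services.htm http://berkshiredentist.com/patient-services/
Redirect 301 /pages/special.htm http://berkshiredentist.com/
Redirect 301 /pages/services.htm http://berkshiredentist.com/
Redirect 301 /pages/faqs.htm http://berkshiredentist.com/
"""


def find_duplicate_redirects(text):
    """Return {source_path: [targets in file order]} for sources listed more than once."""
    targets = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        # Only the four-token "Redirect <status> <source> <target>" form is handled.
        if len(parts) == 4 and parts[0] == "Redirect":
            _, _status, source, target = parts
            targets[source].append(target)
    return {src: tgts for src, tgts in targets.items() if len(tgts) > 1}


for source, tgts in find_duplicate_redirects(HTACCESS).items():
    print(f"{source} is redirected {len(tgts)} times: {tgts}")
```

Any source path it reports should be reduced to a single Redirect line pointing at the intended target.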
    Paul C.
    ClickBasics
    http://www.clickbasics.com

  2. #2
    SitePoint Wizard bronze trophy C. Ankerstjerne's Avatar
    Join Date
    Jan 2004
    Location
    The Kingdom of Denmark
    Posts
    2,702
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    The links in your source look fine (although your source code does look a bit crowded). Which type of crawl errors are you getting?
    Christian Ankerstjerne
    <p<strong<abbr/HTML/ 4 teh win</>
    <>In Soviet Russia, website codes you!

  3. #3
    Mouse catcher silver trophy Stevie D's Avatar
    Join Date
    Mar 2006
    Location
    Yorkshire, UK
    Posts
    5,881
    Mentioned
    122 Post(s)
    Tagged
    1 Thread(s)
    Jesus wept ... and that was before he looked at the source code of a Wordpress page.

    For some reason, there's a bunch of Javascript that seems to be screwing up your outbound links:
    Code:
    <a title="About Cosmetic Dentistry" href="http://www.aboutcosmeticdentistry.com/" onclick="javascript:_gaq.push(['_trackPageview','/yoast-ga/outbound-article/www.aboutcosmeticdentistry.com']);">About Cosmetic Dentistry</a>
    It looks like bots are trying to follow the link in the script snippet as well as, or instead of, the actual link destination. As that link is only there to track what people have clicked on, and doesn't actually resolve to the correct page but instead to a 404 error, it looks like that's why Googlebot is having trouble.
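
To illustrate the guess above, here is a rough Python sketch of how a naive crawler might pick up the tracking path. It treats any quoted string starting with "/" inside an onclick attribute as a site-relative URL. The page URL and the regular expression are assumptions for illustration only; this is not how Googlebot is known to work.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed URL of the page containing the link (hypothetical, for illustration).
PAGE_URL = "http://berkshiredentist.com/61-2/"

SNIPPET = '''<a title="About Cosmetic Dentistry" href="http://www.aboutcosmeticdentistry.com/" onclick="javascript:_gaq.push(['_trackPageview','/yoast-ga/outbound-article/www.aboutcosmeticdentistry.com']);">About Cosmetic Dentistry</a>'''

# A quoted string that starts with "/" looks like a site-relative URL.
PATH_IN_SCRIPT = re.compile(r"""['"](/[^'"]+)['"]""")


class LinkAudit(HTMLParser):
    """Collects real href targets and URL-like paths found in onclick handlers."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.real_links = []
        self.script_paths = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if attrs.get("href"):
            self.real_links.append(urljoin(self.page_url, attrs["href"]))
        for path in PATH_IN_SCRIPT.findall(attrs.get("onclick") or ""):
            # Resolved against the page, the tracking path lands on the site itself.
            self.script_paths.append(urljoin(self.page_url, path))


def audit(html, page_url=PAGE_URL):
    parser = LinkAudit(page_url)
    parser.feed(html)
    parser.close()
    return parser.real_links, parser.script_paths


real, guessed = audit(SNIPPET)
print("href targets:", real)
print("paths a crawler might guess from onclick:", guessed)
```

The second list contains a URL on berkshiredentist.com that exists only as an Analytics label, which would explain a 404 in the crawl report.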

  4. #4
    Web Enthusiast
    Join Date
    Jul 2000
    Location
    Western Massachusetts, USA
    Posts
    1,383
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Very interesting about the Javascript code. It is not in the content HTML.

    HTML Code:
    <strong><a title="About Cosmetic Dentistry" href="http://www.aboutcosmeticdentistry.com/">About Cosmetic Dentistry</a></strong> - Learn about procedures, view photos and find out how you can beautify your smile with cosmetic dentistry.
    Turns out the Wordpress Google Analytics program was set to track outbound links. I disabled that, and now the Javascript is gone in the source view.

    Many thanks for pointing me in the right direction.
    Paul C.
    ClickBasics
    http://www.clickbasics.com

  5. #5
    SitePoint Member
    Join Date
    Oct 2012
    Posts
    1
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    How do you find the Wordpress Google Analytics program setting to disable it? I am having the same problem.

  6. #6
    Web Enthusiast
    Join Date
    Jul 2000
    Location
    Western Massachusetts, USA
    Posts
    1,383
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Frankly, it's been so long, I don't remember. I googled on
    google analytics event tracking outbound links
    and found a whole bunch of articles including this one.

    https://developers.google.com/analyt...ntTrackerGuide

    Hope that helps.
    Paul C.
    ClickBasics
    http://www.clickbasics.com

  7. #7
    Web Enthusiast
    Join Date
    Jul 2000
    Location
    Western Massachusetts, USA
    Posts
    1,383
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Aha! I remember now. The setting is in the configuration of the WordPress plugin "Google Analytics for WordPress". See the attached screenshot: tracking_outbound_links.jpg
    Paul C.
    ClickBasics
    http://www.clickbasics.com

  8. #8
    om nom nom nom Stomme poes's Avatar
    Join Date
    Aug 2007
    Location
    Netherlands
    Posts
    10,271
    Mentioned
    50 Post(s)
    Tagged
    2 Thread(s)
    We had an issue that also added a redirect between where the user clicked and the end destination, caused by some Google tracking stuff. The client noticed it when clicking on links led to blank pages (it seems the intermediate Google server was down for an hour or so, and the client, understandably, freaked out).

    BTW, I wonder if you can remove this from your .htaccess file:
    <IfModule mod_rewrite.c>

    If you know you have mod_rewrite, get rid of that. It's a silly-expensive check that I believe happens at *every* request. For some reason it seems default with some hosting setups as a safety feature, but it's only meant as safety when you're first setting up and might not have mod_rewrite.

  9. #9
    SitePoint Member
    Join Date
    Sep 2013
    Posts
    9
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Maybe the link is broken. Try to fix the link.

