  1. #1
    SitePoint Guru
    Join Date
    Jan 2010
    Posts
    638
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    PHP For HTTPS Problem

    Sitepoint Members,
    Search engines view the http and https versions of your site as two different websites, which creates duplicate content and can lower your search engine ranking. I found that my site appears to Google as having duplicate content in the form of https pages. Why, I don't know, because I don't have SSL installed on my account.

    The most commonly described way to deal with this is to serve a different robots.txt for HTTPS:
    http://blog.leonardchallis.com/seo/s...txt-for-https/
    http://www.seoworkers.com/seo-articl...and-https.html
    http://www.seosandwitch.com/2012/08/...hat-to-do.html

    Another site said to use canonical links on every preferred page:
    http://www.creare.co.uk/http-vs-https-duplicate-content
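
For reference, a minimal sketch of what the canonical-link approach might look like in PHP. The function name `canonical_link_tag` and the host `mysite.com` are placeholders made up for illustration, and the sketch assumes the plain-http URL is the preferred one.

```php
<?php
// Hypothetical helper: builds a canonical <link> tag that always points at
// the plain-http URL of the current page. The function name and the example
// host are assumptions for illustration only.
function canonical_link_tag(string $host, string $requestUri): string
{
    $url = 'http://' . $host . $requestUri;
    // Escape the URL so characters such as & are valid inside the attribute.
    return '<link rel="canonical" href="'
        . htmlspecialchars($url, ENT_QUOTES, 'UTF-8')
        . '" />' . "\n";
}

// Usage inside <head>; mysite.com is a placeholder host.
echo canonical_link_tag('mysite.com', '/about.html');
```

On a live page the second argument would typically come from `$_SERVER['REQUEST_URI']`. The tag is the same whether the page was requested over http or https, which is the point of a canonical link.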

    aaand the same site also gave this PHP code:
    <?php
    if (isset($_SERVER['HTTPS']) && strtolower($_SERVER['HTTPS']) == 'on') {
    echo '<meta name="robots" content="noindex,follow" />'. "\n";
    }
    ?>

    I guess it goes in the head, just as the noindex/nofollow meta tags do.

    Is there anything I should worry about with this code?

    Thanks,

    Chris

  2. #2
    SitePoint Guru
    I put that code in. In view source, nothing from the code shows. Should it show?

  3. #3
    SitePoint Mentor silver trophy
    Rubble's Avatar
    Join Date
    Dec 2005
    Location
    Cambridge, England
    Posts
    2,435
    Mentioned
    82 Post(s)
    Tagged
    3 Thread(s)
    It should only show if:
    Code:
    isset($_SERVER['HTTPS']) && strtolower($_SERVER['HTTPS']) == 'on'
    Otherwise nothing will be displayed and you must assume one or the other is off or both are not on.

  4. #4
    SitePoint Guru
    Are you saying the code should show in view source if HTTPS is turned on, and if HTTPS is not turned on nothing will show in view source? Then I was completely lost with "one or the other is off". What are the two things that can be off, https and what else?

  5. #5
    SitePoint Mentor silver trophy
    Rubble's Avatar
    Yes, both need to be on to display the code; otherwise nothing is displayed.

  6. #6
    SitePoint Guru
    What do you mean by your use of "both"? Secure http (httpS) and regular http?

  7. #7
    SitePoint Zealot bronze trophy xMog's Avatar
    Join Date
    Mar 2011
    Posts
    151
    Mentioned
    3 Post(s)
    Tagged
    2 Thread(s)
    The code you posted will display this in your page:
    Code:
    <meta name="robots" content="noindex,follow" />
    If the URL is HTTPS.
    So, if you go to your site in HTTPS mode: https://www.yoursite.com
    and you check the source, you should see this:
    Code:
    <meta name="robots" content="noindex,follow" />
    If you browse to your site without HTTPS : http://www.yoursite.com
    The code shouldn't appear.

    It tells Google to "not index the current page" but still to "follow the links inside the page".
    This should do the trick; it will just take a couple of days or weeks for Google to remove the pages from its index.
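
To make the earlier "both need to be on" point concrete, here is a small sketch that splits the `if` from the first post into its two conditions. The function name `https_robots_meta` is made up for illustration, and the `$server` parameter stands in for `$_SERVER` so the function can be tried by hand.

```php
<?php
// Returns the robots meta tag for an HTTPS request, or an empty string.
// Hypothetical wrapper around the condition from the first post;
// $server stands in for $_SERVER.
function https_robots_meta(array $server): string
{
    $isSet = isset($server['HTTPS']);                        // condition 1: the key exists
    $isOn  = $isSet && strtolower($server['HTTPS']) == 'on'; // condition 2: its value is "on"
    return $isOn
        ? '<meta name="robots" content="noindex,follow" />' . "\n"
        : '';
}

echo https_robots_meta(['HTTPS' => 'on']);  // prints the meta tag
echo https_robots_meta([]);                 // prints nothing (plain http, key not set)
echo https_robots_meta(['HTTPS' => 'off']); // prints nothing (key set, but not "on")
```

So the "two things" are not http and https: they are whether `$_SERVER['HTTPS']` is set at all, and whether its value is `on`.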

  8. #8
    SitePoint Guru
    I see. I don't think the PHP will work because I don't have SSL installed.
    What I have in the htaccess is
    #Options +FollowSymLinks
    Options +SymLinksIfOwnerMatch
    RewriteCond %{SERVER_PORT} 443
    RewriteRule ^(.*)$ http://mysite\.com/404.html [R=301,L]
    RewriteCond %{SERVER_PORT} ^443$
    RewriteRule ^robots\.txt$ robots_ssl\.txt [L]

    but this doesn't seem to work for my site, because for months now Google has had 20 or more https pages listed for my non-https site.

  9. #9
    SitePoint Zealot bronze trophy xMog's Avatar
    When you go to your site with HTTPS in the URL with your browser, does it work or not?

    I'm really not an expert in .htaccess, but it seems that it would redirect all HTTPS traffic to your 404 page. Is that it? With a 301 (permanent redirect) status, which is... not that good I guess?
    Who is hosting your site? Can't you ask them to disable HTTPS?

  10. #10
    SitePoint Guru
    When I type in https for a page on my site I get a "This Connection is Untrusted" warning from Firefox. If I choose "I Understand the Risks" it takes me to the full non-https address. Since this site is not the main site of my account (but it is the largest by a hundredfold), that means it takes me to http://thesiteimworkingonnow.mymainsiteofmyaccount.com.

    The code
    RewriteCond %{SERVER_PORT} 443
    RewriteRule ^(.*)$ http://mysite\.com/404.html [R=301,L]

    was written by my webhost

    To me it means: send anything coming through port 443 (443 is for SSL/https) to a 404 page.

    The rest of the code

    RewriteCond %{SERVER_PORT} ^443$
    RewriteRule ^robots\.txt$ robots_ssl\.txt [L]

    was also written by my webhost.

    To me it means: send everything else (e.g. traffic coming through port 80) to that second robots.txt page, which is

    User-agent: *
    Disallow: /

    Which is how it's done on this page
    http://www.seosandwitch.com/2012/08/...hat-to-do.html

    but why would I want to send robots coming through port 80 to
    User-agent: *
    Disallow: /
    (disallow all robots)

    Wouldn't that stop all robots coming through non-httpS ports (^443), such as port 80, from crawling my site?

    All I'm trying to do is stop robots from coming through the httpS port because I don't want them reporting that I have httpS pages. If they can't get through the 443 port they have nothing to report.

  11. #11
    SitePoint Guru
    Forget Above, I Was Cut Off By The 30 Minute Limit

    When I type in https for a page on my site I get a "This Connection is Untrusted" warning from Firefox. If I choose "I Understand the Risks" it takes me to the full non-https address. Since this site is not the main site of my account (but it is the largest by a hundredfold), that means it takes me to http://thesiteimworkingonnow.mymainsiteofmyaccount.com.

    The code
    RewriteCond %{SERVER_PORT} 443
    RewriteRule ^(.*)$ http://mysite\.com/404.html [R=301,L]

    was written by my webhost

    To me it means: send anything coming through port 443 (SSL/https traffic) to a 404 page.

    The rest of the code

    RewriteCond %{SERVER_PORT} ^443$
    RewriteRule ^robots\.txt$ robots_ssl\.txt [L]

    was also written by my webhost.


    To me it means: send everything else (the ^), e.g. traffic coming through port 80, not to my regular robots.txt page but rather to that second robots.txt page, which is

    User-agent: *
    Disallow: /

    Which is how it's done on this page
    http://www.seosandwitch.com/2012/08/...hat-to-do.html

    but why would I want to send robots coming through port 80 to
    User-agent: *
    Disallow: /
    (disallow all robots)

    Wouldn't that stop all robots coming through port 80 from crawling my site?
    I would think the second two lines of code should be removed.
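
A note on the `^` in the second RewriteCond: in a regular expression, a leading `^` anchors the match to the start of the string; it does not mean "not" (mod_rewrite uses a leading `!` for negation). So `^443$` matches port 443 only, and port 80 traffic is untouched by that rule. The sketch below mirrors the two patterns with PHP's `preg_match`; the function names are hypothetical and only illustrate the pattern matching, not Apache itself.

```php
<?php
// Mirrors the two RewriteCond patterns from the .htaccess above using
// preg_match. Function names are hypothetical, for illustration only.
function matches_first_cond(string $port): bool
{
    return preg_match('/443/', $port) === 1;   // unanchored: "443" anywhere in the string
}

function matches_second_cond(string $port): bool
{
    return preg_match('/^443$/', $port) === 1; // anchored: exactly "443"
}

var_dump(matches_second_cond('443'));  // bool(true)
var_dump(matches_second_cond('80'));   // bool(false)
var_dump(matches_first_cond('8443'));  // bool(true)
var_dump(matches_second_cond('8443')); // bool(false)
```

Separately, as far as the rules shown go, the first RewriteRule matches every request on port 443 (including robots.txt) and ends processing with `[R=301,L]`, so the robots_ssl.txt rule would not be reached for those requests.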

  12. #12
    SitePoint Zealot bronze trophy xMog's Avatar
    Well, if you are redirected to a URL without HTTPS when you go to an HTTPS address, then the same will happen to Google. It's probably just that Google hasn't cleaned up its index yet. You could try to remove the pages yourself with Google Webmaster Tools:
    https://support.google.com/webmaster.../1663419?hl=en

  13. #13
    SitePoint Guru
    Unfortunately the https pages don't exist so there's no removing them from my site.

  14. #14
    SitePoint Zealot bronze trophy xMog's Avatar
    It doesn't matter if it "exists" or not. If it's in Google's index, it existed somehow. The procedure to remove a page from Google's index with Webmaster Tools is exactly for pages that don't exist anymore. Did you try it?

  15. #15
    SitePoint Guru
    Google Webmaster Tools won't take anything but http. In Webmaster Tools, choose Index and then Remove URLs, and where it says "Enter the URL that you'd like to remove (case-sensitive)", if you enter, for example, https://abc, what comes back is http://mysite.com/https://abc. Google's Webmaster Tools has been useless for this problem.

  16. #16
    SitePoint Zealot bronze trophy xMog's Avatar
    Well, I never tested it with https so I'll take your word for it. That's weird.
    Anyway, if you can't access the HTTPS page, Google should remove it from its index eventually. Maybe you could try asking on an SEO forum. I think your problem isn't related to PHP anymore.

  17. #17
    SitePoint Guru
    Where is the SEO forum? I couldn't find it, unless you're talking about CMS stuff. My site is not run on any sort of program.

  18. #18
    SitePoint Zealot bronze trophy xMog's Avatar
    There are a lot of SEO resources online. SEO means Search Engine Optimization. There are forums dedicated only to SEO (not on SitePoint, but searching Google for "SEO Forum" will give you a bunch of links).

  19. #19
    SitePoint Guru
    I thought you meant an SEO forum on SitePoint. I'm surprised they don't have an SEO forum.

