Results 1 to 21 of 21
  1. #1
    SitePoint Evangelist goughb's Avatar
    Join Date
    Sep 2000
    Location
    Chicago
    Posts
    526
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    SEO Friendly and PHP Maybe ModRewrite?

I am looking to develop a site and would like to apply search engine optimization by making friendly URLs for an engine to crawl. What is the best approach to do this? I couldn't find a thread explaining how to put it all together from the start. Please help. Thank you.

  2. #2
    SitePoint Enthusiast escape164's Avatar
    Join Date
    Dec 2002
    Location
    Colorado, USA
    Posts
    79
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Your subject includes a reference to mod_rewrite. I would suggest doing a google search on it and these forums to find a good script to use with mod_rewrite.

Now, you basically have two options when it comes to pretty URLs. First, the old-fashioned way is to have one file with a pretty URL for each page, e.g.

    ./index.php (Main index file)
    ./about/aboutus.php (aboutus.php file in the about dir)
    etc.

    This works very well and is tried and true. You can very easily include a config file at the top of each PHP file and import headers, footers, etc. This style also lends itself to a lower learning curve for maintenance. Depending on your project, the next guy who comes along might not know about mod_rewrite.
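That per-file style can be sketched in a few lines of PHP; the include file names below (config.php, header.php, footer.php) are hypothetical examples, not a prescription:

```php
<?php
// about/aboutus.php -- the "one physical file per pretty URL" style.
// The shared include names are hypothetical; @include keeps this
// sketch runnable even when those files are absent.
$title = 'About Us';

@include 'config.php';   // site-wide settings
@include 'header.php';   // shared page header

echo "<h1>$title</h1>\n";
echo "<p>Content for the /about/ page lives at a stable, readable URL.</p>\n";

@include 'footer.php';   // shared page footer
```

Every page is a real file on disk, which is exactly why the next maintainer has nothing special to learn.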

    <rant>
    From experience, if that happens, the following events will occur:
    1. Your code will get slashed, hacked, mutilated and generally destroyed no matter how many patterns you followed or how elegant your solution. Only suggestion to this is KEEP BACKUPS!!!
    2. Your name will be slandered as that of a person who knows nothing, because the dimwit next to you had to rewrite your entire site so he could understand it. Again, BACKUP YOUR CODE!!!
    </rant>

    Now, that all said, using mod_rewrite can be incredibly enjoyable and useful if you can get around the above problems. I would suggest taking a look at Drupal for an example of a project that uses mod_rewrite intelligently and successfully.

    Keep in mind your end goal and the patterns & steps you will be using to get there and pick the best tool for the job. So many times people pick a screwdriver when a hammer is needed because a screwdriver is so much cooler. That is stupid and it shows immaturity as a coder.

    If mod_rewrite is the best tool, educate yourself and use it. If not, don't. It's that simple.

    I wish you the best.

  3. #3
    SitePoint Enthusiast underzen's Avatar
    Join Date
    Apr 2004
    Location
    Ft. Lauderdale, FL
    Posts
    81
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I have played with mod_rewrite a bit and have actually gotten pretty good at it. Just devote a day to it and you'll have the basics down.

    It seems that GoogleBot is able to crawl dynamic URLs a lot more easily lately, so it's not the end of the world if you can't get mod_rewrite working. Also make sure you have a nice structure to your HTML, with your main keywords spread throughout your content.

  4. #4
    ********* Victim lastcraft's Avatar
    Join Date
    Apr 2003
    Location
    London
    Posts
    2,423
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Hi...

    Quote Originally Posted by underzen
    It seems that GoogleBot is able to crawl dynamic URLs a lot more easily lately, so it's not the end of the world if you can't get mod_rewrite working.
    This can be hit and miss. I have seen Google balk at one parameter, but sometimes accept four. Anything labeled SID, SESSION, ID, etc. especially seems to annoy it. Keep the session in the cookie and use two or fewer parameters and you are OK for Google (40+%). The Yahoo engine has recently been rewritten from scratch and is also a lot more tolerant (30+%), but other engines will reject even one parameter.
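Keeping the session in the cookie rather than the URL is mostly a PHP configuration matter; a rough sketch using the standard session ini options (the visit counter is just for illustration):

```php
<?php
// Keep the session id in a cookie only, so spiders never see
// ?PHPSESSID=... appended to links.
ini_set('session.use_cookies', '1');    // store the id in a cookie
ini_set('session.use_trans_sid', '0');  // never rewrite URLs to carry the id
ini_set('url_rewriter.tags', '');       // and rewrite no HTML tags at all

session_start();

// Illustrative use of the session once it is cookie-only.
$_SESSION['visits'] = isset($_SESSION['visits']) ? $_SESSION['visits'] + 1 : 1;
```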

    You can also have URLs such as...
    Code:
    http://www.site.com/index.php/category/style/
    ...if you cannot use mod_rewrite. Most engines accept this.
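With that URL style, Apache hands everything after the script name to PHP in $_SERVER['PATH_INFO']; a minimal sketch of pulling the segments out (the variable names are only illustrative):

```php
<?php
// index.php -- for http://www.site.com/index.php/category/style/
// Apache sets $_SERVER['PATH_INFO'] to "/category/style/".
$pathInfo = isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '';

// Split on "/" and drop the empty pieces left by leading/trailing slashes.
$segments = array_values(array_filter(explode('/', $pathInfo), 'strlen'));

$category = isset($segments[0]) ? $segments[0] : 'home';
$style    = isset($segments[1]) ? $segments[1] : 'default';
```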

    Besides having searchable URLs, the content must also be reachable by the spiders. This means doorway pages for the engines that list the entire product range as static links. Spiders cannot use drop-downs or search boxes, after all.
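A doorway page like that can be as simple as a loop over the catalogue printing plain links; the product data here is invented for illustration:

```php
<?php
// Doorway/sitemap page: the entire product range as static <a> links,
// reachable without forms, drop-downs or JavaScript.
// The product list is invented for illustration.
$products = array(
    101 => 'red-widget',
    102 => 'blue-widget',
    103 => 'green-widget',
);

$links = '';
foreach ($products as $id => $slug) {
    $links .= '<a href="/products/' . $slug . '/">' . $slug . "</a>\n";
}
echo $links;
```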

    You actually get to play with rankings within your site as well. So if a lot of the other catalogue items point at your most profitable items, they will get pushed up and are used as the page for the search engine result link.

    yours, Marcus
    Marcus Baker
    Testing: SimpleTest, Cgreen, Fakemail
    Other: Phemto dependency injector
    Books: PHP in Action, 97 things

  5. #5
    SitePoint Guru
    Join Date
    Jul 2004
    Location
    Raleigh, NC
    Posts
    783
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    There are not a lot of things I can claim to be an expert at, but SEO is one of them. mod_rewrite is HIGHLY recommended. Never use session IDs that get passed in the URL. Even after rewriting, try to keep the directory depth low. If you do pass variables in the URL, try to keep it to two or under. Another note: remember, search engines cannot take action. If the only way to reach a link to page X is to mouse over a CSS menu on page Y, the search engine will probably never find page X.

  6. #6
    SitePoint Evangelist goughb's Avatar
    Join Date
    Sep 2000
    Location
    Chicago
    Posts
    526
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Awesome advice, thank you all!

    I am confident, with the information provided, that I can put something together; my only concern is figuring out the sessions, i.e. what happens if a user doesn't have cookies enabled?

    Brett

  7. #7
    SitePoint Guru
    Join Date
    Jul 2004
    Location
    Raleigh, NC
    Posts
    783
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Your cookies comment reminded me of another thing: search engines don't use cookies. So if access to a page is flat-out denied unless cookies are enabled, it will turn search engines away too.

  8. #8
    SitePoint Enthusiast underzen's Avatar
    Join Date
    Apr 2004
    Location
    Ft. Lauderdale, FL
    Posts
    81
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Darchangel
    Your cookies comment reminded me of another thing: search engines don't use cookies. So if access to a page is flat-out denied unless cookies are enabled, it will turn search engines away too.
    Typically, pages that require a session variable to view will instead display a "Login Error" page, and that error page is what usually gets indexed by the search engine.

  9. #9
    SitePoint Addict
    Join Date
    Apr 2002
    Posts
    330
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by underzen
    It seems that GoogleBot is able to crawl dynamic URLs a lot more easily lately, so it's not the end of the world if you can't get mod_rewrite working.
    I guess this has always been a misunderstood matter.

    When Google says it may not crawl sites with dynamic pages, in practice that means it will slow down crawling if the pages take too long to be served, because the crawling must be causing too much load on the server.

    The solution for this is to use good PHP script caches and content caches.

    The claim about the number of parameters in the URL is a bogus guess at what Google means when it says that dynamic pages may not be crawled.

    People who doubt it can just try searching, for instance, for this well known PHP site, among many others, which does not use mod_rewrite but certainly uses PHP script caches and content caches.

    As you can see, Google indexes tens of thousands of pages with URL parameters, some very long, which is enough to invalidate the argument that mod_rewrite or any similar scheme is needed.
    Manuel Lemos

    Metastorage - Data object relational mapping layer generator
    PHP Classes - Free ready to use OOP components in PHP

  10. #10
    ********* Victim lastcraft's Avatar
    Join Date
    Apr 2003
    Location
    London
    Posts
    2,423
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Hi...

    Quote Originally Posted by mlemos
    When Google says it may not crawl sites with dynamic pages, in practice that means it will slow down crawling if the pages take too long to be served, because the crawling must be causing too much load on the server.
    This is only partially true. Google actually has to capture quite a bit of information, including the page contents, guessing the language, and identifying the geographical location of the server. I managed to get some face time with one of the Google chief engineers (I am forced to go to SEO gatherings). There is an overall timeout for the whole page, but basically it tries to reject pages as quickly as possible using various heuristics.

    One of those rejection triggers is user specific content because it can end up with duplicate pages and the number of parameters is definitely a factor here. Another is "spider traps" where links beget links (blogs recently got penalised for this). Another is excessive internal linking because of spammers.

    The only reliable guide is to get your pages indexed frequently. To do this get a link from a popular site and make sure your content keeps changing. Then have a look at what pages are actually being indexed by using the Google tools. If you really cannot get indexed then consider investing in a trusted feed.

    Also, they are now in hot competition with Yahoo, so the rules change almost weekly anyway...

    yours, Marcus
    Marcus Baker
    Testing: SimpleTest, Cgreen, Fakemail
    Other: Phemto dependency injector
    Books: PHP in Action, 97 things

  11. #11
    SitePoint Evangelist goughb's Avatar
    Join Date
    Sep 2000
    Location
    Chicago
    Posts
    526
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Excellent, this really clears many things up. Thank you for the well thought out thorough advice, I greatly appreciate it.

  12. #12
    SitePoint Addict
    Join Date
    Apr 2002
    Posts
    330
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by lastcraft
    One of those rejection triggers is user specific content because it can end up with duplicate pages and the number of parameters is definitely a factor here.
    I suppose that the relation between detecting duplicate pages and the number of parameters is your own conclusion.

    AFAIK, Google has always been able to detect duplicated pages with URLs that do not even take parameters.

    Anyway, the claimed relation between the number of parameters and a page not being crawled is invalidated by the piles of pages in Google's index that have many arguments in the URL. I have seen it with 4 long arguments, and I suspect I could find more if I bothered.

    Another point regarding mod_rewrite: even if you insist that it is important, you do not need it to achieve the same effect. You can use plain PHP code to map the URI to arguments just by parsing the URI, as long as you are using Apache. This is just a note for people who are wasting time and effort on something that is not needed.
    Manuel Lemos

    Metastorage - Data object relational mapping layer generator
    PHP Classes - Free ready to use OOP components in PHP

  13. #13
    SitePoint Addict
    Join Date
    Mar 2003
    Location
    Germany
    Posts
    216
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Just read a note about osCommerce saying it has an option to prevent search spiders from being assigned a session ID. Quite neat. I have not investigated further yet, but I guess it's a good starting point if you want to know how to do it.
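The idea boils down to checking the user agent before starting a session; a simplified sketch, with a bot list that is illustrative rather than complete:

```php
<?php
// Give spiders no session (and therefore no session id) by checking
// the user agent first. The substrings below are illustrative only.
function isSpider($userAgent)
{
    $bots = array('googlebot', 'slurp', 'msnbot', 'crawler', 'spider');
    $userAgent = strtolower($userAgent);
    foreach ($bots as $bot) {
        if (strpos($userAgent, $bot) !== false) {
            return true;
        }
    }
    return false;
}

$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (!isSpider($agent)) {
    // Only human visitors would get session_start() here.
}
```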

  14. #14
    SitePoint Zealot prefab's Avatar
    Join Date
    Jan 2003
    Location
    Belgium
    Posts
    133
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Am I the only one thinking that search engine friendly URLs are far more useful to human beings anyway?
    I think you should try to make your URLs user friendly (= easy to remember, semantic),
    which just happens to be search engine friendly too, in most cases.

    To me it's all about usability!

  15. #15
    ********* Victim lastcraft's Avatar
    Join Date
    Apr 2003
    Location
    London
    Posts
    2,423
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Hi...

    Quote Originally Posted by mlemos
    I suppose that the relation between detecting duplicate pages and the number of parameters is your own conclusion.
    No.

    Quote Originally Posted by mlemos
    AFAIK, Google has always been able to detect duplicated pages with URLs that do not even take parameters.
    They are looking for similar pages rather than exact matches. A rule of thumb is to reject things that look like session information. This often falls foul of commercial CMSs with long product codes. It is definitely a weighted algorithm and is definitely affected by parameter count. I was told this very directly by Google, as were a room full of other conference attendees. The other information (the other triggers) was explained to me in private conversations after the event.

    Quote Originally Posted by mlemos
    Anyway, the claimed relation between the number of parameters and a page not being crawled is invalidated by the piles of pages in Google's index that have many arguments in the URL. I have seen it with 4 long arguments, and I suspect I could find more if I bothered.
    The highest I have heard of is six by natural spidering, but I have never personally had a page this complicated indexed. The SEO rule of thumb was three, and that came from a specialist who built CMS buffer layers to deal with exactly this issue. Don't forget that big companies will use trusted feeds, which can have arbitrary URLs, so simply looking at Google links will not be enough.

    yours, Marcus
    Marcus Baker
    Testing: SimpleTest, Cgreen, Fakemail
    Other: Phemto dependency injector
    Books: PHP in Action, 97 things

  16. #16
    simple tester McGruff's Avatar
    Join Date
    Sep 2003
    Location
    Glasgow
    Posts
    1,690
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Fascinating and very useful info. I had previously assumed that most modern search engines did not distinguish between static and dynamic pages.

  17. #17
    SitePoint Member
    Join Date
    Dec 2002
    Posts
    17
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Here's my .htaccess file that goes in my webroot.

    Code:
    Allow from all
    RewriteEngine On
    RewriteBase /
    RewriteRule ^html/(.*)$ index.php?q=$1 [L,NC]
    That sends www.whatever.com/html/pages/my_page/ to www.whatever.com/index.php?q=pages/my_page/ so you can process everything with that one PHP file.

    However, www.whatever.com/images/x.jpg does not get processed; only the stuff in the html directory is.
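On the PHP side, the index.php receiving those rewrites only has to look at $_GET['q']; a minimal front-controller sketch (the page map is made up for illustration):

```php
<?php
// index.php -- front controller for URLs rewritten to index.php?q=...
// e.g. /html/pages/my_page/ arrives as $_GET['q'] = 'pages/my_page/'.
function resolvePage($q)
{
    // Whitelist of known pages; the entries are made up for this sketch.
    $pages = array(
        ''              => 'home',
        'pages/my_page' => 'my_page',
        'about'         => 'about',
    );
    $q = trim($q, '/');   // normalise away leading/trailing slashes
    return isset($pages[$q]) ? $pages[$q] : '404';
}

$page = resolvePage(isset($_GET['q']) ? $_GET['q'] : '');
```

Mapping through a whitelist also means a mistyped q can never pull in arbitrary files.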

  18. #18
    SitePoint Evangelist goughb's Avatar
    Join Date
    Sep 2000
    Location
    Chicago
    Posts
    526
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Awesome, thank you! Anyone else have examples of their rewrite rules?

  19. #19
    SitePoint Guru
    Join Date
    Jul 2004
    Location
    Raleigh, NC
    Posts
    783
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by prefab
    Am I the only one thinking that search engine friendly URLs are far more useful to human beings anyway?
    I think you should try to make your URLs user friendly (= easy to remember, semantic),
    which just happens to be search engine friendly too, in most cases.

    To me it's all about usability!
    I agree with you in general, but as someone who's been working for a search engine optimization company for two years, I feel compelled to say: no one is going to stay on your site if it's unusable, but usable or not, they (and the search engines) have to be able to find it first.

  20. #20
    SitePoint Member
    Join Date
    Oct 2004
    Location
    PA
    Posts
    20
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    simple rewrite

    Code:
    Allow from all
    RewriteEngine On
    RewriteBase /
    RewriteRule ^html/(.*)$ index.php?q=$1 [L,NC]

    Can you explain the rewrite in very simple terms? I went to the trouble of having a PHP CMS system written, only to find that it's not search engine friendly. I am a newbie and would appreciate a simple explanation.

    thanks
    Thank You
    Elizabeth
    Antiques & Old World Charms

  21. #21
    Non-Member
    Join Date
    Oct 2004
    Location
    downtown
    Posts
    145
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    http://www.sitepoint.com/article/guide-url-rewriting
    http://httpd.apache.org/docs-2.0/misc/rewriteguide.html

    I would recommend that you study the second link; I had a quick look over it and it is very informative.

