  1. #1
    SitePoint Addict
    Join Date
    Oct 2004
    Location
    United Kingdom
    Posts
    346

    Which variables should I use in my User friendly URLs?

    I'm creating a website where the url consists of a descriptive string so that
    a) it assists when it comes to search engine optimisation and
    b) it may help my users to identify pages in the browser location bar

    I have seen a major website which has a url such as
    www.my domain.com/this-is-the-name-of-my-article,123.htm

    By playing with the url, it looks like the PHP script (behind mod_rewrite) only uses the page id (123) in the SQL WHERE clause, so the article-name part of the url is merely there to describe the page and does not have a 'technical' purpose.
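A minimal sketch of how a script might pull the id out of a URL of that shape. It is written in Python for illustration rather than PHP, and the pattern and the function name `parse_article_path` are assumptions, not the site's actual code:

```python
import re

# Assumed URL shape, taken from the example above: "/<slug>,<id>.htm"
ARTICLE_URL = re.compile(r"^/(?P<slug>[a-z0-9-]+),(?P<page_id>\d+)\.htm$")


def parse_article_path(path):
    """Return (slug, page_id), or None if the path does not match."""
    match = ARTICLE_URL.match(path)
    if match is None:
        return None
    # Only page_id would be needed for a lookup by primary key;
    # the slug is returned so the caller can decide what to do with it.
    return match.group("slug"), int(match.group("page_id"))
```

With this shape, any slug in front of the comma parses successfully, which would explain why the page still loads when the descriptive part is changed.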

    There are a few pros and cons to this approach. The pro is that the page should load quicker, because 123 is a primary key rather than an indexed char field such as this-is-the-name-of-my-article (I think!!?)

    The con is that someone could link to my site from a site which gets spidered more frequently by the search engines. They could get my page listed in Google looking something like:
    www.my domain.com/this-is-a-rude-word,123.htm
    and the page would still load ok.

    So what I'm trying to get at is: what would you folks do in a similar situation? Just use the page_id in the WHERE clause, or use the indexed page_name as well?
    Last edited by Googly; Aug 24, 2005 at 00:29.

  2. #2
    masquerading Nick's Avatar
    Join Date
    Jun 2003
    Location
    East Coast
    Posts
    2,215
    I've been reading SP's SEM kit, and have been debating the real pros of a URL like that. A SE friendly URL is always nice, though it is really not that important anymore, as the search engines can all interpret dynamic URLs now. The real thing you should focus on is getting the keywords in the page title tag. A url like site.com/article-name,id.htm may not be worth all the trouble: just site.com/id.htm would work just as well and be easier to use, and the extra keywords in the URL, as I said above, may not actually have such a big impact on SEO.
    Nick . all that we see or seem, is but a dream within a dream

  3. #3
    SitePoint Zealot
    Join Date
    Aug 2005
    Posts
    123
    Quote Originally Posted by Googly
    And cons being that someone could link to my site from a site which gets spidered more frequently by the search engines. They could get my page listed in google looking something like:
    www.my domain.com/this-is-a-rude-word,123.htm
    and the page would still load ok.
    I wouldn't worry about that too much. If you post an article at http://www.domain.com/this-is-an-article,123.htm and someone links to it, they're most likely not going to try making up a URL of their own. They'll use whatever URL they found which would be http://www.domain.com/this-is-an-article,123.htm.

  4. #4
    SitePoint Addict
    Join Date
    Oct 2004
    Location
    United Kingdom
    Posts
    346
    Quote Originally Posted by Possibility
    A SE friendly URL is always nice, though it is really not that important anymore as they can all interprete dynamic URLs now. The real thing you should focus on is getting the keywords in the page title tag.
    I'm working on the premise that a SE friendly URL is always nice! Yes, they mostly interpret dynamic URLs ok, but as domains and filenames seem to be noticed by the search engines, using keywords in the url must be a plus. Obviously the title factor you say I should focus on goes without saying. It's been crucial to SEO for ages....

    The point you raise about the search engines being able to interpret dynamic urls is more of a SE spider issue and how well your site is indexed/spidered. With my specific problem that's not really the issue.

    My original problem would still be an issue with a dynamic url such as
    www.mydomain?filename=this-is-an-article&page_id=123
    where I have included the filename to create more occurrences of the keyword for the search engines. More importantly, it allows the user to receive more information about the page when cycling through their URL history in their browser.

    Quote Originally Posted by snortles
    I wouldn't worry about that too much. If you post an article at http://www.domain.com/this-is-an-article,123.htm and someone links to it, they're most likely not going to try making up a URL of their own. They'll use whatever URL they found which would be http://www.domain.com/this-is-an-article,123.htm.
    Ok, maybe not the rude words (but you never know). It's not a great way to control your links etc. if someone accidentally puts a typo in the link to your page. This may create two entries for the same page in the search engines, which is never good.

    Maybe I've answered my own question. Maybe I should use
    WHERE filename='this-is-an-article' and page_id=123
    I'm just thinking this will create extra loading time that's not required?
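One possible middle ground, sketched here as an illustration only: look the page up by id alone, then compare the slug in the URL against the stored one and redirect to the canonical URL when they differ. The dictionary `ARTICLES` stands in for the database table, and `resolve` is a hypothetical helper name; neither comes from this thread.

```python
# Hypothetical stand-in for the articles table: page_id -> stored slug.
ARTICLES = {123: "this-is-an-article"}


def resolve(slug, page_id):
    """Return (status_code, canonical_path) for a requested slug/id pair."""
    stored_slug = ARTICLES.get(page_id)
    if stored_slug is None:
        # Unknown id: nothing to show.
        return 404, None
    canonical = f"/{stored_slug},{page_id}.htm"
    if slug != stored_slug:
        # Typo or made-up slug: send the visitor to the one real URL.
        return 301, canonical
    return 200, canonical
```

Under this assumption the lookup still uses only the primary key, and a mistyped or altered slug ends up at a single URL instead of producing a second copy of the page.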

    Any other thoughts from anyone?

  5. #5
    masquerading Nick's Avatar
    Join Date
    Jun 2003
    Location
    East Coast
    Posts
    2,215
    What I do sometimes is in the table, I have a column for SE friendly URL, like so:

    articles
    --id
    --title
    --title_sef
    --etc

    I use some PHP code to convert the title to a SEF title that can be used in URLs. For example, if I entered "Cool New BMX Stunts" as the title, my PHP function would, for title_sef, make that "cool-new-bmx-stunts". So when that appears in the URL, the query searches the table for a row whose title_sef entry matches the one found in the URL. No article ID is needed, and if someone enters a wrong title_sef, you can return an appropriate error.
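A rough sketch of such a conversion, shown in Python rather than the PHP mentioned above. The function name `make_title_sef` and the exact rules (lowercase, keep only ASCII letters and digits, join with hyphens) are assumptions for illustration:

```python
import re


def make_title_sef(title):
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    # Everything that is not a-z or 0-9 (spaces, punctuation) acts as a separator.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)
```

For example, `make_title_sef("Cool New BMX Stunts")` gives `"cool-new-bmx-stunts"`. Note that two different titles can collapse to the same string, so the column is not guaranteed to be unique on its own.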

    May not be the most efficient way of doing it, but it works for me. My URLs usually look something like www.mysite.com/view/cool-bmx-tricks

    Hope that helps

  6. #6
    SitePoint Addict
    Join Date
    Oct 2004
    Location
    United Kingdom
    Posts
    346
    Thanks possibility.

    My web site is always growing, and new articles may cover the same subject matter as existing pages, so two pages could end up with the same 'filename'. That means I have to make each page unique with an id or date.

    I suppose as long as the urls stay the same I can always play around with what's being used in the sql query (i.e. behind the scenes). That probably makes the most sense.

  7. #7
    masquerading Nick's Avatar
    Join Date
    Jun 2003
    Location
    East Coast
    Posts
    2,215
    How about putting an AND condition in your query?

    SELECT * FROM articles WHERE id = $id AND title_sef = '$title_sef'

    That way there could be duplicates of title_sef but the article id would take care of it.
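A small sketch of that query using bound parameters instead of interpolating `$id` and `$title_sef` into the SQL string. It uses Python's built-in `sqlite3` module as a stand-in for PHP and MySQL; the table layout follows the columns listed earlier in the thread, and `find_article` is a hypothetical helper name.

```python
import sqlite3

# In-memory database standing in for the real articles table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, title_sef TEXT)"
)
conn.executemany(
    "INSERT INTO articles (id, title, title_sef) VALUES (?, ?, ?)",
    [
        (1, "Cool New BMX Stunts", "cool-new-bmx-stunts"),
        (2, "Cool New BMX Stunts", "cool-new-bmx-stunts"),  # duplicate slug
    ],
)


def find_article(conn, article_id, title_sef):
    """Return the row matching both id and title_sef, or None."""
    cur = conn.execute(
        "SELECT id, title, title_sef FROM articles WHERE id = ? AND title_sef = ?",
        (article_id, title_sef),
    )
    return cur.fetchone()
```

Here the duplicate `title_sef` values are told apart by the id, and a request whose slug does not match the stored one returns no row, so the script can answer with an error page.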

  8. #8
    SitePoint Addict
    Join Date
    Oct 2004
    Location
    United Kingdom
    Posts
    346
    Quote Originally Posted by Possibility
    How about putting an AND condition in your query?

    SELECT * FROM articles WHERE id = $id AND title_sef = '$title_sef'

    That way there could be duplicates of title_sef but the article id would take care of it.
    That's basically the whole gist of my question. I was wondering how much this would cost me in the speed stakes. Your example is how I started out. However, when I saw a large website use what I believe is SELECT * FROM articles WHERE id = $id, rather than id = $id AND title_sef = $title_sef, I began to wonder if that was a better way forward, especially as someone with more experience has done it.

    I realised that they had done this because whatever you changed the title_sef to, the page still loaded properly. This is what triggered this thread.

  9. #9
    SitePoint Addict
    Join Date
    May 2005
    Posts
    255
    There is absolutely nothing wrong with using an indexed varchar field (try indexing the first 10 or 15 characters) for urls filtered through mod_rewrite.

    Will performance be slightly less than using straight ids? Yes, but negligibly so. MySQL (and just about every db platform worth a damn) converts strings to binary hashes before doing the lookup anyway.

    I'm in charge of the development of Content Management Systems (among other things) for many sites, one of which gets about 14,000,000 page views per month. Under normal viewing (no bots), a dual 2.8GHz Xeon with 2GB of RAM can handle everything (Apache, MySQL, PHP) with a load of less than 2 (we were actually forced into that situation once when we experienced a massive hardware failure; thanks a lot, Sun!)

    Today we run the site on 6 servers, for redundancy's sake (3 mirrored web servers, an LVS/NFS system, and a MySQL cluster). Load on any given box sits at less than 0.5 (and these are hyperthreaded) most of the time. Things spike up a bit when spiders crawl, but we've never had any performance issues. We're logging everything and retrieving all files over NFS (extremely slow), and the average page load time is about 25ms-50ms during peak hours.

    The urls typically look like this (there are a few variations, due to legacy systems)

    www.domain.com/url-to-page

    No extension. No commas. Nothing else, except for when pagination has to be applied, in which case the urls look like:

    www.domain.com/url-to-page,2

    And that's it. The rewrite is pretty simple; trigger the script for any file with no extension on it that isn't in a sub-directory.
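To illustrate the matching rule described above (no extension, not in a sub-directory, optional `,N` for pagination), here is a rough Python sketch. The pattern and the function name `route` are assumptions made for illustration; the actual mod_rewrite rule is not shown in this thread.

```python
import re

# One path segment with no dot (no extension) and no further slash
# (no sub-directory), optionally followed by ",<page number>".
ROUTE = re.compile(r"^/(?P<slug>[^/.,]+)(?:,(?P<page>\d+))?$")


def route(path):
    """Return (slug, page) if the CMS script should handle the path, else None."""
    match = ROUTE.match(path)
    if match is None:
        return None
    # A missing ",N" suffix is treated as page 1 (an assumption).
    return match.group("slug"), int(match.group("page") or 1)
```

Paths such as `/style.css` or `/images/logo.png` fall through to the web server untouched, while `/url-to-page` and `/url-to-page,2` reach the script.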

    Quote Originally Posted by Googly
    That's basically the whole gist of my question. I was wondering how much this would cost me in the speed stakes. Your example is how I started out. However, when I saw a large website use what I believe is SELECT * FROM articles WHERE id = $id, rather than id = $id AND title_sef = $title_sef, I began to wonder if that was a better way forward, especially as someone with more experience has done it.
    If the index is created on (id,title_sef), the latter would be faster to look up. If it's only on id, the former would be, but you'd save a considerable amount of storage space. Generally speaking, I'd only use unique ids (regardless of whether you use urls or id numbers).
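For anyone wanting to experiment with a composite index, a small sketch follows. It uses Python's built-in `sqlite3` as a stand-in, so the planner's behaviour is SQLite's and may differ from MySQL's; the index name `idx_articles_id_sef` and the table layout are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is deliberately a plain column here so the composite index is the
# only index available for the lookup.
conn.execute("CREATE TABLE articles (id INTEGER, title_sef TEXT, body TEXT)")
conn.execute("CREATE INDEX idx_articles_id_sef ON articles (id, title_sef)")
conn.execute("INSERT INTO articles VALUES (123, 'this-is-an-article', 'text')")

query = "SELECT body FROM articles WHERE id = ? AND title_sef = ?"
params = (123, "this-is-an-article")

row = conn.execute(query, params).fetchone()

# Ask the database how it intends to run the query and print each step.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
for step in plan:
    print(step)
```

Running the same comparison against the real table (in MySQL, with `EXPLAIN`) is the reliable way to see which index is actually chosen and what it costs.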

  10. #10
    SitePoint Addict
    Join Date
    Oct 2004
    Location
    United Kingdom
    Posts
    346
    Hi etnu, thanks very much for your advice/opinions. I might have a look at making (id,title_sef) an index and see how I get on.

