SitePoint Sponsor

User Tag List

Page 1 of 2 12 LastLast
Results 1 to 25 of 40
  1. #1
    SitePoint Evangelist
    Join Date
    Mar 2011
    Posts
    423
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Can anyone suggest a reason why I may want an ID field???

    Hi,

    I will soon be building a MySQL Database. I have read in many tutorials and web sites that there should be an ID field that auto increments.

    I intend to use the WHERE function in MySQL to select an entry in the database. Each WHERE function will request a unique product name. In addition it will be more search engine friendly. Rather than having www.mysite.co.uk/products.php?id=23 I will have www.mysite.co.uk/products.php?name=productname.

    Furthermore, I intend to store transactions within another database. There will be an entry for time and date of purchase. Again, this is more useful that an ID that just auto increments. Much easier to remember a date than remember an ID?!

    Can anyone suggest a reason why I may want an ID field??? Am I missing the point with using an ID field?

    Matt.

  2. #2
    SitePoint Author silver trophybronze trophy
    wwb_99's Avatar
    Join Date
    May 2003
    Location
    Washington, DC
    Posts
    10,633
    Mentioned
    4 Post(s)
    Tagged
    0 Thread(s)
    This makes sense for most cases, but starts to break around the edges. Problems with using product name are:

    * IDs should generally be immutable, products change names. How are you going to cascade changes to the other database?
    * IDs need to be unique. Product names are not always unique.
    * IDs tend to get stored in multiple places. Product names are longish so they won't be as efficent.
    * IDs get used alot in debugging, do you really want to depend on typing 'fuzzy bunny slippers, LARGE PINK' correctly all over the place?

    For transaction dates, you've got one small issue -- there most definitely can be simultaneous transactions in the system. You also have lots of interesting issues around dates and times as they translate between layers in the system.

    It has been pretty well and proven that the only things that really work as IDs in most cases are:

    a) Integers, typically AUTOINCRIMENT style
    b) GUIDs, which are really big long funny integers
    c) Short system generated strings

    And (c) is still a risky option at times.

  3. #3
    Just Blow It bronze trophy
    DaveMaxwell's Avatar
    Join Date
    Nov 1999
    Location
    Mechanicsburg, PA
    Posts
    7,266
    Mentioned
    115 Post(s)
    Tagged
    1 Thread(s)
    I'm going to respond to these in reverse order:

    Quote Originally Posted by MatthewBOnline View Post
    Can anyone suggest a reason why I may want an ID field??? Am I missing the point with using an ID field?
    I use IDs for a number of reasons:
    • Foreign keys
      • if a record is going to be referenced in another table (a product is referenced within an order for example), then I find using ID fields to be the most efficient use for me. Plus, comparing one field on a record is easier than comparing two or three.
      • Speed - again, this goes back to the foreign key bit, but numeric lookups are more efficient than string lookups unless you really, REALLY know what you're doing with indexing.
      • Ease of maintenance
    • Uniqueness - say I had two companies called Maxwells Music Shop, I don't want to have to arbitrarily add something to a companies name just to make it unique - I'll use a ID field on the record so I don't have to make an overly complex key just to get uniqueness.
    • Normalization - database normalization rules have been in place for many, many years for a reason - yes, hardware today can overcome many things, but why make anything harder than it needs to be.
    Quote Originally Posted by MatthewBOnline View Post
    Furthermore, I intend to store transactions within another database. There will be an entry for time and date of purchase. Again, this is more useful that an ID that just auto increments. Much easier to remember a date than remember an ID?!
    I'd argue that it's easier for a customer to remember a transaction id vs a date. You've also got the issue of duplication - unless you go down to milliseconds, the chances of duplicate order dates are VERY high and confusion will reign, both with you, your staff and your customers.

    Quote Originally Posted by MatthewBOnline View Post
    I intend to use the WHERE function in MySQL to select an entry in the database. Each WHERE function will request a unique product name. In addition it will be more search engine friendly. Rather than having www.mysite.co.uk/products.php?id=23 I will have www.mysite.co.uk/products.php?name=productname.
    With htaccess rewrite rules, this has nothing to do with your database setup. A better URL would be www.mysite.co.uk/products/productname or even www.mysite.co.uk/productname and your URL rewrite rules will parse it and send the user to the correct page.
    Dave Maxwell - Manage Your Site Team Leader
    My favorite YouTube Video! | Star Wars, Dr Suess Style
    Learn how to be ready for The Forums' Move to Discourse

  4. #4
    SitePoint Addict
    Join Date
    Apr 2009
    Posts
    357
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    This google search will point you at lots of discussions/opinions on this subject.

    natural vs artificial keys - Google Search
    Doug G
    =====
    "If you ain't the lead dog, the view is always the same - Anon

  5. #5
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2006
    Location
    Augusta, Georgia, United States
    Posts
    4,151
    Mentioned
    16 Post(s)
    Tagged
    3 Thread(s)
    For reasons outlined by both wwb_99 and DaveMaxwell 90% of the time I find surrogate keys to be the better/hassle free option. You can still use SEO garbage URLs anyway and literally ignore the SEO garbage: /23456/My-Really-Stupid-SEO-Friendly-Title-That-Will-Probably-Change-Anyway-DOH.
    The only code I hate more than my own is everyone else's.

  6. #6
    SitePoint Member
    Join Date
    Oct 2010
    Posts
    3
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    It's important to separate the two different concerns raised by your question. The first, the one you are directly asking about, is what identifier to use in your URLs. The second concern is the design of your database. Since this is the Database forum I'll answer the second of these first, just as other people already have done.

    In database design it is quite common for things to have multiple identifiers. In a relational database table the identifiers are keys and there's nothing wrong with having several different keys in a table. It is quite common for a table to have keys both for the business key (the identifier familiar to end users) as well as a "technical" or "surrogate" key (a key seen and used by database developers and DBAs but without business meaning and not exposed to the end users).

    So although the points made by wwb_99 and DaveMaxwell are potentially relevant concerns when implementing keys, none of those things in themselves should stop you using any appropriate key value in your URLs - because there is no reason why you have to choose only one type of key or the other. Interface design should not dictate database design.

    The most important criteria for choosing keys are Familiarity, Stability and Simplicity. A user-visible product code, stock number or possibly even a unique product name are good candidates for keys but in all cases business requirements should be the deciding factor. For instance, stability is desirable from the database designer's point of view but doesn't necessarily match what is really required from the user's point of view since key values sometimes need to change.

    What I suggest you do not do is expose any surrogate key in the URL. That would undermine some of the advantages of having a surrogate key in the first place. Of course, once you expose the key to users it inevitably acquires some business meaning and is no longer a surrogate.

    Hope this helps.

  7. #7
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,270
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    nice one, david

    your advice (not to expose a surrogate key) is completely opposite to common practice, innit

    heck, even vbulletin uses them in its urls

    (we had a thread on here a while back, which i don't want to re-open, where someone suggested that since tables like threads and posts don't have a useful natural key, the surrogate number assigned to them actually ~becomes~ the natural key!)

    yes, i do realize "common practice" does not make it "best practice" but you're definitely swimming against the current on this one...

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  8. #8
    SitePoint Enthusiast
    Join Date
    Nov 2008
    Posts
    38
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    I think the reasons for using an Id rather than a date or description have been pretty well expressed on here.

    I would just like to add that although an autoincrement can be convenient, it is not as predictable as one may wish. There may be times when you want to know in advance what the key will be, eg. when preparing a script to add data with the Id as a foreign key on other tables. at times like this I use my own function for generating a key, or returning a key that is supplied in advance.
    You obviously have to make sure the ranges are well apart. I would also suggest having the ID as a character rather than an integer. Don't forget to add a unique index.

  9. #9
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,081
    Mentioned
    153 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by steveob200653 View Post
    There may be times when you want to know in advance what the key will be
    This doesn't fly. On systems with a lot transactions you can never know for sure what the ID of a row you're about to insert will be. Between the time you query the database to find out what it probably will be and the time you actually insert the row another process might have just inserted a row with the ID you just got, causing all sorts of havoc.
    Rémon - Hosting Advisor

    SitePoint forums will switch to Discourse soon! Make sure you're ready for it!

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  10. #10
    SitePoint Enthusiast
    Join Date
    Nov 2008
    Posts
    38
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by ScallioXTX View Post
    This doesn't fly. On systems with a lot transactions you can never know for sure what the ID of a row you're about to insert will be. Between the time you query the database to find out what it probably will be and the time you actually insert the row another process might have just inserted a row with the ID you just got, causing all sorts of havoc.
    This does fly. There are times when you do want to know in advance eg. scripts to set up standing data that will be tested in a separate environment, and the subsequent data should be recognisable with the same test scripts whatever the environment. I may not have made it clear but you make sure that you never insert a row with a value that could be inserted by another process. This is why I propose that the values from separate processes use different ranges, I favour character based prefixes. If that is not suitable, then use entirely different ranges.

  11. #11
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,081
    Mentioned
    153 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by steveob200653 View Post
    This does fly. There are times when you do want to know in advance eg. scripts to set up standing data that will be tested in a separate environment, and the subsequent data should be recognisable with the same test scripts whatever the environment. I may not have made it clear but you make sure that you never insert a row with a value that could be inserted by another process. This is why I propose that the values from separate processes use different ranges, I favour character based prefixes. If that is not suitable, then use entirely different ranges.
    I assumed you were talking about fetching the next auto_increment (or indeed even MAX(id)+1) before. Now that I see you weren't I take back what I said. You're right, that does fly
    Rémon - Hosting Advisor

    SitePoint forums will switch to Discourse soon! Make sure you're ready for it!

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  12. #12
    Keeper of the SFL StarLion's Avatar
    Join Date
    Feb 2006
    Location
    Atlanta, GA, USA
    Posts
    3,748
    Mentioned
    72 Post(s)
    Tagged
    0 Thread(s)
    The simple test i use is "Will I be storing this ID in another table?" If yes, I use a synthetic key. If no, a natural key is more than sufficient.

    Mostly, it's a way of reducing redundant overhead - it takes a lot less space to store the number 14 1000 times than it does to store "Daves Magical Carpet Ride" 1000 times.

  13. #13
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,081
    Mentioned
    153 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by StarLion View Post
    it takes a lot less space to store the number 14 1000 times than it does to store "Daves Magical Carpet Ride" 1000 times.
    Plus integer comparison is a lot faster than string comparison (for JOINing).
    Rémon - Hosting Advisor

    SitePoint forums will switch to Discourse soon! Make sure you're ready for it!

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  14. #14
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,270
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by ScallioXTX View Post
    Plus integer comparison is a lot faster than string comparison (for JOINing).
    whenever i see a statement like this, i always ask the person who made it to provide a definitive and authoritative reference

    so, you gots one?

    in actual fact, i suspect you would have difficulty measuring the difference

    besides, basing a table design choice on the difference in CPU comparison of strings or integers is the tail wagging the dog if it leads you to select a design which ~requires~ additional joins just to return a string for an unknown integer

    "integers are faster" (despite the fact that they probably are) is just one of those urban database myths that people get hung up on
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  15. #15
    Keeper of the SFL StarLion's Avatar
    Join Date
    Feb 2006
    Location
    Atlanta, GA, USA
    Posts
    3,748
    Mentioned
    72 Post(s)
    Tagged
    0 Thread(s)
    r937 - the only statement i'd make there is that it would be faster because the table isnt holding as much data, which would reduce check time? Smaller Table = Faster Search, Int rather than string = Smaller Table, ergo Int rather than String = Faster Search ?

    (I'm not going to say 'a lot', because I believe my reasoning is subject to relative sizes as to it's magnitude).

  16. #16
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,081
    Mentioned
    153 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by r937 View Post
    whenever i see a statement like this, i always ask the person who made it to provide a definitive and authoritative reference

    so, you gots one?
    I do. In MySQL (and most DBMSs, if not all) an integer is stored in 4 bytes, whereas strings take up as many bytes as they have characters (or a multiple thereof if you use any of the utf-8 or utf-16 encodings) plus I'm pretty sure VARCHARs add a 0 byte at the end to indicate the end of the string, so that's one byte extra. So unless your strings are all 3 characters or shorter, integer comparison will be faster. Indeed if your strings are 7 characters long integer comparison will be twice as fast, etc.

    Quote Originally Posted by r937 View Post
    besides, basing a table design choice on the difference in CPU comparison of strings or integers is the tail wagging the dog if it leads you to select a design which ~requires~ additional joins just to return a string for an unknown integer

    "integers are faster" (despite the fact that they probably are) is just one of those urban database myths that people get hung up on
    It's something that's used throughout computer science and is well established, but I agree that the difference you'll notice in databases is probably (close to) negligible in the grand scheme of things.
    Rémon - Hosting Advisor

    SitePoint forums will switch to Discourse soon! Make sure you're ready for it!

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  17. #17
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,270
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by ScallioXTX View Post
    I do.
    so where is it? i mean, the link

    specifically, the link that actually ~quantifies~ the difference, and provides support for your use of "a lot"

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  18. #18
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,081
    Mentioned
    153 Post(s)
    Tagged
    2 Thread(s)
    Rémon - Hosting Advisor

    SitePoint forums will switch to Discourse soon! Make sure you're ready for it!

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  19. #19
    SitePoint Wizard
    Join Date
    Dec 2003
    Location
    USA
    Posts
    2,582
    Mentioned
    29 Post(s)
    Tagged
    0 Thread(s)
    The other reason for an integer key instead of a string key would be just the size itself.

    As Rémon said, an integer is only 4 bytes whereas a string is at least 1 byte per character + 1 byte. Say your strings average only about 10 characters, that's 11 bytes per row, which is 7 more than an integer. Multiply that by 100, you're talking 1.1KB extra, by 100,000, you're talking an extra meg... just for storing the key.

    Say you use the key to reference it in 5 other tables, you're now up to 5 extra megs.

    Include into that all the extra processing it takes to store those, and the extra bandwidth to transfer those few extra bytes, and things can start compounding quite rapidly.

    I think the main reason for using an auto-incremented immutable integer ID is so you have something short and sweet use as a foreign key and link your tables together. The immutable part is just as important (if not more so) than the auto-incremented or integer point. It becomes very tricky to change a primary key and have it reflected everywhere. It can be done, but it's a lot of extra hassle.

  20. #20
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,270
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    your first link is a nice explanation of the performance of retrieving longer rows as opposed to shorter rows off the disk (but does not otherwise talk about actual comparisons) -- a most important consideration, to be sure

    your second link talks about the actual comparisons, and by inference, suggests that string comparisons might execute more slowly simply due to the number of machine code operations involved

    but again, nothing to quantify that integer comparisons are "a lot" faster

    it's the "a lot" part that i had the problem with
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  21. #21
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,270
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    also, let me state that i'm not disagreeing that an integer (surrogate) key can be useful

    but keep in mind that you would also have a UNIQUE index on some other column(s) along with the surrogate key

    so using a surrogate key is additional space in the row, right?

    you would only do that if necessary, namely when (1) using it externally, like in a url http://example.com/showproduct.html?id=937, or (2) if the table has child tables, i.e. a foreign key references it
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  22. #22
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,081
    Mentioned
    153 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by r937 View Post
    it's the "a lot" part that i had the problem with
    To be clear, I said that integer comparison are "a lot" faster than string comparisons and did not say that a JOIN with integers will be "a lot" faster than a JOIN with strings (which is what I think your asking me to show).

    As for any source on JOINs, I don't have any, and have no idea how to set up a fair test.
    Rémon - Hosting Advisor

    SitePoint forums will switch to Discourse soon! Make sure you're ready for it!

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  23. #23
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,270
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by ScallioXTX View Post
    To be clear, I said that integer comparison are "a lot" faster than string comparisons
    same problem -- i object to the "a lot" part

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  24. #24
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,081
    Mentioned
    153 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by r937 View Post
    same problem -- i object to the "a lot" part

    Using CHAR keys for joins, how much is the overhead ? - MySQL Performance Blog

    I don't know about you, but for me "over 6 times slower" constitutes "a lot"
    Rémon - Hosting Advisor

    SitePoint forums will switch to Discourse soon! Make sure you're ready for it!

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  25. #25
    SitePoint Wizard
    Join Date
    Dec 2003
    Location
    USA
    Posts
    2,582
    Mentioned
    29 Post(s)
    Tagged
    0 Thread(s)
    Well, even if it's .00001 seconds difference per operation, over the span of a million operations, that's still 1 second. =p


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •