SitePoint Sponsor

User Tag List

Results 1 to 15 of 15
  1. #1
    SitePoint Member
    Join Date
    Jul 2012
    Location
    Toronto, Ontario
    Posts
    4
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Question Foreign Keys and Field Value Population, What's the More Efficient Approach?

    Hi everyone,

    I'm trying to design a database and I want to know what's the most efficient way to populate fields with referenced values.

    I have the following database design:
    Database.JPG

    For the country table, my unique ID is not a number but rather an ISO two-letter country code, i.e. JP, US, DE, etc. This leads to my first question:
    Q1) Is this an efficient approach, using a non-numerical ID, or is numerical ID's always the best?

    For the vehicles table, I have the vehicleBrand field referencing the vehicleBrandID field of the vehicleBrands table. When I do this though of course, the dropdown to select what vehicleBrand entry is just a list of the vehicleBrands IDs, like so:
    Database2.JPG

    This leads me to my second question:
    Is it best to use a numerical ID for internal database operations, and then when you generate some kind of view, i.e. a dropdown, add extra code to just populate those dropdown options with another field from that row

    Thanks in advance!

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by GG09 View Post
    Q1) Is this an efficient approach, using a non-numerical ID, or is numerical ID's always the best?
    yes, it is

    using a standard 2-character code instead of a meaningless number has several advantages

    you will hear people say that indexing a 4-byte integer is better, but it isn't

    even if you use a 2-byte SMALLINT, you're still faced with the need to do a join just to pull the country name or code

    Quote Originally Posted by GG09 View Post
    Is it best to use a numerical ID for internal database operations, and then when you generate some kind of view, i.e. a dropdown, add extra code to just populate those dropdown options with another field from that row
    i don't understand why you would want to do that

    by the way, your diagram was quite confusing -- is there any chance you could show us the actual CREATE TABLE statements including the actula PK/FK declarations?

    p.s. shout out from the t.dot
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Member
    Join Date
    Jul 2012
    Location
    Toronto, Ontario
    Posts
    4
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hey Rudy (making the assumption that's your name from your site!) thanks for your response, and a shout out back to a fellow Torontonian! I'm really new to any database system; I think your mention of a join from my first question was what I was referring to in my second question, I think... Does a join basically say something like this:

    "Oh, you want to refer to record/row with the ID of 123, but use something the more meaningful brand name field for that record/row instead"?

    My appologies for the diagram, I was using MySQL Workbench and that's what the ER diagram was; here's the SQL statements. Please let me know if my relationships are correct! Thanks

    Vehicles Table:
    CREATE TABLE IF NOT EXISTS `driverr`.`vehicles` (
    `vehicleID` INT NOT NULL ,
    `vehicleBrand` INT NULL ,
    PRIMARY KEY (`vehicleID`) ,
    INDEX `veicleBrandID` (`vehicleBrand` ASC) ,
    CONSTRAINT `veicleBrandID`
    FOREIGN KEY (`vehicleBrand` )
    REFERENCES `driverr`.`vehicleBrands` (`vehicleBrandID` )
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
    ENGINE = InnoDB


    vehicleBrands Table:
    CREATE TABLE IF NOT EXISTS `driverr`.`vehicleBrands` (
    `vehicleBrandID` INT NOT NULL ,
    `vehicleBrandName` VARCHAR(45) NULL ,
    `vehicleCountryOrigin` VARCHAR(2) NOT NULL ,
    PRIMARY KEY (`vehicleBrandID`, `vehicleCountryOrigin`) ,
    INDEX `countryID` (`vehicleCountryOrigin` ASC) ,
    CONSTRAINT `countryID`
    FOREIGN KEY (`vehicleCountryOrigin` )
    REFERENCES `driverr`.`countries` (`countryID` )
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
    ENGINE = InnoDB


    countries Table:
    CREATE TABLE IF NOT EXISTS `driverr`.`countries` (
    `countryID` VARCHAR(2) NOT NULL ,
    `countryName` VARCHAR(45) NULL ,
    PRIMARY KEY (`countryID`) )
    ENGINE = InnoDB

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    no, the relationships aren't right

    one good thing about using mysql workbench is that you can "forward engineer" the diagram to produce the actual CREATE TABLE statements, and then -- here's the good part -- actually run them to see if they'll work

    i suggest you do this repetitively as you build up your schema, dropping tables and re-creating them often

    your vehiclebrands PK is { vehicleBrandID,vehicleCountryOrigin } -- why? do you allow the same vehicleBrandID to be registered for multiple countries?

    if so, and you want to reference a row in this table using a foreign key in the vehicles table, then your foreign key must also have a vehicleCountryOrigin column as part of the FK
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    Non-Member
    Join Date
    Feb 2012
    Posts
    892
    Mentioned
    10 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by GG09 View Post
    Q1) Is this an efficient approach, using a non-numerical ID, or is numerical ID's always the best?
    The one problem I see with using actual information for the IDs is the fact that, in my experience, you should never change values for the ID columns.

    In the event a country changes international two-letter country code, you are forced to do that, change the PK value. Using a "mindless" "disconnected" numeric ID saves you from doing that.

    And since you are going to join data to find out the country details, numeric IDs are better.

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    gg, itmitică makes a great point

    you will therefore have to decide what the risk is, should a country change its two-letter international code

    see http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 which will give you an indication of how often this happens

    by the way, in case should you decide to use the 2-letter country code, and one of your countries does happen to change its code, you can easily accommodate this with the ON UPDATE CASCADE option of the foreign key, although some purists will say that this goes against the spirit of primary keys never being allowed to change

    now, as to the point about always having to perform a join, this is ~definitely~ true if you record the country code as a number (e.g. country 42)

    however, i'm sure there will be instances where you simply want to look at data, say in the vehiclebrands table, for example, where you actually don't have to perform the join because you instantly recognize what US or UK or CA stands for

    so now you have sufficient information to make the choice

    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    Non-Member
    Join Date
    Feb 2012
    Posts
    892
    Mentioned
    10 Post(s)
    Tagged
    0 Thread(s)
    A few more thoughts.

    Surrogate keys versus natural primary keys has always been surrounded by controversy.
    From personal experience, surrogate keys are the long term answer.

    If you're guaranteed to have unique two-letter code values, why bother making it an ID?
    Normal indexing should be enough.

    Natural keys fit the scenario when you absolutely control their creation and when they're "narrow" and they're absolutely unique.
    You only satisfy the "narrow" condition.

    Official interpretations:
    AL could mean Albania or Alabama
    BE could mean Belgium or Berne
    NL could mean Netherlands or Newfoundland and Labrador
    SV could mean El Salvador or Suceava

    So, when you need to retrieve full names, you'll have to join data.
    Databases will transform text values in join conditions to numeric values before performing the join.
    Hence, joining with conditions based on text values are slower than joining with conditions based on numeric values.

    In the event you want to switch to Alpha-3 or to Numeric codes, using the actual Alpha-2 codes as identity/foreign keys will stand in you way.

    Here's an idea, why not use the standard all together, with Numeric codes: http://en.wikipedia.org/wiki/ISO_3166-1 ?

  8. #8
    SitePoint Member
    Join Date
    Jul 2012
    Location
    Toronto, Ontario
    Posts
    4
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi r937 and itmitică, thanks for your replies!

    It seems as though a good practice is to just use the numeric IDs, though in the discussion of this topic I've learned a few things which I thank you guys for (i.e. the ON UPDATE CASCADE)!

    r937, I didn't mean for that to be the case you're right, I just meant for the VehicleBrandID to be the primary key.

    To r937 and itmitică, is a join something that is always executed when the database is quiried, i.e., the results of a join are not stored anywhere are they? Also, just to confirm I fully understand what I join does, please review my understanding:

    I have a table of countries all with unique numeric IDs and a field for the country name. I then have a table of countries I've visitied, with a date field and a country ID field.

    I want to call up all the places I've visited and print the countries' names on a web page. Since I only have the country's ID in my visited table, I need to "join" the the country name field to the ID and then print the country name. Is this correct?

    Thanks a lot!

  9. #9
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by GG09 View Post
    It seems as though a good practice is to just use the numeric IDs
    no, it's not necessarily "good practice"

    as you have seen, it is a contentious issue, and it is usually contested by purists

    pragmatists have no trouble using surrogate keys (which is what they're called) when appropriate, and natural keys when appropriate, while purists often insist upon using one or the other at all times (therre are purists on both sides)

    Quote Originally Posted by GG09 View Post
    ... is a join something that is always executed when the database is quiried
    no

    a join is something that you decide to do when you write the query, and the results of any query (join or not) are not stored


    Quote Originally Posted by GG09 View Post
    I want to call up all the places I've visited and print the countries' names on a web page.
    yes, you need to do a join in this case
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  10. #10
    Non-Member
    Join Date
    Feb 2012
    Posts
    892
    Mentioned
    10 Post(s)
    Tagged
    0 Thread(s)
    Queries may be cached: http://dev.mysql.com/doc/refman/5.1/en/query-cache.html
    The query cache stores the text of a SELECT statement together with the corresponding result that was sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again. The query cache is shared among sessions, so a result set generated by one client can be sent in response to the same query issued by another client.

    [...]

    The query cache does not return stale data. When tables are modified, any relevant entries in the query cache are flushed.
    You could also look at materialized views. Views are stored queries. Materialized views are stored results, generally in a table, for a query. Materialized views have to be refreshed once in a while.

    MySQL doesn't support materialized views, as far as I know, but you can implement the mechanism.


    You'd have to perform a join on the traveling table ( date field and a country ID) to get meaningful data ( date field and a country name ) out of countries table.

  11. #11
    SitePoint Member
    Join Date
    Jul 2012
    Location
    Toronto, Ontario
    Posts
    4
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    You guys are great, thanks! itmitică cahcing would be great! I find databases so interesting, I really have a lot of respect for your work. I'm looking at this book to understand data modelling, relationships and things of the like:
    Database Modeling and Design, Fifth Edition: Logical Design
    http://www.amazon.com/dp/0123820200/ref=cm_sw_su_dp

    I'm hoping I can get a better understanding of architecting a database solution.

    r937, thanks for highlighting the Surrogate/Natural key thing; just to ensure 100% understanding, Natural Key = a number, or, generally, something that's its sole purpose is to identify a record, where as a Surrogate Key is something that is used to identify a record, but is actually part of the data itself?

  12. #12
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by GG09 View Post
    r937, thanks for highlighting the Surrogate/Natural key thing; just to ensure 100% understanding, Natural Key = a number, or, generally, something that's its sole purpose is to identify a record, where as a Surrogate Key is something that is used to identify a record, but is actually part of the data itself?
    nope, it's the other way around
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  13. #13
    Non-Member
    Join Date
    Feb 2012
    Posts
    892
    Mentioned
    10 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by GG09 View Post
    I'm hoping I can get a better understanding of architecting a database solution.
    On of the most important aspects is to identify uniqueness constraints.

    Practice has told me, in particular, over and over, I'll never have all the data beforehand.
    Also, there will always be changes that need to be accounted for later on.

    Whatever natural key you may consider for a primary key, it will almost always fall pray to the above real world challenges: missing data, changing data.
    From a pragmatic point of view, surrogate keys will save you a lot of pain in the long run.

    Also.
    The first thing you have to ask yourself before deciding upon a natural key as primary key is if it's immutable.
    There's no if's and's or but's for me on this one.

    A primary key you have to update compares to you changing your DNA halfway through your life.
    It's called primary key for a reason, and it's selected from the heap of candidate keys a table exposes to you.
    If that.

    I'd also say that surrogate keys help me keep a sane level of normalization within my database.

  14. #14
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    a wise man once said it was important to "identify uniqueness constraints"

    to distinguish between surrogate and natural keys is important, in this regard -- no relation has a natural "auto_increment" attribute, so a primary key is not always chosen from a relation's candidate keys, which are by definition natural keys

    if you choose an auto_increment or similar surrogate key for your table, you must take steps to identify one additional key, a natural key, which you will declare with a UNIQUE index or constraint

    and upon this natural key you must perform due diligence, specifically with regard to missing data, changing data

    woe be unto those who forget the need to "identify uniqueness constraints"


    as a followup, ask yourself, does a change to this natural key constitute a new identity?

    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  15. #15
    Non-Member
    Join Date
    Feb 2012
    Posts
    892
    Mentioned
    10 Post(s)
    Tagged
    0 Thread(s)
    The most important distinction between surrogate keys and natural keys is that surrogate keys are designed to be primary keys while natural keys are the result of normalization: they are unique (candidate keys involving attributes only from the relation) but not all of them will qualify as valid primary keys.

    Most importantly, the future data transformation can easily invalidate a natural key that was chosen as a primary key.

    EDIT: And seeing that the primary key concept never existed in the relations theory (superkeys) but was introduced specifically for developers, surrogate keys fit the profile.


Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •