SitePoint Sponsor

User Tag List

Page 2 of 2 FirstFirst 12
Results 26 to 41 of 41
  1. #26
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by oddz View Post
    There won't be any type of performance lose because in the last I'm using a string as a primary key?
    performance loss? unlikely that you'd be able to measure it

    there is, if course, a small increase in total space required, because you'll use a VARCHAR(15) column as your foreign key in all tables that have a user name (rather than a 4-byte integer)
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  2. #27
    SitePoint Wizard silver trophybronze trophy Cups's Avatar
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    Mentioned
    17 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by Cups View Post
    Even if i then had transit systems to add to the mix.

    id - bloc - country - city - transit
    ==========================
    1 - EU - France - Paris - bus
    2 - EU - France - Paris - train
    3 - EU - France - Paris - metro
    4 - EU - France - Paris - taxi

    This single row schema still works.

    So the critera you stated was:

    the idea of functional dependency is that if i give you a value for the primary key, can you unambiguously tell me the value of any column in that row
    In which situation does that not apply then?

    Can you give me an example?
    Hi,

    I've been dwelling upon this. When do you break out a second table? (transit)

    By adding transit to the table, I haven't stopped functional dependency - clearly by selecting row4 I can tell you the value of City and give you a transit type.

    But what I have done is introduce a lot of redundancy ( or repetition, is that the correct term?)

    I don't think I have 'broken' FD, but I have broken some other rule, if so what rule is that?

    Quote Originally Posted by r937
    FDs are relationships between column values
    Is it the case that transit carries more than one option for each row, whereas bloc - country - city are unique? or because their relationship is 1-1-1

    bloc - country - city - transit are unique too but their column relationship is 1-1-1-n - so then you could break out a 'transit' table, maybe depending on how many options there were for transit, you might decide to live with the redundancy - just for ease of use?

    But if you had to add "language spoken in this city"
    bloc - country - city - transit - languages_spoken 1-1-1-n-n then you should surely break out to other tables otherwise you are into this kind of madness;

    id - bloc - country - city - transit - language_spoken
    =======================================
    1 - EU - France - Paris - bus - French
    2 - EU - France - Paris - train - French
    3 - EU - France - Paris - metro - French
    4 - EU - France - Paris - bus - Arab
    5 - EU - France - Paris - train - Arab
    6 - EU - France - Paris - metro - Arab
    7 - EU - France - Paris - bus - English

  3. #28
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    you are stretching the flexibility of the single table beyond its breaking point

    i would ask you to say in words what the id represents -- since the id is the PK, you should be able to say "each id represents a different _____"

    back when it was bloc-country-city, each id represented a different city

    the city belonged to only one country, and the country belonged to only one bloc

    a hierarchy

    but with transit and language spoken added, what does each id represent?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  4. #29
    SitePoint Wizard silver trophybronze trophy Cups's Avatar
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    Mentioned
    17 Post(s)
    Tagged
    1 Thread(s)
    Exactly, its only by pushing it over the edge that I can then clearly define what a single table can contain.

    When I only had 'transit' added i could see no difference,

    each id represents a transit system in a city

    When I added 'languages spoken in the city' I could no longer easily say the sentence which the row described.

    So, is that the test of FD? That it contains only attributes which describe, or lead to a single entity?

    Because in fact this table does not need an id

    Id - bloc - country - city
    ================
    1 - EU - France - Paris

    The city is the key - till you get to Paris, Texas of course - ahhh... erm?

  5. #30
    SitePoint Wizard silver trophybronze trophy Cups's Avatar
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    Mentioned
    17 Post(s)
    Tagged
    1 Thread(s)
    In light of what you said I read this again.

    Quote Originally Posted by wikipedia FD
    Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R, (written X → Y) if and only if each X value is associated with precisely one Y value. Customarily we call X the determinant set and Y the dependent attribute.
    So in this row;

    Id - bloc - country - city
    ================
    1 - EU - France - Paris

    The id is the determinate set and the city is the dependent attribute

    1 == Paris, and the attributes bloc and country don't break FD

    Id - bloc - country - city
    ================
    1 - EU - France - Paris
    21 - Americas - US - Paris

    in fact they can provide uniqueness.

  6. #31
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    yes, so in your last example, where you have --

    Id - bloc - country - city
    ================
    1 - EU - France - Paris
    21 - Americas - US - Paris

    if we took away the id (which is a surrogate key) we'd be left with --

    bloc - country - city
    =============
    EU - France - Paris
    Americas - US - Paris

    and what do you suppose the Primary Key for this table would be?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #32
    SitePoint Evangelist simshaun's Avatar
    Join Date
    Apr 2008
    Location
    North Carolina
    Posts
    438
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Is Id and City not a composite key?

    Id is the simple key.
    city is not a simple key, but still contributes to making the row unique.

    If this is the case, essentially we end up back at having Id as the primary key.

    Right? I think I may be mis-understanding this.

  8. #33
    SitePoint Wizard silver trophybronze trophy Cups's Avatar
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    Mentioned
    17 Post(s)
    Tagged
    1 Thread(s)
    I think city is the natural key, but seeing as there are 2 places (not cities, I know) called Paris that breaks the natural key.

    I think country and city end up being the candidate keys, I think thats the correct term and, so somehow their combination ( or their index? ) becomes the unique key.

    Is that correct?

  9. #34
    SitePoint Wizard silver trophybronze trophy Cups's Avatar
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    Mentioned
    17 Post(s)
    Tagged
    1 Thread(s)
    Actually, my selects would still work as long as you were coming down the hierarchy,

    show blocs, pick a bloc, show countries, pick a country show cities.

    Where it falls down is when you start at the bottom : "I am in Paris, what country is it?" - or as you said previously:
    the idea of functional dependency is that if i give you a value for the primary key, can you unambiguously tell me the value of any column in that row
    Off Topic:

    Sorry about all the posts, as soon as I make a post I fiddle around and find something else out.

  10. #35
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Cups View Post
    ... somehow their combination ( or their index? ) becomes the unique key.
    close

    in this example --

    bloc - country - city
    =============
    EU - France - Paris
    Americas - US - Paris

    the primary key is (bloc,country,city)

    each combination is unique, and serves very nicely as a composite primary key

    make sense?

    now, you might argue that since every country belongs to only one bloc, the primary key might also be simply (country,city)

    this is where the "real world" (pun intended) example does not extend to the general solution

    the general solution for three levels would allow two different blocs (top level) to have two countries (second level) with the same name, in a manner similar to two different countries (2nd level) having two cities (3rd level) with the same name
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  11. #36
    SitePoint Evangelist simshaun's Avatar
    Join Date
    Apr 2008
    Location
    North Carolina
    Posts
    438
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Cups View Post
    Where it falls down is when you start at the bottom : "I am in Paris, what country is it?" - or as you said previously:
    To get to this point, you would have to drill down in the form, so your URL might like "/bloc/country/city/".
    You wouldn't need an ID number because a user could:
    - Go to "EU/" to retrieve a list of countries.
    - Go to "EU/France/" to retrieve a list of cities.
    - Go to "EU/France/Paris/" to retrieve city details.

    Alternatively, to work your way UP in the hierarchy:
    - If a city (Paris) has more than 1 parent, user is provided with a drop-down with possible choices (France, US). Upon selection, the Bloc can be automagically found by the database.
    Code:
    SELECT bloc FROM places WHERE country='France' AND city='Paris'
    (URL becomes "Paris/France/EU/" or "Paris/US/Americas")

    There is no reason at all to have an ID. (I think you've already said this and I've wasted my time, but maybe not.)
    I'm open to having some reasoning injected into my brain, so if my logic is incorrect, tell me.

  12. #37
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    simshaun, you've described the situation really well

    actually, there could be a reason to use a surrogate key (such as the id) for this table -- which is a city table, yes?

    this would be if the city had a child table -- city attractions, for example

    the relationship would be one-to-many (each city has multiple attractions), therefore the attractions table would need to carry the foreign key to the city's primary key

    and of course, if the city's primary key is a 3-column composite, then so is the foreign key

    however, if (bloc,country,city) were declared a unique key, with id as the surrogate primary key, then the attractions table's foreign key could be just an integer

    most programmers are so enamored of the obvious neatness of this approach, that this technique (using a surrogate key for a multi-column primary key) seems to have morphed in some people's minds to a general strategy of using surrogate auto_increment primary keys willy-nilly all over the place

    my advice to anyone doing data modelling is to do it all, completely, the entire application, using only natural keys

    then, once you're past the logical modelling stage, and you're sure you have all the relationships properly identified as to uniqueness, etc., only then should you go back and substitute surrogate keys in certain scenarios
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  13. #38
    SitePoint Evangelist
    Join Date
    Jun 2006
    Location
    Wigan, Lancashire. UK
    Posts
    523
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Cups View Post
    Where it falls down is when you start at the bottom : "I am in Paris, what country is it?"
    Of course this is where the hierarchical view of the data suddenly comes into its own, where you can work up or down the tree from any start point within the hierarchy.

    As simshaun has said, working up the tree might give you multiple "Paris/France/EU/" or "Paris/US/Americas"... but the user can then be offered the choice of which they want; and the hierarchy path could easily be reversed within code logic to give the top-down URL.
    ---
    Development Projects:
    PHPExcel
    PHPPowerPoint

  14. #39
    SitePoint Wizard silver trophybronze trophy Cups's Avatar
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    Mentioned
    17 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by r937
    in this example --

    bloc - country - city
    =============
    EU - France - Paris
    Americas - US - Paris

    the primary key is (bloc,country,city)

    each combination is unique, and serves very nicely as a composite primary key
    To set this up in my dbase would just be a matter of selecting those 3 columns and making the their combination unique?

    Does mysql automatically create and use that index?

    But in this case, to unambiguously retrieve a single row I'd need all 3 values to be passed to me - is that correct?

    If I need to know all 3 values to get the correct row out of the database, then the composite key is only there to speed up the search.

    Whilst I take the point about using the 3 fields as the composite key, in this particular case country-city would suffice because I don't know of a sovereign country which straddles 2 blocs, AND, recalling the mod_rewrite quandry I can excuse myself from writing /bloc/country/city can't I?

    Just getting /country/city will bring back the correct row.

    Will the decision about which columns to use a composite primary key always depend on the application, the relationship between the datas, and the hierarchy?

    So lets see stick my neck out and see if I am grasping this.

    When the relationship between columns is 1:1 when storing hierarchical data, and you can identify each row by a natural key, there is simply no need to use a surrogate primary key.

    e.g.
    animal - canine - dog - husky

    If the natural key can legitimately be duplicated, then look through the candidate keys to find a composite primary key.

    EU - France - Paris
    Americas - US - Paris

    Or, Americas - US - Paris

    If the table has child tables, then you will likely benefit from creating a surrogate primary key.

    cities
    ====
    1 - EU - France - Paris
    2 - Americas - US - Paris

    transit_systems
    ============
    1 - Bus
    2 - Tram
    3 - Rail
    4 - Metro

    city_transit
    ========
    1 - 1
    1 - 2
    1 - 3
    1 - 4

    I am in no way discarding what Mark Brown is advocating re: using the hierarchical view, I am just trying to start by nailing what FD is and is not, and I feel this is an important stepping stone on the way to grokking the hierarchical view.

    Am I on the right track?

    To address in part this point of mine:

    Quote Originally Posted by Cups
    Actually, my selects would still work as long as you were coming down the hierarchy,

    show blocs, pick a bloc, show countries, pick a country show cities.
    Quote Originally Posted by simshaun
    Alternatively, to work your way UP in the hierarchy:
    - If a city (Paris) has more than 1 parent, user is provided with a drop-down with possible choices (France, US). Upon selection, the Bloc can be automagically found by the database.
    Quote Originally Posted by mark baker
    As simshaun has said, working up the tree might give you multiple "Paris/France/EU/" or "Paris/US/Americas"... but the user can then be offered the choice of which they want; and the hierarchy path could easily be reversed within code logic to give the top-down URL.
    If you go to wikipedia and search for "pdf" you will end up on a device called a "disambiguation" screen.

    Did you mean?
    ===========
    PDF Portable Document Format
    PDF Panamanian Defence Forces
    ... and others...

    So that gives you a choice when identifying the row, either you provide your script with all the unique identifiers (bloc, country and city in this case) or you put in an extra "disambiguation screen".

    So when I apply what I have learned so far, and apply it back to the PHP-mod_rewrite scenario I imagine this (sticking with cities table).

    mod_rewrite rule like
    if /bloc/EU gets cities.php?bloc=EU
    To get all countries in a bloc:
    "select unique country from cities where bloc=EU"
    creates urls like
    <a href=/country/France>France</a>

    mod_rewrite rule like
    if /country/France gets cities.php?country=France
    To get all cities in a country:
    "select city from cities where country=France"
    creates a list like ( or links if there are child tables )
    Paris
    Lyon

    But, in our Paris, Texas (non-ambiguous natural keys) scenario you cannot have then have links like this;

    /city/Paris

    It would have to be

    /EU/France/Paris

    or

    /France/Paris

    or else,

    /city/Paris which would have to be prepared to display as a disambiguation screen

    ( and for that matter what urls does the disambiguation screen lead to?)

    Thats not hard to do
    PHP Code:
    If ( count$rows ) > ){
    // show disambiguation screen


    That's back to semantics though, is /city/Paris better than /EU/France/Paris - sounds more like a "has a" vs "is a" argument.

    Another usability benefit dawned on me, you don't have redundant country names, countries only appear in your application because there is at least one city in it.

    In traditional normalized tables it can be very easy to end up with orphaned entries unless you are well into your referential integrity (RI) which means using innoDB tables only as I understand it.

  15. #40
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Cups View Post
    <snip>a lot of stuff</snip>
    yes!!
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  16. #41
    SitePoint Wizard silver trophybronze trophy Cups's Avatar
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    Mentioned
    17 Post(s)
    Tagged
    1 Thread(s)
    yes!!
    *Groans* I thought so.

    I spent most of the last 4 weeks building an application based around a taxonomy which i now know is just hierarchical data.

    I did what I usually do, split all the columns into their own tables and created surrogate keys - which has resulted in a multiple table, harder to fathom application than it should have.

    So, yeah I know this may be quite a meandering and hard to follow thread for some - mostly because of my deadly "stream of consciousness" posts and I apologise for that - but it has certainly turned my thinking on its head when dealing with certain kinds of tables.

    Thanks for everyones input in trying to help me understand what Functional Dependency (FD) is and its importance in database schema design.

    I'm glad this stayed in the PHP forum, because one lesson I learned is that favouring natural keys, or even composite primary keys could severely reduce complexity in the parts of your application that derive from it.

    Paul


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •