SitePoint Sponsor

User Tag List

Results 1 to 7 of 7
  1. #1
    Non-Member
    Join Date
    Jan 2004
    Location
    Seattle
    Posts
    4,328
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Normalizing a Big Table (+ Cardinality)

    I now have a series of content management systems with functioning database tables, but I want to try and normalize them a bit. For example, below are the specs for one of my bigger tables, which features information about symbols (national and state flags, songs, birds, etc.).

    PHP Code:
    Field    Type    NULL    Default
                
    NID    int(4)    NO    
    IDArea    varchar
    (6)    NO    
    Area    varchar
    (50)    NO    
    URL    varchar
    (75)    NO    
    URLD    varchar
    (75)    YES    NULL
    AreaParent    varchar
    (40)    YES    NULL
    MyKind    varchar
    (6)    YES    NULL
    Vex    varchar
    (81)    YES    NULL
    Symbol    varchar
    (65)    YES    NULL
    Latin    varchar
    (50)    YES    NULL
    Genus    varchar
    (25)    YES    NULL
    Family    varchar
    (25)    YES    NULL
    Order    varchar
    (25)    YES    NULL
    Class    varchar(25)    YES    NULL
    Desig    varchar
    (255)    YES    NULL
    DesigGeneral    varchar
    (50)    YES    NULL
    DesigGroup    varchar
    (7)    YES    NULL
    ForLan    varchar
    (50)    YES    NULL
    Date    varchar
    (4)    YES    NULL
    Pic    char
    (3)    YES    NULL
    PicBig    char
    (3)    YES    NULL
    SymbolsSort    varchar
    (4)    YES    NULL
    SymbolsSort2    varchar
    (4)    YES    NULL
    SymNotes    text    NO    

    INDEXES    Cardinality
    PRIMARY KEY
    NID 5167        
    IDArea    469        
    Area    469        
    URL    5167        
    MyKind    6        
    URLD    5167        
    Class    77        
    DesigGeneral    74        
    DesigGroup    7 
    Looking at the indexes above, I assume it would make sense to try and break off everything witih a small Cardinality value into a separate table and join to it, right? So, for exmple, I might wind up with a table named MyKind, with just 6 rows. If I split this table into six separate tables joined together, would it likely make a significant difference on performance?

    Also, I'm confused by the term Cardinality. There are 5,156 unique numerals in the first field (Cardinality 5167), and there are just a handful of unique values in field MyKind (Cardinality 6). Therefore, I assume Cardinality refers to the number of unique values in each field.

    However, it also assigns a Cardinality of 5167 to the field URL, even though there are duplicate values in that field. For example, Alabama has two state birds, which are displayed at the same URL: (MySite/World/)Alabama/Bird

    Eventually, I'd like to give each symbol a unique value. The logical field for that would be URL, as it distinguishes between Nebraska's meadowlark (Nebraska/Bird) and Wyomings meadowlark (Wyoming/Bird).

    However, it doesn't distinguish between two birds shared by a single nation or state. Do you think I could assign such URL's different values, like this...

    Alabama/Bird
    Alabama/Bird2

    ...then use some sort of PHP script to fiilter out the 2, so both pages can be previewed at MySite/World/Alabama/Bird ?

    I'll ask about this on the PHP forum, too.

    Thanks.

  2. #2
    Non-Member
    Join Date
    Jan 2004
    Location
    Seattle
    Posts
    4,328
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    On second thought, it looks like I can't separate some of these fields into separate tables, like MyKind, for example. However, what if converted each value in MyKind to a numeral, replacing every instance of "Home" with 1, for example. Then I could create a simple table with six rows, as follows:

    1 | Home
    2 | Child
    3 | Appendix
    4 | Reference
    5 | Note
    6 | Misc

    Would that improve performance enough to justify creating a new table and joining to it? Thanks.

  3. #3
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    i don't think splitting off the MyKind VARCHAR(6) values into table which has only 6 rows is going to improve anything (in fact, it will add another join to whichever queries use this big table, and, in general, querying from joins is not as efficient as querying from single tables)

    i would examine the redundant interdependencies in some of those columns

    for example, i would guess that Desig, DesigGeneral, and DesigGroup are somehow related, and this structure is almost certainly redundant if carried on every row of this big table
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by geosite
    I'll ask about this on the PHP forum, too.
    please don't, we do not allow cross posting
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    Non-Member
    Join Date
    Jan 2004
    Location
    Seattle
    Posts
    4,328
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    i don't think splitting off the MyKind VARCHAR(6) values into table which has only 6 rows is going to improve anything (in fact, it will add another join to whichever queries use this big table, and, in general, querying from joins is not as efficient as querying from single tables)

    i would examine the redundant interdependencies in some of those columns

    for example, i would guess that Desig, DesigGeneral, and DesigGroup are somehow related, and this structure is almost certainly redundant if carried on every row of this big table
    OK, I kept the field MyKind where it is (with the main table) and simply replaced all the values with single-digit numerals. It seems to work OK so far.

    Thanks.

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by geosite
    OK, I kept the field MyKind where it is (with the main table) and simply replaced all the values with single-digit numerals. It seems to work OK so far.
    huh? why?

    what happens when you want to interpret one of the single-digit numerals into a character value?

    isn't that what i said you should not bother doing?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    Non-Member
    Join Date
    Jan 2004
    Location
    Seattle
    Posts
    4,328
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    huh? why?

    what happens when you want to interpret one of the single-digit numerals into a character value?

    isn't that what i said you should not bother doing?
    I thought you were suggesting that I not tear this field away and turn it into a new table. Everyone keeps telling me that numerals are so much faster to process, so I just converted the half dozen words in the field to numerical values.

    Actually, I wasn't really displaying data from this field to begin with. Rather, I was just using it to drive my CMS. For examle, the original text consisted of terms like home, intro, child, etc., referring to different types of pages. Then I used various PHP switches to determine which type of page to display and what features it should have. It works just as well with numerals.

    I was able to delete two other fields - names of nations and states and nation-state ID's - which are also listed in one of the tables I joined to. Hopefully, that will speed things up a little bit.

    Thanks.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •