SitePoint Sponsor

User Tag List

Results 1 to 23 of 23
  1. #1
    Captain Internet neil's Avatar
    Join Date
    Jun 2001
    Location
    n.ireland
    Posts
    333
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Normalizing data

    Heya.

    I was just wondering if MySQL handles normalized data without any bother these days? It's been a while since I used MySQL but if I remember back then it didn't handle relationships, foreign keys and so on.

    Also if it does, would there be any decent tutorials to normalize my datasets in MySQL properly?

    Thanks
    neil - Geocities, the trailer park of the web
    .<.<.<.<.<.<.<.<
    WEB[retype.net] + [guff.org]
    ICQ[273981] AIM[neiim]

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    normalized data simply means that there aren't any problems associated with non-key columns in a table dependent on other non-key columns, or on only some of the key columns

    enforcing foreign key relationships is not normalization, it's relational integrity

    mysql myisam tables still don't (enforce relational integrity) but mysql innodb tables do, and have done for quite some time

    check out some of the normalization tutorials on my SQL Links page
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    Captain Internet neil's Avatar
    Join Date
    Jun 2001
    Location
    n.ireland
    Posts
    333
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Do you think I should use InnoDB? Not sure about the differences between the two, but I did hear that InnoDB suffered from locking from time to time.

    Let's say I had a couple tables like this:
    Code:
    MEMBER         ARTICLE
    mid (PK)       aid (PK)
    username       title
    password       body
    rank           mid (FK)
    When a member creates an article, and that member is deleted at a later date, does that mean the article will be removed too? Will it give a null value for the mid? What will happen exactly?

    Thanks for replying.
    neil - Geocities, the trailer park of the web
    .<.<.<.<.<.<.<.<
    WEB[retype.net] + [guff.org]
    ICQ[273981] AIM[neiim]

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by neil
    When a member creates an article, and that member is deleted at a later date, does that mean the article will be removed too? Will it give a null value for the mid? What will happen exactly?
    depends

    what do you want to happen?

    note: if you use myisam tables, anything can happen, but the database won't help you, you have to do it all yourself, but if you use innodb tables, you can at least have the database help you, depending again on what you want to do

    does this article help at all? --> Relational Integrity
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    Captain Internet neil's Avatar
    Join Date
    Jun 2001
    Location
    n.ireland
    Posts
    333
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    what do you want to happen?
    I'm not sure what I want happen, although I do know what I don't want to happen

    For instance, it will be a shame if an article is lost when a member is deleted from the database. The same would apply for the null value scenerio, it wouldn't look professional if you had say an email link to the author with nothing in it, not even a username... just blank. But I suppose the latter is less important than actually losing the article, if there was a choice between the two. I will be using it in combination with PHP, if that matters any.

    Quote Originally Posted by r937
    does this article help at all? --> Relational Integrity
    Will have a look at this now.
    neil - Geocities, the trailer park of the web
    .<.<.<.<.<.<.<.<
    WEB[retype.net] + [guff.org]
    ICQ[273981] AIM[neiim]

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    sounds like what you want is for the deletion of the author to be prevented if the author has any articles

    and it sounds like you would not want ON DELETE CASCADE in effect, which would delete all of an author's articles along with the author

    if you were to use myisam tables, you could go ahead and delete the author, and it would leave the articles alone, although they would still be linked to the author's record, which now no longer exists, which is exactly the type of situation that relational integrity is supposed to prevent
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    Captain Internet neil's Avatar
    Join Date
    Jun 2001
    Location
    n.ireland
    Posts
    333
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    First of all, that's a excellent article, it made some things I wasn't sure about clearer.

    Quote Originally Posted by r937
    sounds like what you want is for the deletion of the author to be prevented if the author has any articles

    and it sounds like you would not want ON DELETE CASCADE in effect, which would delete all of an author's articles along with the author
    Exactly! guess maybe I would want this instead:

    Author Delete: Restrict
    delete author only if no articles have been written by this author
    If you allowed the author delete to happen and left the articles in there, they would all have broken links; Restrict prevents this.
    Quote Originally Posted by r937
    if you were to use myisam tables, you could go ahead and delete the author, and it would leave the articles alone, although they would still be linked to the author's record, which now no longer exists, which is exactly the type of situation that relational integrity is supposed to prevent
    So MyISAM relates to the foreign records, but doesn't hold the integrity part that was explained in the article? InnoDB has these features I assume.

    I'm taking an educated guess that you'd be using InnoDB for most of your database work then rudy? Just begs the question on why would anyone use MyISAM compared to InnoDB that offers more features and greater stability?

    Also, when you mention turning on RI, this is basically just TYPE=InnoDB; at the end of each table creation?
    neil - Geocities, the trailer park of the web
    .<.<.<.<.<.<.<.<
    WEB[retype.net] + [guff.org]
    ICQ[273981] AIM[neiim]

  8. #8
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    i use whatever database my clients want me to use

    for my own web site, i use microsoft access

    anyhow, there's a bit more to it than just declaring type=innodb

    you also need to create indexes on the foreign keys
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  9. #9
    Captain Internet neil's Avatar
    Join Date
    Jun 2001
    Location
    n.ireland
    Posts
    333
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    i use whatever database my clients want me to use

    for my own web site, i use microsoft access

    anyhow, there's a bit more to it than just declaring type=innodb

    you also need to create indexes on the foreign keys
    After all this I may go with MyISAM, because firstly I've read somewhere you need 3rd party (commercial) software to do backups while the database is running. Secondly, InnoDB cannot search through text.

    Although you said MyISAM still allows foreign keys to relate between data? I can just code the integrity part using PHP for now, because I kinda need the search and backup features.

    Was surprised when you said MS Access though
    neil - Geocities, the trailer park of the web
    .<.<.<.<.<.<.<.<
    WEB[retype.net] + [guff.org]
    ICQ[273981] AIM[neiim]

  10. #10
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    myisam tables do not support foreign keys at all, you would have to code the integrity checking into your application code

    and hey, access is a great database!! it works just fine for my web site, which doesn't get quite the same volume of traffic as ebay or amazon
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  11. #11
    Captain Internet neil's Avatar
    Join Date
    Jun 2001
    Location
    n.ireland
    Posts
    333
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    myisam tables do not support foreign keys at all, you would have to code the integrity checking into your application code
    Awe no, I'm in a muddle again hehe, when you said linked earlier ("although they would still be linked to the author's record"), I thought you meant like a foreign key type of linking, is there any other type of linking?

    Which do you recommend I use? in your own personal view of course.
    neil - Geocities, the trailer park of the web
    .<.<.<.<.<.<.<.<
    WEB[retype.net] + [guff.org]
    ICQ[273981] AIM[neiim]

  12. #12
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    "linked" only in a sense, sorry for the confusion

    i can link two tables on any values i like, and there would be no foreign keys involved

    the application code would control the linking

    if one of your tables is articles, and you want to provide a comprehensive search feature (fulltext indexing), then use myisam tables
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  13. #13
    Captain Internet neil's Avatar
    Join Date
    Jun 2001
    Location
    n.ireland
    Posts
    333
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    "linked" only in a sense, sorry for the confusion

    i can link two tables on any values i like, and there would be no foreign keys involved

    the application code would control the linking

    if one of your tables is articles, and you want to provide a comprehensive search feature (fulltext indexing), then use myisam tables
    Thanks again, your advice is always welcome.

    Looking back further up the page, where I indicated mid is part of articles, would this be seen as a waste since they aren't related in the database as a foreign key? what I mean by this is should I now replace mid with name to prevent any unnecessary query calls (in finding out what exactly is mid when collecting data from articles)? therefore changing the normalized data produced.

    There is maybe an easier work around that I'm not seeing at the moment.
    neil - Geocities, the trailer park of the web
    .<.<.<.<.<.<.<.<
    WEB[retype.net] + [guff.org]
    ICQ[273981] AIM[neiim]

  14. #14
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    absolutely not

    mid is the primary key of members, and in articles, it is a foreign key

    whether mysql supports or enforces relational integrity does not detract from the role these columns play in your data

    if you were not to use mid to link those tables, that would be a waste
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  15. #15
    Captain Internet neil's Avatar
    Join Date
    Jun 2001
    Location
    n.ireland
    Posts
    333
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Just something else I need to know.

    Let's say there was another field in articles that relates to another table:
    Code:
    MEMBER         ARTICLE     CATEGORY
    mid (PK)       aid (PK)    tip (PK)
    username       title       name
    password       body
    rank           mid
                   tip
    Gathering the information I'd need like three different query calls? For example:
    Code:
    SELECT aid, title, body, mid, tip FROM article WHERE aid=$aid
    SELECT username FROM member WHERE mid=$mid (got from first query)
    SELECT name FROM category WHERE tip=$tip (got from first query)
    Wouldn't that be a big strain on servers with you make different calls 3x as much? or is there any shorter method of acheiving it?

    Thanks.
    neil - Geocities, the trailer park of the web
    .<.<.<.<.<.<.<.<
    WEB[retype.net] + [guff.org]
    ICQ[273981] AIM[neiim]

  16. #16
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    you may need a basic tutorial in how to write a join query
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  17. #17
    Captain Internet neil's Avatar
    Join Date
    Jun 2001
    Location
    n.ireland
    Posts
    333
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    JOIN seems like it'll do the trick!

    Have another question for ya Rudy, it's about archiving.

    I would use the articles table outlined previously, but that is a bad example. Lets say it was software instead.
    Code:
    MEMBER         SOFTWARE      CATEGORY
    mid (PK)       sid (PK)      cip (PK)
    username       title         name
    password       description
    rank           version
                   mid
                   cip
                   location
                   filename
    The member wants to update an existing version for one of the software applications. Now obviously the actual file will be changed, possibly put into an archive directory and renamed respectively.

    Although would it be best to add an extra field in the software called archive (ENUM=YES or NO), so when the member clicks update another row is created with the same values as the software being updated, only the archive field is changed to yes and the location would also be changed. Or would having an archive table be more suitable for this task?

    Hopefully you know what I mean.
    neil - Geocities, the trailer park of the web
    .<.<.<.<.<.<.<.<
    WEB[retype.net] + [guff.org]
    ICQ[273981] AIM[neiim]

  18. #18
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    i don't really mean to be uncooperative, but when are you actually going to try creating a few tables, loading them with data, and running a few queries?

    that would be the fastest way for you to learn

    your ideas about archiving will work either way, same table or separate table, but you will only get a feel for which is better for you by trying it
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  19. #19
    Captain Internet neil's Avatar
    Join Date
    Jun 2001
    Location
    n.ireland
    Posts
    333
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Yea I know what you're saying Rudy and I'll be doing all that soon, but I'm currently designing the system before I implement. I knew both would work, just didn't know which one was better for design purposes... like that archive question isn't something you can test on your home PC (server strain using one table, or some other pitfall found later down the road if the website becomes popular, or just generally following good practices).
    neil - Geocities, the trailer park of the web
    .<.<.<.<.<.<.<.<
    WEB[retype.net] + [guff.org]
    ICQ[273981] AIM[neiim]

  20. #20
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    it's not a question of strain, there is no strain either way if you index porperly

    what i'm talking about is for you to get a feeling about how the sql will be different for all your insert, update, and select queries, whether you have one table or two

    and you can do that on a test database easily

    yes, it's important to design the tables before implementing them, but it's also important to know the effect that a certain table design will have on query complexity
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  21. #21
    Captain Internet neil's Avatar
    Join Date
    Jun 2001
    Location
    n.ireland
    Posts
    333
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Suppose you're right I've been reading up on JOINS, but found a problem, how would you call a key from a key, let me explain:

    member (mid,username,password)
    article (aid,title,body,mid,cid)
    category (cid,name,mcid)
    master_category (mcid,name)

    How would you go about achieving this sort of MySQL query (I've highlighted the problem area with question marks on the last LEFT JOIN statement).

    Code:
    SELECT article.title,
           article.body,
           category.name,
           master_category.name
    FROM article
    LEFT JOIN member ON article.mid = member.mid
    LEFT JOIN category ON article.cid = category.cid
    LEFT JOIN master_category ON ?.? = master_category.mcid
    WHERE (master_category.name = 'technology');
    neil - Geocities, the trailer park of the web
    .<.<.<.<.<.<.<.<
    WEB[retype.net] + [guff.org]
    ICQ[273981] AIM[neiim]

  22. #22
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    instead of your question marks, put category.mcid

    two points:

    when you use a left outer join, and then you put a condition on the right table into the WHERE clause (as you've done with master_category.name = 'technology'), then you will never return unmatched rows from the left table, so this effectively turns your left outer join into an inner join

    secondly, by writing left outer joins you are implying that there can be an article without a matching member, an article without a matching category, and a category without a matching master_category

    time to create some test data and discover these peccadillos first hand
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  23. #23
    Captain Internet neil's Avatar
    Join Date
    Jun 2001
    Location
    n.ireland
    Posts
    333
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    time to create some test data and discover these peccadillos first hand
    Hehe, I will I will... thanks again you've been a big help!
    neil - Geocities, the trailer park of the web
    .<.<.<.<.<.<.<.<
    WEB[retype.net] + [guff.org]
    ICQ[273981] AIM[neiim]


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •