SitePoint Sponsor

User Tag List

Results 1 to 22 of 22
  1. #1
    SitePoint Enthusiast konky2000's Avatar
    Join Date
    Mar 2003
    Location
    Oakland
    Posts
    71
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    CMS With Good Data Normalisation?

    I keep checking out various CMS only to wind up disappointed by the general lack of database normalisation in the project.

    I know MovableType it isn't PHP, but here is the most recent example of what seems to be a standard table in these types of projects:

    Code:
      CREATE TABLE `mt_entry` (
        `entry_id` int(11) NOT NULL auto_increment,
        `entry_blog_id` int(11) NOT NULL default '0',
        `entry_status` tinyint(4) NOT NULL default '0',
        `entry_author_id` int(11) NOT NULL default '0',
        `entry_allow_comments` tinyint(4) default NULL,
        `entry_allow_pings` tinyint(4) default NULL,
        `entry_convert_breaks` varchar(30) default NULL,
        `entry_category_id` int(11) default NULL,
        `entry_title` varchar(255) default NULL,
        `entry_excerpt` text,
        `entry_text` text,
        `entry_text_more` text,
        `entry_to_ping_urls` text,
        `entry_pinged_urls` text,
        `entry_keywords` text,
        `entry_tangent_cache` text,
        `entry_created_on` datetime NOT NULL default '0000-00-00 00:00:00',
        `entry_modified_on` timestamp(14) NOT NULL,
        `entry_created_by` int(11) default NULL,
        `entry_modified_by` int(11) default NULL,
        `entry_basename` varchar(50) NOT NULL default '',
        PRIMARY KEY  (`entry_id`),
        KEY `entry_blog_id` (`entry_blog_id`),
        KEY `entry_status` (`entry_status`),
        KEY `entry_author_id` (`entry_author_id`),
        KEY `entry_created_on` (`entry_created_on`),
        KEY `entry_basename` (`entry_basename`)
      ) TYPE=MyISAM;
    My question is ... why do database tables in these projects so often wind up so bloated. I ask this in a PHP forum, because so often these CMS are programmed in PHP.

    Has the PHP programming community decided that normalisation isn't very important?

    Is it because it is harder to write programming when you have to worry about foreign key tables etc?

    Is there a PHP based CMS that also uses fully normalized database tables?
    Konky 2000 Collections - Japanese stickers and floaty pens

  2. #2
    If it aint Dutch it aint much Kilroy's Avatar
    Join Date
    Oct 2003
    Location
    The Netherlands
    Posts
    406
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I think developers have to meet deadlines and don't want to worry about these things, that's why. Take a look at eZpublish, though, maybe it has normalised tables: http://ez.no

    I've never actually looked at them, but they say it's a good CMS, so...

  3. #3
    SitePoint Zealot
    Join Date
    Jun 2004
    Location
    Norway - Oslo
    Posts
    198
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I think the reason in most cases is that at the start (when not well planned) you dont realize just how much info the table will hold. So you dont see the need to normalize any of the data in it.
    I see this all the time at work, how it has happened on our previous work, and its close to impossible to change things like this after a year of using it.

    With good planning from the start, this can be avoided, but you gotta know what data you will need in the database then, and be very carefull to refactor once you see that it needs changing, and dont put it off for a day or a week.

    Wether or not there is a CMS with normalized databasetables im unsure of, mambo as i've been getting familiar with lately does not have it. They have a "users" table with like 40 fields, including address and "extra_field_x". (wow). my usertables never go above 8 fields.

  4. #4
    SitePoint Enthusiast konky2000's Avatar
    Join Date
    Mar 2003
    Location
    Oakland
    Posts
    71
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Does it matter?

    Has anyone ever come across a situation where their PHP project started to experience a slowdown due to a poorly normalized database?
    Konky 2000 Collections - Japanese stickers and floaty pens

  5. #5
    SitePoint Zealot
    Join Date
    Jun 2004
    Location
    Norway - Oslo
    Posts
    198
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    No, havent really seen that yet. But even tho i havent actually seen it, im sure it will over time slow down if the project increase in complexity.
    One thing is that you might end up writing more code than needed and spending more time creating code.
    Another aspect is the fact that your database will in size increase a lot, for most websites db-size is "irrelevant" anyway so you dont really notice it there.
    I mean, at my job we have a forum, with 65k members and nearly 3,9 million posts. It uses less than 4GiB in MySQL space.

  6. #6
    SitePoint Zealot
    Join Date
    Oct 2004
    Location
    Worcester
    Posts
    138
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I'm not a database expert, but I don't see what's so bad about that Movable Type table. Every field appears to me to be valid for that record.

  7. #7
    SitePoint Evangelist Ian R. Gordon's Avatar
    Join Date
    Feb 2004
    Location
    New York
    Posts
    474
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    There are 3 stages of Database Normalization and each of them requires the first one be completed fully to move on to the next stage. I think in order for anyone to fully normalize their database to stage 3 would require them to have the database complete, in terms of what type of data it will hold and how that data will be used in the script, in terms of relationship between data.

    Most databases barely make 1st stage, some reach 2nd, very few, that I have seen at least, make it all the way to the 3rd Normalization stage.
    Ian Gordon
    CSS / XHTML / PHP Programmer
    http://www.iangordon.us

  8. #8
    Massimiliano Bruno Giordano sid egg's Avatar
    Join Date
    Aug 2004
    Location
    Canada
    Posts
    1,280
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Ian R. Gordon
    There are 3 stages of Database Normalization and each of them requires the first one be completed fully to move on to the next stage. I think in order for anyone to fully normalize their database to stage 3 would require them to have the database complete, in terms of what type of data it will hold and how that data will be used in the script, in terms of relationship between data.

    Most databases barely make 1st stage, some reach 2nd, very few, that I have seen at least, make it all the way to the 3rd Normalization stage.
    ...and those would be?
    GamesLib.com - the slickest, most complete and
    easily navigatible flash games site on the web.

  9. #9
    SitePoint Wizard silver trophy kyberfabrikken's Avatar
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    There are 3 stages of Database Normalization ...
    Normally, BCNF is considered the final stage (so there are 4 NF's). In theory there are even more stages, but they have no real practical use.
    http://www.cs.jcu.edu.au/Subjects/cp.../contents.html

  10. #10
    SitePoint Enthusiast
    Join Date
    Apr 2004
    Location
    Ireland
    Posts
    91
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    if you want to get pedantic there is 4th and 5th normalized form as well as BCNF

  11. #11
    Non-Member
    Join Date
    Jan 2003
    Posts
    5,748
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    True, there are 4 forms of data normalisation, but I might add for very specific needs of data normalisation, there are a few more forms of normalisation, two if I remember

    EDIT: someone has already beaten me to it

    But as web developer's we only really need to be concerned with the first three. I can't comment on the Movable Type schema since I only see one table, lets have some more and we can tear it apart

  12. #12
    simple tester McGruff's Avatar
    Join Date
    Sep 2003
    Location
    Glasgow
    Posts
    1,690
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by sid egg
    ...and those would be?
    http://www.oreilly.com/catalog/javad...apter/ch02.pdf

  13. #13
    SitePoint Evangelist Ian R. Gordon's Avatar
    Join Date
    Feb 2004
    Location
    New York
    Posts
    474
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by sid egg
    ...and those would be?
    Well...

    1st Normal Form is:



    • Eliminate duplicative columns from the same table.
    • Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
    2nd Normal Form is:

    • Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
    • Create relationships between these new tables and their predecessors through the use of foreign keys.
    3rd Normal Form is:

    • Remove columns that are not dependent upon the primary key.
    [There is a 4th Normal Form , I don't know how practical it is for MySQL in most projects though: ]
    • A relation is in 4NF if it has no multi-valued dependencies.
    You can read more about Database Normalization here: http://dev.mysql.com/tech-resources/...alization.html

    Ian Gordon
    CSS / XHTML / PHP Programmer
    http://www.iangordon.us

  14. #14
    Non-Member
    Join Date
    Jan 2003
    Posts
    5,748
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I found this useful a few years ago

    http://www.utexas.edu/its/windows/da...m/erintro.html

    Follow the links that are part of the text to get to the next part...

  15. #15
    SitePoint Enthusiast
    Join Date
    Jul 2004
    Location
    Cyberia
    Posts
    94
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Well, how exactly would you normalize above mentioned (MT) code? I'd say it is pretty well normalized (then again, I'm no SQL expert ) Wouldn't "over-normalization" lead to much more complicated, and even slower code? MySQL.com article seems to confirm this:

    The point of the last example is that Normalization is a tradeoff. In fact, there are two additional normal forms that are generally recognized, but are mainly academic. Additional normalization results in more complex JOIN queries, and this can cause a decrease in performance. In some cases performance can even be increased by de-normalizing table data, but de-normalization is beyond the scope of this article.

  16. #16
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by kyberfabrikken
    In theory there are even more stages, but they have no real practical use.
    WTF OMG LOL

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  17. #17
    Snowboarders die even younger igor.kudela's Avatar
    Join Date
    Feb 2005
    Posts
    731
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by konky2000
    Does it matter?

    Has anyone ever come across a situation where their PHP project started to experience a slowdown due to a poorly normalized database?

    oh yes lots of times ,
    1. when we were testing our cms with about million articles in it the most recent
    2. when we had around 140 000 email users and the mail server was set up to log every in/out mail + logins and all that into the DB ( terrible)

    3. when i worked on a service that would sent out bulk emails , 1 000 000 in a row was a big problem for the db

    4. when we were doing a on line chat app scaled for 10 000 online users
    lots of cash and hashtables needed to be dumped into the db ( damn slow response time from the db)

    this was all on MS SQL if i recall right , but after some tweaking it all ran fine

    oh and they were all .NET projects
    Igor Kudela
    NetPublisher - FREE Customizable .NET CMS

  18. #18
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    igor, none of those issues you mention appear to have anything to do with normalization
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  19. #19
    Snowboarders die even younger igor.kudela's Avatar
    Join Date
    Feb 2005
    Posts
    731
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    igor, none of those issues you mention appear to have anything to do with normalization
    how can u now that from what i have written ??

    and if i quote this
    Well...

    1st Normal Form is:


    * Eliminate duplicative columns from the same table.
    * Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

    2nd Normal Form is:

    * Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
    * Create relationships between these new tables and their predecessors through the use of foreign keys.

    3rd Normal Form is:

    * Remove columns that are not dependent upon the primary key.

    [There is a 4th Normal Form , I don't know how practical it is for MySQL in most projects though: ]

    * A relation is in 4NF if it has no multi-valued dependencies.
    than each an every case had to do with normalization
    Igor Kudela
    NetPublisher - FREE Customizable .NET CMS

  20. #20
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    how can i know that? that's easy -- i know what normalization means

    a statement like "when i worked on a service that would sent out bulk emails , 1 000 000 in a row was a big problem for the db" has nothing to do with 1NF, 2NF, 3NF or even BCNF
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  21. #21
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by konky2000

    Code:
          `entry_id` int(11) NOT NULL auto_increment
          `entry_blog_id` int(11) NOT NULL default '0'
          `entry_author_id` int(11) NOT NULL default '0'
          `entry_category_id` int(11) default NULL
          `entry_created_by` int(11) default NULL
          `entry_modified_by` int(11) default NULL
    My question is ... why do database tables in these projects so often wind up so bloated. I ask this in a PHP forum, because so often these CMS are programmed in PHP.
    I see one PK and 5 FKs
    I see that this database is most likely in partial 2NF.

    I think that these are probably non normalized:

    Code:
    	`entry_text` text,
     	`entry_text_more` text,
     	`entry_to_ping_urls` text,
     	`entry_pinged_urls` text,
     	`entry_keywords` text,
    But what do you want to do, normalize the database to the point where you have more tables than attributes?
    What exactly is the problem?
    How would you improve it?

  22. #22
    Snowboarders die even younger igor.kudela's Avatar
    Join Date
    Feb 2005
    Posts
    731
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    how can i know that? that's easy -- i know what normalization means

    a statement like "when i worked on a service that would sent out bulk emails , 1 000 000 in a row was a big problem for the db" has nothing to do with 1NF, 2NF, 3NF or even BCNF
    good for u
    the point here is that u clearly under estimate other people and do not read the posts properly ,
    the question was clear did u have a time where u needed to normalize a db becouse the app was slow
    the list i provided was a list of scenarios that i could remeber just from top of my head

    in this particular scenariou if i remember correctly we had to change the db structure and do some indexing and defing fes keys for better integrity
    the entire application was not as easy as it sound it was a fairly complex app
    written as a service that run agains a ms sql while it had the entire emailing backend rewritten all the way to the encoding types that are commonly used in email system for unicode and attachments and while the fs was faily durable against the loads , after we did some sipmple directory and file hashing the db was the problem as it slow down the app when sending
    Igor Kudela
    NetPublisher - FREE Customizable .NET CMS


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •