SitePoint Sponsor

User Tag List

Results 1 to 7 of 7
  1. #1
    SitePoint Enthusiast
    Join Date
    Nov 2001
    Posts
    31
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Product keywords: normalise or not?

    I have designed various databases that required keywords to be associated with products.

    I have approached this problem in two different ways:

    1) Using "normalization":
    table1(keywordId, keyword)
    table2(keywordID,ProductId)
    table3(productId, productname, etc)
    ...which, ofcourse, prevents duplicate keywords.

    2) "One table" approach:
    table (productId, productName, keywords, etc...) where the 'keywords' field is a varchar that stores multiple keywords delimited by a comma.

    Senior developers I know are split in their views for which method is best. Advocates of method 1 say that it is the 'by the book' way of doing things. Advocates of method 2 say that normalizing keywords only unnecessarily slows down the all important "product search" (due to joins in the query).

    I am still confused about which is the better of the two. Any thoughts??? (Oh, I am using SQL Server for a web based app, so performance is of paramount significance)

  2. #2
    ALT.NET - because we need it silver trophybronze trophy dhtmlgod's Avatar
    Join Date
    Jul 2001
    Location
    Scotland
    Posts
    4,836
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I would go for method 1, normaliation


  3. #3
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,327
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    the normalized structure does not slow down "product search" because this involves searching on product attributes in table 3, which would be indexed -- and a join is required only if you want to list the keywords for the products (i.e. the join does not join all possible products, only those found)

    on the other hand, the denormalized structure unnecessarily slows down "keyword search" -- with a comma-delimited list in that table, the index on the varchar keywords field becomes unwieldy if not totally unusable, resulting in a complete scan of the entire table to find products for a particular keyword

    note that if 99% of your information requirements are product search (disregarding keywords) and 1% keyword search, then yeah, maybe the company can afford to have the keyword search totally bog down the machine on the rare occasions it's run, but i betcha that's not the case

    the so-called senior developers advocating method 2 are blowing smoke

    if you're running sql/server, where's the senior dba? he or she will laugh them out of the office
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  4. #4
    SitePoint Enthusiast
    Join Date
    Nov 2001
    Posts
    31
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks r937 for your explanation.

  5. #5
    Your Lord and Master, Foamy gold trophy Hierophant's Avatar
    Join Date
    Aug 1999
    Location
    Lancaster, Ca. USA
    Posts
    12,305
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    I personally wouldn't use a comma delimited list for actual searching.

    My keyword table would look like (in mysql):

    Code:
    CREATE TABLE word (
      productid int unsigned,
      word char(50) NOT NULL default '',
      UNIQUE KEY keyword_index (productid, word)
    ) TYPE=MyISAM PACK_KEYS=1;
    An example query...
    Code:
    SELECT * FROM product LEFT JOIN word AS w ON (productid) WHERE w.word IN (keyword1,keyword2,keyword3,...)
    If you want to handle English mis-spellings then store your keywords in Metaphone format and run your search keywords through a Metaphone function before processing.
    Last edited by Hierophant; Jul 12, 2002 at 07:48.
    Wayne Luke
    ------------


  6. #6
    SitePoint Enthusiast
    Join Date
    Nov 2001
    Posts
    31
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Wayne,
    But what about the problem of having duplicate words. I.e what if product 1 has the keyword "food" and product 2 also has the keyword "food"?

    I guess this may not be a prob because keyword can be seen as a Key, yeah?

  7. #7
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    One advantage, which may be good to keep in mind, is that if you smack them all into one column and have full-text search indexing capability it can full-text index the column and allow you to do interesting things with searching.

    I'd still go with the multi-row (first) method.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •