SitePoint Sponsor

User Tag List

Results 1 to 14 of 14
  1. #1
    SitePoint Enthusiast
    Join Date
    Feb 2011
    Posts
    48
    Mentioned
    0 Post(s)
    Tagged
    1 Thread(s)

    noob question, possible to have variable number of fields in a record?

    Hi, I'm a mysql noob, but I'm in a position for a project where I'm hiring devs and making strategic decisions on storing data so I would love any help with this question.


    The data I need to store will have a varying number of key fields per record.
    Some example key / value pairs, I'm assigning 100 for each value as the problem lies with the keys

    IDdata 1 : dataA = 100 , dataB = 100 , dataC = 100
    IDdata 2 : dataA = 100 , dataB = 100 , dataC = 100 , dataC1 = 100 , dataC2 = 100 , dataD = 100
    IDdata 3 : dataA = 100 , dataA1 = 100 , dataA1a = 100 , dataA1b = 100 , dataA2 = 100 , dataB1 = 100
    IDdata 4 : dataA = 100 , dataB = 100 , dataC = 100 , dataD = 100 , dataD1 = 100 , dataD1a = 100

    as you can see just to store the above I would need a table really wide where the column headers cover every single possible variation of key.

    I'm guessing that mysql tables must specify the number of columns when they are created right?

    The solution I am considering is to have a table with just two columns
    IDdata and data string

    Then I could store the key value pairs concatenated in a string with fields separated by colons and semi colons or commas or something.

    When retrieving a record I would instead retrieve the string and parse it. That way if a record went four levels deep on its A group of data and another record only went one level deep but went up to F number of groups it would be cope.

    Please tell me if I am going at this right or wrong with regards to mysql

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,328
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by PrimeLens View Post
    I'm guessing that mysql tables must specify the number of columns when they are created right?
    every relational database, not just mysql

    key/value pairs are best stored as key/value pairs (i.e. one pair per row) instead of as a single string mishmash that has to be parsed to be searched

    but key/value pair schemas are ~notoriously~ difficult to pull meaningful data from -- do a search for EAV (entity-attribute-value)

    if at all feasible, you should anticipate all possible keys and declare the table with one column per key, using NULL for the values that do not pertain to a given row
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Enthusiast
    Join Date
    Feb 2011
    Posts
    48
    Mentioned
    0 Post(s)
    Tagged
    1 Thread(s)
    thanks for answering

    Quote Originally Posted by r937 View Post
    if at all feasible, you should anticipate all possible keys and declare the table with one column per key, using NULL for the values that do not pertain to a given row
    I would like to do that but the nature of this data is that number of fields per record can vary.
    There would be roughly 100+ fields per record spread out over a range of approx 1000 (best guess) possible keys
    No matter what column width I choose there is always the possibility that a record can come along with one field too many.

    What do you think of my solution to have just two fields and parse the string?
    Are there other solutions that you can think of?

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,328
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by PrimeLens View Post
    There would be roughly 100+ fields per record spread out over a range of approx 1000 (best guess) possible keys
    okay, a predefined table is infeasible


    Quote Originally Posted by PrimeLens View Post
    What do you think of my solution to have just two fields and parse the string?
    i believe i already commented on it

    it won't scale because every search requires a complete table scan


    Quote Originally Posted by PrimeLens View Post
    Are there other solutions that you can think of?
    EAV
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Enthusiast
    Join Date
    Feb 2011
    Posts
    48
    Mentioned
    0 Post(s)
    Tagged
    1 Thread(s)
    Thanks !

  6. #6
    SitePoint Addict kduv's Avatar
    Join Date
    May 2012
    Location
    Atlanta, GA
    Posts
    244
    Mentioned
    5 Post(s)
    Tagged
    0 Thread(s)
    You may want to consider a non-relational DB like MongoDB or similar. Many of them are schema-less and can have a variable amount of "columns" per record.

  7. #7
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2006
    Location
    Augusta, Georgia, United States
    Posts
    4,182
    Mentioned
    16 Post(s)
    Tagged
    4 Thread(s)
    In regards to relational databases EAV would be the *best option given the circumstance. Depending on what needs to be done with the values it *might be best to break out the value column into multiple columns for different data types. So when the value represents a date it can properly be stored as a date at least, same with integers. Storing everything as a string can lead to big problems down the line. This is especially true when it comes to supporting virtual foreign keys, where a value references another table by an surrogate primary key. This really sucks but it sucks even more when you need to cast strings to integers when making a comparison against two tables in a join. How I would set it up would be entirely dependent on the data types which need to be supported. If it is only integer values you are dealing with than I would probably go the route of the value column being an integer type. However, if the data type could be best stored as either a string, integer, date, etc depending on context I would go the route of multiple value columns with proper data types defined.
    The only code I hate more than my own is everyone else's.

  8. #8
    SitePoint Author silver trophybronze trophy
    wwb_99's Avatar
    Join Date
    May 2003
    Location
    Washington, DC
    Posts
    10,649
    Mentioned
    4 Post(s)
    Tagged
    0 Thread(s)
    EAV is never a good option. Seriously.

  9. #9
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,328
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by wwb_99 View Post
    EAV is never a good option. Seriously.
    so your suggestion for this scenario is then... ?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  10. #10
    SitePoint Author silver trophybronze trophy
    wwb_99's Avatar
    Join Date
    May 2003
    Location
    Washington, DC
    Posts
    10,649
    Mentioned
    4 Post(s)
    Tagged
    0 Thread(s)
    Hard to understand what the problem really is - IDdata.Col1 isn't exactly descriptive. Would help to know how this thing is getting queried, etc.

    That said, a few surface options would be:
    * Actually fit it into a relational model -- again, no idea what the actual problem here is. But that might be workable.
    * MongoDb or better yet CouchDb. Let documents be documents.
    * Doing some sort of hybrid approach with a set of standard metadata columns and then a serialized XML / Json payload column.

  11. #11
    SitePoint Wizard bronze trophy Jeff Mott's Avatar
    Join Date
    Jul 2009
    Posts
    1,311
    Mentioned
    19 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by PrimeLens View Post
    I'm a mysql noob ... varying number of key fields per record ... I would need a table really wide where the column headers cover every single possible variation of key.
    This sounds like a textbook many-to-one relationship. The way you handle this is by creating two tables. The first is the IDdata table:

    IDdata
    ------------------------
    | id | some_other_data |
    ------------------------


    The second table could be called IDdata_attributes, or IDdata_properties, or whatever works for your application's nomenclature.

    IDdata_attributes
    ---------------------------------
    | IDdata_id | attribute | value |
    ---------------------------------
    ^ -------- PK --------- ^


    The IDdata_id field relates an attribute-value back to a specific IDdata record, which lets you have multiple attribute-values related to the same IDdata.

    EDIT: To further illustrate for the OP, here's how your data would look.

    Code:
    IDdata
    ------------------------
    | id | some_other_data |
    ------------------------
    | 1  | IDdata 1        |
    | 2  | IDdata 2        |
    | 3  | IDdata 3        |
    | 4  | IDdata 4        |
    ------------------------
    
    IDdata_attributes
    ---------------------------------
    | IDdata_id | attribute | value |
    ---------------------------------
    | 1         | dataA     | 100   |
    | 1         | dataB     | 100   |
    | 1         | dataC     | 100   |
    | 2         | dataA     | 100   |
    | 2         | dataB     | 100   |
    | 2         | dataC     | 100   |
    | 2         | dataC1    | 100   |
    | 2         | dataC2    | 100   |
    | 2         | dataD     | 100   |
    | 3         | dataA     | 100   |
    | 3         | dataA1    | 100   |
    | 3         | dataA1a   | 100   |
    | 3         | dataA1b   | 100   |
    | 3         | dataA2    | 100   |
    | 3         | dataB1    | 100   |
    | 4         | dataA     | 100   |
    | 4         | dataB     | 100   |
    | 4         | dataC     | 100   |
    | 4         | dataD     | 100   |
    | 4         | dataD1    | 100   |
    | 4         | dataD1a   | 100   |
    ---------------------------------
    "First make it work. Then make it better."

  12. #12
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,328
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    jeff, great illustration of the EAV scheme, thanks
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  13. #13
    SitePoint Wizard bronze trophy Jeff Mott's Avatar
    Join Date
    Jul 2009
    Posts
    1,311
    Mentioned
    19 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by r937 View Post
    jeff, great illustration of the EAV scheme, thanks
    Sure thing. More than anything else, it just seemed like the OP needed a quick into to many-to-one.
    "First make it work. Then make it better."

  14. #14
    SitePoint Enthusiast
    Join Date
    Feb 2011
    Posts
    48
    Mentioned
    0 Post(s)
    Tagged
    1 Thread(s)
    Just revisiting this thread and saw the most recent posts - Jeff, thats a great solution.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •