SitePoint Sponsor

User Tag List

Results 1 to 15 of 15
  1. #1
    SitePoint Wizard bronze trophy cydewaze's Avatar
    Join Date
    Jan 2006
    Location
    Merry Land, USA
    Posts
    1,096
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)

    Need some advice on reorganizing this database

    My office just received a database from outside the office that I'm going to have to build a web tool around. The tool will be fairly simple - you use a web form to specify criteria (using checkboxes) and when you submit the form you get back a list of matching resources.

    There are four categories of criteria, and each criteria contains a narrowing set of values (ex areas would be: state then country then city then block, etc)

    The problem is that the people who set up this database made a separate table for each criteria, and as such the resources are duplicated. What's worse is that one of the criteria is just a column in all tables, so you can even have resources repeated in the same table.

    Example rows from the areas table.


    resource topic state county city block
    resource 1 topic 1 Y N N N
    resource 1 topic 2 Y Y N N

    etc.

    Normally I would put the resources in a table with resource name and ID (pk), then a table for topics, a table for criteria, etc, then make relational tables to do a many to many relationship, but I'm having trouble getting my head around things. For one, resource 1 can have different area values depending on what topic is selected. Same for the other criteria. Just looking for the best way to organize it.
    <cfset myblog = "http://cydewaze.org/">

  2. #2
    From Italy with love silver trophybronze trophy
    guido2004's Avatar
    Join Date
    Sep 2004
    Posts
    9,509
    Mentioned
    163 Post(s)
    Tagged
    4 Thread(s)
    The problem is that the people who set up this database made a separate table for each criteria
    I don't quite understand this part. In the data you posted it looks like it's all in 1 table? Can you give an example?

  3. #3
    SitePoint Wizard bronze trophy cydewaze's Avatar
    Join Date
    Jan 2006
    Location
    Merry Land, USA
    Posts
    1,096
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    Yep sure can (I'm always horrible at explaining these things).

    The tool works like this:

    First you select an area of interest (known as "topics" in the database). Examples are: housing, land use, public health, safety.

    Once you do that it returns a list of resources in the chosen topic(s). You can then refine your search with any one of three other set of checkboxes.

    Area (state, county, city, block, etc)
    Setting (rural, suburban, urban, etc)
    Mode (car, bus, rail, walking, etc)

    The current database has separate tables for each one of those options (area, setting, mode) and the resources (most but not all) are duplicated in each of those tables. My example table in the OP is from the Areas table but the other two look the same. And the first set of checkboxes (the topics) are just a column that appears in all tables. It's like someone just imported 3 excel spreadsheets and decided to make a database out of them.

    Here's how the three tables are set up:

    Area
    resource_name (text)
    topic (text)
    state (y,n)
    county (y,n)
    city (y,n)
    town (y,n)
    block (y,n)

    Mode
    resource_name (text)
    topic (text)
    automobile (y,n)
    bus (y,n)
    rail (y,n)
    bicycle (y,n)
    pedestrian (y,n)

    Setting
    resource_name (text)
    topic (text)
    rural (y,n)
    suburban (y,n)
    urban (y,n)
    downtown (y,n)

    I've moved the topic names to their own table to give them their own ID numbers, and I made a resources table to do the same, and so that each resource is only listed once (rather than once in each table). I've also created a relational table to tie each resource to their appropriate topics (many-to-many, as each resource can appear in more than one topic). I started to do the same for the areas, settings, and modes, but the problem is that, for example, Resource 5 when in Topic 1 will have a Y for "city", but Resource 5 in Topic 2 will have a N for "city". It's quite confusing I know, which is why I'm here.
    <cfset myblog = "http://cydewaze.org/">

  4. #4
    From Italy with love silver trophybronze trophy
    guido2004's Avatar
    Join Date
    Sep 2004
    Posts
    9,509
    Mentioned
    163 Post(s)
    Tagged
    4 Thread(s)
    If you get your data from an external provider, and only have to show it, and not maintain (edit, delete, add) anything, I think these tables should do the job:

    - topics table (contains only topic name)
    - resources table (contains only resource name)
    - ASM table (contains only area/settings/mode) (area-settings-mode, couldn't come up with a good entity name )
    - ASM-options table (contains area/settings/mode and the option name)
    - and finally the table that binds them all, with topic name, resource name, area/settings/mode, option name and option value (Y/N)

    The first four tables are used to populate your dropdowns/checkboxes (you could do that with a select distinct on the last table as well though).
    The last to get the resources that respect the selected criteria.

    No need for id numbers, since you won't be changing any resource, topic or other names anyway.


    You could simply create three tables just like the data you received, and do all select queries on those 3 tables, but that would mean you'd have to change your queries and application every time they decide to add/remove/change an Y/N option column.

  5. #5
    SitePoint Wizard bronze trophy cydewaze's Avatar
    Join Date
    Jan 2006
    Location
    Merry Land, USA
    Posts
    1,096
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by guido2004 View Post
    - ASM table (contains only area/settings/mode) (area-settings-mode, couldn't come up with a good entity name )
    - ASM-options table (contains area/settings/mode and the option name)
    I'm not really clear on this part. You mean one table for all the area, settings, and mode data? Do you mean like a parent/child table where area, settings, and mode are parents, and the sub-settings have a parent id that links them to their parents?

    Quote Originally Posted by guido2004 View Post
    - and finally the table that binds them all, with topic name, resource name, area/settings/mode, option name and option value (Y/N)
    So like resource id, topic id, and the area/setting/mode id in the same row?

    Quote Originally Posted by guido2004 View Post
    You could simply create three tables just like the data you received, and do all select queries on those 3 tables, but that would mean you'd have to change your queries and application every time they decide to add/remove/change an Y/N option column.
    Well one of the reasons I'm changing this db is to make future changes easier to implement without touching the web app.
    <cfset myblog = "http://cydewaze.org/">

  6. #6
    From Italy with love silver trophybronze trophy
    guido2004's Avatar
    Join Date
    Sep 2004
    Posts
    9,509
    Mentioned
    163 Post(s)
    Tagged
    4 Thread(s)
    Quote Originally Posted by cydewaze View Post
    I'm not really clear on this part. You mean one table for all the area, settings, and mode data? Do you mean like a parent/child table where area, settings, and mode are parents, and the sub-settings have a parent id that links them to their parents?
    - ASM table (contains only area/settings/mode)
    A table (like the topics and resources tables) that only contains the names of whatever-you-call-this-entity. So right now it would contain only three rows:
    Code:
    ASM
    Area
    Settings
    Mode
    - ASM-options table
    I didn't add an Options table, because the options differ for each ASM, so if you choose Area, you wouldn't be able to choose the Settings and Mode options, or at least that wouldn't make much sense. So this table would contain a row for each ASM/option pair:
    Code:
    ASM    Option
    Area   State
    Area   County
    etc...
    And you would retrieve the options for the chosen ASM.

    So like resource id, topic id, and the area/setting/mode id in the same row?
    Yes. But not numeric id's, the plain names:

    Code:
    Topic    Resource     ASM    Option   Optionvalue
    Topic 1  Resource 1   Area   State    Y
    Topic 2  Resource 1   Area   State    N
    etc...

  7. #7
    SitePoint Wizard bronze trophy cydewaze's Avatar
    Join Date
    Jan 2006
    Location
    Merry Land, USA
    Posts
    1,096
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    Oh my! Hmmmmmm. I'll have to get my head around THAT now!

    That final table makes sense but I'm trying to figure out why I need the ASM table if I have ASM-options. Just to populate dropdowns? Seems like I could have a column in ASM-options to flag the "top level" entries just as easily.
    <cfset myblog = "http://cydewaze.org/">

  8. #8
    From Italy with love silver trophybronze trophy
    guido2004's Avatar
    Join Date
    Sep 2004
    Posts
    9,509
    Mentioned
    163 Post(s)
    Tagged
    4 Thread(s)
    Quote Originally Posted by cydewaze View Post
    I'm trying to figure out why I need the ASM table if I have ASM-options. Just to populate dropdowns? Seems like I could have a column in ASM-options to flag the "top level" entries just as easily.
    An extra column? You could do a SELECT DISTINCT ASM FROM ASM
    But then, like I said before, you could do that on the final table as well, for topics, ASM, options and resources, getting rid of all the other tables.

  9. #9
    SitePoint Wizard bronze trophy cydewaze's Avatar
    Join Date
    Jan 2006
    Location
    Merry Land, USA
    Posts
    1,096
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    Thanks guido, you've given me a lot to think about.
    <cfset myblog = "http://cydewaze.org/">

  10. #10
    SitePoint Wizard bronze trophy cydewaze's Avatar
    Join Date
    Jan 2006
    Location
    Merry Land, USA
    Posts
    1,096
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    I spent some time thinking about this, and I think I have my head a bit farther around it.

    Quote Originally Posted by guido2004 View Post
    Code:
    Topic    Resource     ASM    Option   Optionvalue
    Topic 1  Resource 1   Area   State    Y
    Topic 2  Resource 1   Area   State    N
    etc...

    Let's take Resource 1 for an example, and let's also say we're limiting out search to Topic 3. We'd then have this:
    Code:
    Topic    Resource     ASM    Option   Optionvalue
    Topic 3  Resource 1   Area     State     Y
    Topic 3  Resource 1   Mode     Car       Y
    Topic 3  Resource 1   Setting  Rural     Y
    One row for each possible "Y" value, correct?

    Then in the query I'd have:

    Code:
    SELECT......
    
    WHERE ( asm-options.asm = 'Area'
    AND asm-options.option = 'State'
    AND asm-options.optionvalue = 'Y' )
    I foresee a collosal WHERE clause down the road in this (although I suppose I'd just use IN with a list of values)
    <cfset myblog = "http://cydewaze.org/">

  11. #11
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,341
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by cydewaze View Post
    Topic Resource ASM Option Optionvalue

    I foresee a collosal WHERE clause down the road in this
    collosal, complex, and inefficient

    google EAV (entity-attribute-value)
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  12. #12
    SitePoint Wizard bronze trophy cydewaze's Avatar
    Join Date
    Jan 2006
    Location
    Merry Land, USA
    Posts
    1,096
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    google EAV (entity-attribute-value)
    Ok, just watched a video about, and it makes sense as far as storing the data (because the values that are columns in the current database would be rows in EAV) but I'm not sure how I'd retrieve records using that.setup. Plus, what if you have one option in the options column that is numeric and one that is text? You'd have to use a varchar data type for everything.

    Another disturbing thing I noticed (about the existing database, not the EAV) is that not every resource exists in all three of the old tables. It's such a wreck.
    <cfset myblog = "http://cydewaze.org/">

  13. #13
    SitePoint Wizard bronze trophy cydewaze's Avatar
    Join Date
    Jan 2006
    Location
    Merry Land, USA
    Posts
    1,096
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    After thinking about this some more, I came up with an idea, and thus the following table structure:

    Resources (id, name)
    Topics (id, name)
    Criteria (id, parent, child)
    RTC (id, resource id, topic id, criteria id)



    The Resources and Topics tables are (should be) self-explanatory.

    The Criteria table is something like this:

    Code:
    id  parent      child
    1   mode        car
    2   mode        rail
    3   mode        bus
    4   setting     rural
    5   setting     urban
    6   setting     suburban
    etc...

    Then the RTC table would tie the resource id, topic id, and criteria id together. I can build my initial set of checkboxes from the topics table, and the subsequent ones out of the criteria table.

    Does anyone think this'll work?

    Edit: Rudy will probably scold me for not making a proper parent/child category table out of criteria, which if course I would do in the finished product
    <cfset myblog = "http://cydewaze.org/">

  14. #14
    SitePoint Wizard bronze trophy cydewaze's Avatar
    Join Date
    Jan 2006
    Location
    Merry Land, USA
    Posts
    1,096
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    It turns out that my above solution won't work at all, of course. In the existing web app (that's going away and which I have to recreate) when you choose additional options you actually get fewer results. For any given resource, if any of the chosen options is N in the database, that resources is hidden, so there are countless possible options.

    I clearly need to think of something else.
    <cfset myblog = "http://cydewaze.org/">

  15. #15
    SitePoint Wizard bronze trophy cydewaze's Avatar
    Join Date
    Jan 2006
    Location
    Merry Land, USA
    Posts
    1,096
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    Alright, got this to work with the table structure two posts up.

    Resources (id, name)
    Topics (id, name)
    Criteria (id, parent, child)
    RTC (id, resource id, topic id, criteria id)

    The reason it threw me off was because of the way the existing tool (the one mine is replacing) works. In the existing tool, when you select more than one criteria, rather that show you all resources related to either of those criteria, it shows you only resources that belong to both. It's like an AND instead of an OR.

    I'm finding it a little tough to replicate this behavior in a query, so I've done it in my code. But I'm not going to spend too much time on it until I know whether or not the owners of this application even want it to work this way.

    But for now, I have a database that is organized much better than the original one.
    <cfset myblog = "http://cydewaze.org/">


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •