SitePoint Sponsor

User Tag List

Results 1 to 14 of 14
  1. #1
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    16,492
    Mentioned
    164 Post(s)
    Tagged
    1 Thread(s)

    Multi-dimensional array to Database schema

    OK, I've been collecting data in a multi-dimensional array. The array is getting quite large now, and it might be nice to put the data online so others could use it too. Before it gets any more difficult to maintain I'm thinking it's about time to get this into a database.

    I'll eventually be doing this with PHP and MySQL, but I don't think that's important at this point, I should be able to do OK once I figure out how to get started, and I'm sure I'll be able to import the data into tables, but ....
    I can't figure out how to determine what table relationships to use.

    I have 3 types of data. By far, the vast majority of data sets are a single Member ID with a single IP and a single URL. If this was the case for all of the data sets I would have little to no problem. Unfortunately there are a few "troublemakers".

    MemberIds
    and each has at least one IP - but may have more
    and may have none - or several - URLs

    IPs can be associated with one or more MemberIds
    and none or several URLs

    URLs can be associated with one or more MemberIds
    and one or several IPs

    At some point I may want to add a "comment" type.

    I'm not looking for the finished answer, but I would like to be pointed to something that explains what logic to use in determining how relationships dictate table structure, or if someone's up to it, their own explanation. What I'm looking for is a kind of "logic kick-start".

    Thanks.

  2. #2
    From Italy with love silver trophybronze trophy
    guido2004's Avatar
    Join Date
    Sep 2004
    Posts
    9,411
    Mentioned
    149 Post(s)
    Tagged
    4 Thread(s)
    Assuming you have more member data than just the id (member name for example), the first table needed would be 'members'

    Then at first sight it would be logical to say that you'd need 'memberips' and 'memberurls' tables (1 to many relationship).

    But, if you have more info about each IP and url (wouldn't know what, but who knows), then you might need the tables 'ips' and 'urls' to store that info, and then the 'memberips' and 'memberurls' would become the connection between the 'members' and the 'ips' and 'urls' tables (many to many relationship).

    The really complicating factor (IMO) is the relation between ips and urls. You could make a 'ipurl' table, but that would only work if the relations between ip and url are completely independent. And somehow I have the feeling that they aren't, that the relationship between ip and url depends also on their relationship with member. Right?

    Maybe some more info about the data, its meaning and its purpose would be helpful?

    And where would you want to add that comment type? And what do you mean by comment type?

  3. #3
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    16,492
    Mentioned
    164 Post(s)
    Tagged
    1 Thread(s)
    Simplified, the array looks like:

    [0]
    12345
    -- 1.2.3.4
    -- sitepoint

    [1]
    12348
    -- 3.4.5.6
    -- google

    [2]
    12356
    -- 3.4.7.9
    -- apache
    -- php

    [3]
    12362
    -- 4.5.6.7

    [4]
    23467
    -- 5.6.7.8
    -- sitepoint (same as 1)

    [5]
    23469
    -- 5.6.8.9
    -- 1.2.3.4 (same as 1)
    -- mysql

    [6]
    23567
    -- 6.7.8.9
    -- 3.4.7.9 (same as 3)
    -- apache (same as 3)
    -- google (same as 2)

    So tables might look like:

    MemberId table
    0 - 12345
    1 - 12348
    2 - 12356
    3 - 12362
    4 - 23467
    5 - 23469
    6 - 23567

    IP table
    0 - 1.2.3.4
    1 - 3.4.5.6
    2 - 3.4.7.9
    3 - 4.5.6.7
    4 - 5.6.7.8
    5 - 5.6.8.9
    5 - 1.2.3.4
    6 - 6.7.8.9
    6 - 3.4.7.9

    URL table
    0 - sitepoint
    1 - google
    2 - apache
    2 - php
    4 - sitepoint
    5 - mysql
    6 - apache
    6 - google

    But that doesn't feel right to me. And because MemberIds only ever occur once, I get the feeling I could be using them to advantage somehow.

  4. #4
    From Italy with love silver trophybronze trophy
    guido2004's Avatar
    Join Date
    Sep 2004
    Posts
    9,411
    Mentioned
    149 Post(s)
    Tagged
    4 Thread(s)
    First of all, you don't put duplicate values in the ip and url tables. That would defeat the purpose of those tables (take away data redundancy).

    Second, I don't know how much you've simplified your data, but if there's no data related to the memberid (like membername, password, etc) but the ip numbers and the urls, then two tables would be enough:

    the 'ip' table with two columns:
    memberid
    ip

    the 'url' table with two columns
    memberid
    url

    Or even just one table with three columns:
    memberid
    typeofdata ('ip' or 'url')
    data

  5. #5
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    16,492
    Mentioned
    164 Post(s)
    Tagged
    1 Thread(s)
    You mean substitute the MemberIds for the array keys in my previous post's example?

    Might it be better to keep the IPs and URLs unique like:

    IP table
    1.2.3.4 - 12345 - 23469
    3.4.5.6 - 12348
    3.4.7.9 - 12356 - 23567
    4.5.6.7 - 12362
    5.6.7.8 - 23467
    5.6.8.9 - 23469
    6.7.8.9 - 23567


    URL table
    sitepoint - 12345 - 23467
    google - 12348 - 23567
    apache - 12356 - 23567
    php - 12356
    mysql - 23469

    I'd like to be able to know the MemberId, but keep the identical values connected somehow in such a way as I can minimize the PHP processing to determine the connectedness.

  6. #6
    From Italy with love silver trophybronze trophy
    guido2004's Avatar
    Join Date
    Sep 2004
    Posts
    9,411
    Mentioned
    149 Post(s)
    Tagged
    4 Thread(s)
    I guess what you'd need to do is study a bit of database normalization. Google for it, or maybe someone already has a link to a valid resource by hand (I don't).

    If you want to put your data in a database, don't create repetitive columns (memberid1, memberid2). It limits the number of possible memberids linked to a url or ip. And it gets real difficult retrieving the data.
    If you build the database correctly (normalized) the code needed to manage it will be minimal.

    Of course, everything I'm writing here is based on the scarce info you posted and a lot of assumptions, but you should be able to put all data in the three column table (like I said before):

    memberid
    typeofdata ('ip' or 'url')
    data

    where memberid would be the values 12345 etc.

    but keep the identical values connected somehow in such a way as I can minimize the PHP processing to determine the connectedness.
    What are the 'connected values'? Aren't they the ip and/or url values connected to each memberid? Why do you want to group them by ip/url value? They aren't grouped like that in your array.

    That's why I asked for more info about the data, its meaning and its purpose. It's not easy to give a meaningful answer based on so little info.

  7. #7
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    16,492
    Mentioned
    164 Post(s)
    Tagged
    1 Thread(s)
    OK, I'll take a look for "normalization" and see if I can find anything that can work it's way through my thick skull

  8. #8
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    16,492
    Mentioned
    164 Post(s)
    Tagged
    1 Thread(s)
    Well, I did some studying re Normalization. I then took some time away from it hoping it might help me get my head around it, but I'm afraid I'm still rather clueless.

    I was reluctant to divulge my intentions, but what the hey.

    I'm developing an anti-SPAM tool for Mods of a popular forum

    Even if I leave out IPs (low ROI for higher overhead) and Comments, I'm still at a loss determining how to not have rows with unused fields (up to around a dozen of them per row). eg.

    Member_1 - URL_1 - URL_2 - URL_3
    Member_2 - URL_1 - NULL - NULL
    Member_3 - URL_1 - URL_2 - NULL
    Member_4 - URL_1 - NULL - NULL
    Member_5 - URL_1 - NULL - NULL

    or

    URL_1 - Member_1 - Member_2 - Member_3
    URL_2 - Member_1 - NULL - NULL
    URL_3 - Member_1 - Member_2 - NULL
    URL_4 - Member_1 - NULL - NULL
    URL_5 - Member_1 - NULL - NULL

    and Id rather not do 2 fields with comma delimiter

    Member_1 - URL_1, URL_2, URL_3
    Member_2 - URL_1
    Member_3 - URL_1, URL_2
    Member_4 - URL_1
    Member_5 - URL_1

    or

    URL_1 - Member_1, Member_2, Member_3
    URL_2 - Member_1
    URL_3 - Member_1, Member_2
    URL_4 - Member_1
    URL_5 - Member_1

    Am I over-thinking this or am I as clueless as I feel?

  9. #9
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,021
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    your rows should have member and url only, i.e. one of each

    if a member has two urls, there are two rows, and so on

    that's properly normalized

    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  10. #10
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    16,492
    Mentioned
    164 Post(s)
    Tagged
    1 Thread(s)
    So something like

    AI_0 - Member_1 - URL_1
    AI_1 - Member_1 - URL_2
    AI_2 - Member_1 - URL_3
    AI_3 - Member_2 - URL_1
    AI_4 - Member_3 - URL_1
    AI_5 - Member_3 - URL_2
    AI_6 - Member_4 - URL_1
    AI_7 - Member_5 - URL_1

    I guess what was hanging me up was thinking I could use the unique memberIds/URLs for keys instead of an auto_increment.

    Thanks for the help

  11. #11
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,021
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    what are the AI thingies? auto_increments? you do not need them
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  12. #12
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    16,492
    Mentioned
    164 Post(s)
    Tagged
    1 Thread(s)
    It might be because every example I've ever seen seems to have dratted auto_increment Id keys, but I thought (hmmm, now that I think of it, why?) that a row needed a key, and the key needed to be unique.

    I've missed something basic since Wayback.

  13. #13
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,021
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    you're right, every table should have a primary key
    Code:
    CREATE TABLE member_urls
    ( member_id INTEGER NOT NULL 
    , url_id    INTEGER NOT NULL 
    , PRIMARY KEY ( member_id , url_id )
    );
    or alternatively
    Code:
    CREATE TABLE member_urls
    ( member_id INTEGER NOT NULL 
    , url   VARCHAR(123) NOT NULL 
    , PRIMARY KEY ( member_id , url )
    );
    simple, innit

    (note: foreign keys omitted for brevity)
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  14. #14
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    16,492
    Mentioned
    164 Post(s)
    Tagged
    1 Thread(s)
    Ahh, the light comes on.

    Using both as the key would be unique key values.

    I've never seen a CREATE like that, but that's my fault for using the PHP documentation as a tutor. They're great for the specific function they're dealing with but they often short-cut other parts of the code example to simplify it.

    It is simple

    Kind of like how I can miss seeing something in the store when it's on the shelf in front at eye level but will spot it when it's top/bottom or behind


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •