SitePoint Sponsor

User Tag List

Results 1 to 14 of 14
  1. #1
    SitePoint Zealot s21825's Avatar
    Join Date
    Oct 2003
    Location
    Canada
    Posts
    162
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Identifying Missing Data Fields?

    Hi All,

    I'm hoping someone can offer some insite into this one!

    I'm making a database application that will have several users collect data from various sources and enter it as they gather it. I'm looking for a method of reporting progress on entering all the data ... maybe expressed as a percentage. So, say there are 20 fields in a given row, if 5 are still empty, I'd be able to determine that the row was 75% complete. Another useful tool would be to highlight which fields are still incomplete.

    Example:

    Say I've got a table that contains various data about 100 restaurants in my city. After 2 weeks of data gathering, I'd like to be able to see how close I am to having complete data for (restaurant _a), (restaurant_b), etc each individually, or all 100, as a percentage. Then, I'd like to be able to generate a report that show which data are still missing for each restaurant.

    I was thinking along the lines of COUNTing NULL values or something, but I can't get it to work properly. Or maybe a giant WHERE clause ...

    PHP Code:
    SELECT COUNT(*)
    FROM Table1Table2Table3 // , ... (for all tables)
    WHERE Column1 IS NULL OR Column2 IS NULL OR  Column2 IS NULL // OR ... (for all columns); 
    Anyway, I'm hoping someone else has some experience doing this ...

    Thanks!
    Last edited by s21825; Jan 19, 2005 at 07:47.
    s21825

  2. #2
    SitePoint Zealot s21825's Avatar
    Join Date
    Oct 2003
    Location
    Canada
    Posts
    162
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    OK, I might have a lead ...

    If I make a query that selects everything, then I use WHERE NOT EXISTS (query that selects completed data) ...

    Am I on the right track here? This is boggling my mind.
    s21825

  3. #3
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,323
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    nope, not on the right track

    one table is correct, but your example says

    ... FROM Table1, Table2, Table3 // , ... (for all tables)

    which isn't right because you're not going to put each restaurant in its own table -- i won't let you

    as far as determining the percentage of attributes that each restaurant has or has not got data for, you would be in a better position to judge how to write that query after you have actually amassed some data, and the first step towards doing that is to decide which attributes you want to record
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  4. #4
    SitePoint Zealot s21825's Avatar
    Join Date
    Oct 2003
    Location
    Canada
    Posts
    162
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks for the reply r937!

    I used restaurants as an example because the actual data I'm working with is a bit more complicated to explain. So I won't do anything like store each single restaurant in its own table, don't worry!

    My actual database has five tables that store the data I'm interested in tracking:

    Products
    Components
    ProductComponentLookup
    Substances
    SubstanceComponentLookup

    Each product can contain many components. Each components can potentially be in many products. Each component may have many substances in it, and each substance may exist in any number of components, hence the lookup tables.

    For a given product, I'd like to be able to see which components are missing substance data, which specific data are missing, and what percentage are missing. I'd like to be able to tell, for a given component, what % is complete. Also, for a given product, what % is complete (ie sum up all the components in the product).

    Are there any straight forward methods of making these types of queries?
    s21825

  5. #5
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,323
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    yup, those are straightforward queries -- perhaps a bit more complex than "select * from mytable" but still pretty straightforward
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  6. #6
    SitePoint Zealot s21825's Avatar
    Join Date
    Oct 2003
    Location
    Canada
    Posts
    162
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    yup, those are straightforward queries -- perhaps a bit more complex than "select * from mytable" but still pretty straightforward
    ... I really am just learning SQL itself after spending a lot of time figuring out database design techniques. So I'm hoping for a point in the right direction.

    OK, from what I've seen, COUNT does not actually count NULL values, so should I be placing a default not NULL (and therefore COUNTable) value in all my fields so that I can count them?

    I was able to get a query that lists all the components for which there are no substance data at all with this query:

    PHP Code:
    SELECT pkComponentIDComponentName
    FROM SubstanceLookup
    Components
    WHERE NOT EXISTS
    (SELECT FROM SubstanceLookup WHERE pkSubstanceLookupComponentID pkComponentID); 
    But how do I find the components with some but not all the data?
    s21825

  7. #7
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,323
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by s21825
    OK, from what I've seen, COUNT does not actually count NULL values, so should I be placing a default not NULL (and therefore COUNTable) value in all my fields so that I can count them?
    in a word -- no!

    that would be putting the cart before the horse

    whether a column is NULL or NOT NULL depends entirely on whether the business rules of the application dictate that there should always be a value

    if you want to count rows, use COUNT(*), which ignores NULLs


    I was able to get a query that lists all the components for which there are no substance data at all with this query
    a LEFT JOIN with WHERE ... IS NULL will also do it

    But how do I find the components with some but not all the data?
    which some?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  8. #8
    SitePoint Zealot s21825's Avatar
    Join Date
    Oct 2003
    Location
    Canada
    Posts
    162
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    whether a column is NULL or NOT NULL depends entirely on whether the business rules of the application dictate that there should always be a value
    I the application doesn't require that all the attributes have values, so I currently have unentered data represented as NULL.
    if you want to count rows, use COUNT(*), which ignores NULLs
    I think I see: to count the rows without data, I count all the rows, then all the rows with data and take the difference? But what about counting empty attributes? I need this to get a % of completeness for each component.
    a LEFT JOIN with WHERE ... IS NULL will also do it
    I'll look into that. Thanks!
    which some?
    Well, any really. I'd like to be able to generate a list of components for a given product that still needs more info (either more Substance data, or more straight Component attributes) ... basically anything short of fully complete. I'm hoping I don't have to make a string of 'OR colunm1 IS NULL OR Column2 IS NULL OR ..." for all the columns.

    I hope this makes sense lol!

    Thanks again for the help r937!
    s21825

  9. #9
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,323
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    are you suggesting that "fully complete" means that a product must have every component?

    here are the products that do not have all components --
    Code:
    select distinct 
           P.id
         , P.productname
      from Products as P
    cross
      join Components as C
    left outer
      join ProductComponentLookup as PC
        on P.id = PC.productid
       and C.id = PC.componentid
     where PC.productid is null
    not that the PC table does not contain NULLs, rather, the NULLs come from the way a LEFT OUTER JOIN works, and represents a missing row
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  10. #10
    SitePoint Zealot s21825's Avatar
    Join Date
    Oct 2003
    Location
    Canada
    Posts
    162
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    are you suggesting that "fully complete" means that a product must have every component?
    In my model, a product contains many components, and data will be collected about the components ... not directly about the products. So, to be fully complete, a product must have every data for each of its components. So, each product, when entered into the database, has all its components, just not the data about each component or the data about which substances are in each component.
    Quote Originally Posted by r937
    here are the products that do not have all components --
    Code:
    select distinct 
           P.id
         , P.productname
      from Products as P
    cross
      join Components as C
    left outer
      join ProductComponentLookup as PC
        on P.id = PC.productid
       and C.id = PC.componentid
     where PC.productid is null
    not that the PC table does not contain NULLs, rather, the NULLs come from the way a LEFT OUTER JOIN works, and represents a missing row
    ... maybe, I'm not sure if that's what I want or not. I haven't seen CROSS JOIN or LEFT OUTER JOIN before ... only LEFT JOIN or OUTER JOIN. I'll need to read or experiment a bit to see what they are doing.

    Does this count empty attributes or rows containing empty attributes?
    s21825

  11. #11
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,323
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    no, it finds P rows with PC rows that aren't there

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  12. #12
    SitePoint Zealot s21825's Avatar
    Join Date
    Oct 2003
    Location
    Canada
    Posts
    162
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    no, it finds P rows with PC rows that aren't there

    Great, thanks!

    Well I think I will be able to find Components that are missing substance data now. But, what about finding missing Component attributes? How do I identify and count them?
    s21825

  13. #13
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,323
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    assuming that these component attributes are columns in the component table, you will have to check each one individually
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  14. #14
    SitePoint Zealot s21825's Avatar
    Join Date
    Oct 2003
    Location
    Canada
    Posts
    162
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    assuming that these component attributes are columns in the component table, you will have to check each one individually
    OK, thank you! I was hoping there would be a specific SQL command that would handle doing it all together.

    Thanks again for your help with this!
    s21825


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •