SitePoint Sponsor |
|
User Tag List
Results 1 to 14 of 14
Thread: Identifying Missing Data Fields?
-
Jan 18, 2005, 20:43 #1
- Join Date
- Oct 2003
- Location
- Canada
- Posts
- 162
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Identifying Missing Data Fields?
Hi All,
I'm hoping someone can offer some insite into this one!
I'm making a database application that will have several users collect data from various sources and enter it as they gather it. I'm looking for a method of reporting progress on entering all the data ... maybe expressed as a percentage. So, say there are 20 fields in a given row, if 5 are still empty, I'd be able to determine that the row was 75% complete. Another useful tool would be to highlight which fields are still incomplete.
Example:
Say I've got a table that contains various data about 100 restaurants in my city. After 2 weeks of data gathering, I'd like to be able to see how close I am to having complete data for (restaurant _a), (restaurant_b), etc each individually, or all 100, as a percentage. Then, I'd like to be able to generate a report that show which data are still missing for each restaurant.
I was thinking along the lines of COUNTing NULL values or something, but I can't get it to work properly. Or maybe a giant WHERE clause ...
PHP Code:SELECT COUNT(*)
FROM Table1, Table2, Table3 // , ... (for all tables)
WHERE Column1 IS NULL OR Column2 IS NULL OR Column2 IS NULL // OR ... (for all columns);
Thanks!Last edited by s21825; Jan 19, 2005 at 07:47.
s21825
-
Jan 19, 2005, 15:58 #2
- Join Date
- Oct 2003
- Location
- Canada
- Posts
- 162
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
OK, I might have a lead ...
If I make a query that selects everything, then I use WHERE NOT EXISTS (query that selects completed data) ...
Am I on the right track here? This is boggling my mind.s21825
-
Jan 19, 2005, 16:45 #3
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
nope, not on the right track
one table is correct, but your example says
... FROM Table1, Table2, Table3 // , ... (for all tables)
which isn't right because you're not going to put each restaurant in its own table -- i won't let you
as far as determining the percentage of attributes that each restaurant has or has not got data for, you would be in a better position to judge how to write that query after you have actually amassed some data, and the first step towards doing that is to decide which attributes you want to record
-
Jan 20, 2005, 11:35 #4
- Join Date
- Oct 2003
- Location
- Canada
- Posts
- 162
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Thanks for the reply r937!
I used restaurants as an example because the actual data I'm working with is a bit more complicated to explain. So I won't do anything like store each single restaurant in its own table, don't worry!
My actual database has five tables that store the data I'm interested in tracking:
Products
Components
ProductComponentLookup
Substances
SubstanceComponentLookup
Each product can contain many components. Each components can potentially be in many products. Each component may have many substances in it, and each substance may exist in any number of components, hence the lookup tables.
For a given product, I'd like to be able to see which components are missing substance data, which specific data are missing, and what percentage are missing. I'd like to be able to tell, for a given component, what % is complete. Also, for a given product, what % is complete (ie sum up all the components in the product).
Are there any straight forward methods of making these types of queries?s21825
-
Jan 20, 2005, 13:31 #5
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
yup, those are straightforward queries -- perhaps a bit more complex than "select * from mytable" but still pretty straightforward
-
Jan 21, 2005, 07:47 #6
- Join Date
- Oct 2003
- Location
- Canada
- Posts
- 162
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Originally Posted by r937
... I really am just learning SQL itself after spending a lot of time figuring out database design techniques. So I'm hoping for a point in the right direction.
OK, from what I've seen, COUNT does not actually count NULL values, so should I be placing a default not NULL (and therefore COUNTable) value in all my fields so that I can count them?
I was able to get a query that lists all the components for which there are no substance data at all with this query:
PHP Code:SELECT pkComponentID, ComponentName
FROM SubstanceLookup, Components
WHERE NOT EXISTS
(SELECT * FROM SubstanceLookup WHERE pkSubstanceLookupComponentID = pkComponentID);
s21825
-
Jan 21, 2005, 08:05 #7
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
Originally Posted by s21825
that would be putting the cart before the horse
whether a column is NULL or NOT NULL depends entirely on whether the business rules of the application dictate that there should always be a value
if you want to count rows, use COUNT(*), which ignores NULLs
I was able to get a query that lists all the components for which there are no substance data at all with this query
But how do I find the components with some but not all the data?
-
Jan 21, 2005, 08:23 #8
- Join Date
- Oct 2003
- Location
- Canada
- Posts
- 162
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
whether a column is NULL or NOT NULL depends entirely on whether the business rules of the application dictate that there should always be a value
if you want to count rows, use COUNT(*), which ignores NULLs
a LEFT JOIN with WHERE ... IS NULL will also do it
which some?
I hope this makes sense lol!
Thanks again for the help r937!s21825
-
Jan 21, 2005, 08:39 #9
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
are you suggesting that "fully complete" means that a product must have every component?
here are the products that do not have all components --Code:select distinct P.id , P.productname from Products as P cross join Components as C left outer join ProductComponentLookup as PC on P.id = PC.productid and C.id = PC.componentid where PC.productid is null
-
Jan 21, 2005, 09:02 #10
- Join Date
- Oct 2003
- Location
- Canada
- Posts
- 162
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Originally Posted by r937
Originally Posted by r937
... maybe, I'm not sure if that's what I want or not. I haven't seen CROSS JOIN or LEFT OUTER JOIN before ... only LEFT JOIN or OUTER JOIN. I'll need to read or experiment a bit to see what they are doing.
Does this count empty attributes or rows containing empty attributes?s21825
-
Jan 21, 2005, 09:59 #11
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
no, it finds P rows with PC rows that aren't there
-
Jan 21, 2005, 10:07 #12
- Join Date
- Oct 2003
- Location
- Canada
- Posts
- 162
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Originally Posted by r937
Well I think I will be able to find Components that are missing substance data now. But, what about finding missing Component attributes? How do I identify and count them?s21825
-
Jan 21, 2005, 10:15 #13
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
assuming that these component attributes are columns in the component table, you will have to check each one individually
-
Jan 21, 2005, 10:20 #14
- Join Date
- Oct 2003
- Location
- Canada
- Posts
- 162
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Originally Posted by r937
Thanks again for your help with this!s21825
Bookmarks