Tough conceptual issue

HI,

I’m facing a tough conceptual issue regarding a parameterized search in the database. Parameterized search is where a user can from dropdowns or free text fields search on things like phone number, country, market…

Here is the SELECT query for this so far:


SELECT
    l.lead_id as 'Registrant Number'
    , l.salutation as 'Salutation'
    , l.first_name as 'First Name'
    , l.middle_names as 'Middle Names'
    , l.last_name as 'Last Name'
    , l.language as 'Preferred Lanaguage'
    , l.written_greeting as 'Written Greeting'
    , l.verbal_greeting as 'Verbal Greeting'
    , l.degree as 'Degree'
    , l.date_entered as 'Date Entered'
    , l.date_registered as 'Date Registered'
    , l.date_modified as 'Date Modified'
    , ls.lead_source as 'Lead Source'
    , u.uid as 'Counselor'
    , m.market_name as 'Market'
    , lc.location_name as 'Location'
    , sta.name as 'Status Name'
    , l.appointment_set as 'Appointment Set'
    , l.appointment_details as 'Appointment Details'
    , ss.sale_status as 'Sales Type'
    , l.follow_up_details as 'Follow Up Details'
    , g.gender as 'Gender'
    , y.year as 'Year of Birth'
    , ms.marital_status_name as 'Marital Status'
    , la.language as 'Language'
    , l.do_not_call as "Don't Call"
    , l.do_not_email as "Don't Email"
    , l.do_not_mail as "Don't Mail"
    , l.special_instructions as 'Special Instructions'
    , l.notes as 'Notes'
    , v.veteran_status as 'Veteran Status'
    , l.is_deceased as 'Is Deceased'
FROM
    leads as l
INNER JOIN
    lead_sources as ls
    ON
        l.lead_source_id = ls.lead_source_id
INNER JOIN
    users as u
    ON
        l.counselor_id = u.uid_number
INNER JOIN
    markets as m
    ON
        l.market_id = m.market_id
INNER JOIN
    locations as lc
    ON
        l.location_id = lc.location_id
INNER JOIN
    lk_genders as g
    ON
        l.gender_id = g.gender_id
LEFT OUTER JOIN
    years as y
    ON
        l.year_of_birth_id = y.year_id
INNER JOIN
    marital_status as ms
    ON
        l.married_status_id = ms.marital_status_id
INNER JOIN
    lk_languages as la
    ON
        l.language_id = la.language_id
LEFT OUTER JOIN
    lk_veterans as v
    ON
        l.veteran_status_id = v.veterans_id
LEFT OUTER JOIN
    leads2addresses as l2a
    ON
        l.lead_id = l2a.lead_id
LEFT OUTER JOIN
    addresses as a
    ON
        l2a.address_id = a.address_id
LEFT OUTER JOIN
    postal_codes as p
    ON
        a.post_code = p.postal_code
LEFT OUTER JOIN
    countries as cnt
    ON
        p.country_iso_code = cnt.country_iso_code
INNER JOIN
    sales_status as ss
    ON
        l.sales_type_id = ss.sale_status_id
LEFT OUTER JOIN
    statuses as sta
    ON
        l.status_id = sta.status_id
LEFT OUTER JOIN
    provs_states as pst
    ON
        p.state_prov_abbr = pst.abbreviation
    AND
        a.post_code = p.postal_code
LEFT OUTER JOIN
    leads2communications as l2c
    ON
        l.lead_id = l2c.lead_id
LEFT OUTER JOIN
    communications as com
    ON
        l2c.communication_id = com.communication_id
WHERE
    1 = 1

I can tag AND statments onto the end to search for specifics.

A lead can have many different types of phone numbers Work, Home, Cell … which is why the leads2communication table exists

delimiter $$

CREATE TABLE `leads2communications` (
  `lead_id` int(11) NOT NULL,
  `communication_id` int(11) NOT NULL,
  PRIMARY KEY  USING BTREE (`lead_id`,`communication_id`),
  CONSTRAINT `fk_leads2communications_leads` FOREIGN KEY (`lead_id`) REFERENCES `leads` (`lead_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1$$


.

The conceptual issue that I face is driven by this part of the above SELECT statement

LEFT OUTER JOIN
     leads2communications as l2c
     ON
         l.lead_id = l2c.lead_id
 LEFT OUTER JOIN
     communications as com
     ON
         l2c.communication_id = com.communication_id

.

If a user searches by phone number then I can use the following to return that unique record (bogus phone number):


AND
    com.phone_number = '905-000-0000'

However if I don’t append the above ‘AND’ filter then I get the same lead duplicated for every phone number linked to that particular lead, which is what is expected given the query.

Now if in the SELECT query I use the following

AND
    com.is_primary_phone_number = 1

then I will get the single record for that lead using no ‘AND’ filters after the ‘WHERE 1 = 1’. But this is not good as a user should be able to search on any of the leads phone numbers and by imposing this particular ‘AND’ filter that is not possible.

The communication table is currently

delimiter $$

CREATE TABLE `communications` (
  `communication_id` int(11) NOT NULL auto_increment,
  `communication_type_id` tinyint(4) NOT NULL default '0',
  `phone_number` varchar(50) default NULL,
  `internal_extension` smallint(6) default NULL,
  `secondary_dialing_instructions` text,
  `complete_foreign_dialing_sequence` text,
  `complete_local_dialing_sequence` text,
  `text_handle` varchar(250) default NULL,
  `email_address` varchar(250) default NULL,
  `url` varchar(250) default NULL,
  PRIMARY KEY  USING BTREE (`communication_id`),
  KEY `FK_index` (`communication_type_id`)
) ENGINE=InnoDB AUTO_INCREMENT=14 DEFAULT CHARSET=utf8$$

Is there a way to use conditional logic so that in the WHERE 1=1 without any phone number specified in the search used without “AND com.phone_number = ‘905-000-0000’” a single lead is returned; however when a phone number is specified then return the lead with that specific phone number?

Hopefully this makes sense.

Steve

I’ve actually decided to do this through application logic instead of SQL

If someone searches by phone then I’ll run a first query that searches the leads2communication table and returns all the rows relative to the lead. I’ll then grab the lead_id and then run the SELECT query I showed in the first post. Sometimes things are just easier using application locic.

Thanks for perseveringly through this post :slight_smile:

Steve

[B]EDIT: this was a tough reply, and it appears that you posted while i was thinking :slight_smile:

however, i do urge you to consider my reply as you are in trouble with two many-to-many relationships[/B]

let’s start with this: from the description of your situation, you’re using your many-to-many leads-to-communications joins for two purposes – to return all the communications for a given lead (information retrieval) as well as for searching (filtering)

(you haven’t actually put any communications columns into the SELECT clause but i’m assuming this is because you are still developing the query)

there is definitely a tricky conceptual issue here, as i will try to show by focusing on the filtering

if the user enters a phone number to search for, you definitely want an INNER join from communications through to leads, so that only those leads which match that number are eventually returned

so if a lead is found this way, now you have to retrieve all the information about the lead, and since this includes any other communications, you now need the LEFT OUTER JOINs to get whatever other communications the lead has – the entire path would be com-INNER-l2c-INNER-lead-LEFT-l2c-LEFT-com

this X-Y-Z-Y-X join path is also used for things like “show all movies that have the same director as this movie” except in this case it’s “show all communications for the lead that has this communication”

which is clumsy, and i don’t even want to think about what might happen if you needed to incorporate another many-to-many relationship into the query, like, say, addresses… :smiley:

so the big challenge here is how to do the information retrieval display design – because as soon as you have more than one many-to-many (actually, more than one one-to-many) in the query, you’ll get cross join effects if you let the query return detail rows

luckily for us, mysql engineers came up with the brilliant GROUP_CONCAT function, which is by far the awesomest function of the last decade

to give you an example of how i would use it, i would replace this –


  FROM leads
...
LEFT OUTER 
  JOIN leads2communications as l2c
    ON l2c.lead_id = l.lead_id
LEFT OUTER 
  JOIN communications as com
    ON com.communication_id = l2c.communication_id

with this –


  FROM leads
...
LEFT OUTER 
  JOIN ( SELECT l2c.lead_id 
              , GROUP_CONCAT(
                    CONCAT_WS(':',com.[i]columns[/i])
                    SEPARATOR ';' 
                            ) AS cat_coms
           FROM leads2communications as l2c
				 INNER 
           JOIN communications as com
             ON com.communication_id = l2c.communication_id
         GROUP
             BY l2c.lead_id ) AS coms   
    ON coms.lead_id = l.lead_id

now each lead will join to at most one row containing concatenated com data

you will find some difficulty in this step as i notice that you have a “type” column in the communications table with nullable columns for all types, so be careful how you encode the concatenation

however, this encoding strategy (collapsing the many communications to a single aggregated concatenation per lead) has the added advantage that you can now also use it for searching

to use your example, the number 905-000-0000 which is being searched for will be included in the concatenated string somewhere, so you would append AND coms.cat_coms like ‘%905-000-0000%’ to your WHERE 1=1

except you wouldn’t use coms.cat_coms since coms is the right table in a left join, so you’de need COALESCE(comc.cat_coms,‘’)

i hope all this made sense

:slight_smile:

Wow Rudy,

You did quite a brilliant job explaining - given this is a messy idea. I actually would prefer to do this the SQL way, but have a nagging deadline that does not always allow for the learning curve gap :frowning: I am definitely learning by “Fire and Brimstone” :wink:

I am going to try to implement your suggestions and go through it carefully to better understand that replacement query and your application of the COALESCE.

I guess the COALESCE would need to be in my case


WHERE 1=1
AND 
         COALESCE(comc.cat_coms,'')
         LIKE 
                '%905-000-0000%'

You also spotted that I will have an issue with the Many to Many addresses in the same way so I am going to try to roll this idea out to that part of the query.

I’ll let you know how it goes!

{EDIT} Yes I guess I had to look at the actual columns as this doesn’t exit… I am going to try again with appropriate column names {/EDIT}

Wow that didn’t take long… get the following:

Error Code: 1054. Unknown column ‘com.columns’ in ‘field list’

We removed the

LEFT OUTER JOIN
    communications as com
    ON
        l2c.communication_id = com.communication_id

where the com alias is defined

I am going to keep working to see if I can figure out how to do this but at first blush it still looks to me that I am dealing with a many to many problem?

WOW SO GREAT! LOVE THE GROUP_CONCAT

This is the full query that works exactly like you said it would; when not filtering with AND I get all contacts no duplicates and when I use the filter:

AND 
         COALESCE(coms.cat_coms,'')
         LIKE 
                '%9050000000%'

it returns the records that are associated to this phone number.

Full Query:

SELECT 
    l.lead_id as 'Registrant Number' 
    , l.salutation as 'Salutation' 
    , l.first_name as 'First Name' 
    , l.middle_names as 'Middle Names' 
    , l.last_name as 'Last Name' 
    , l.language as 'Preferred Lanaguage' 
    , l.written_greeting as 'Written Greeting' 
    , l.verbal_greeting as 'Verbal Greeting' 
    , l.degree as 'Degree' 
    , l.date_entered as 'Date Entered' 
    , l.date_registered as 'Date Registered' 
    , l.date_modified as 'Date Modified' 
    , ls.lead_source as 'Lead Source' 
    , u.uid as 'Counselor' 
    , m.market_name as 'Market' 
    , lc.location_name as 'Location' 
    , sta.name as 'Status Name' 
    , l.appointment_set as 'Appointment Set' 
    , l.appointment_details as 'Appointment Details' 
    , ss.sale_status as 'Sales Type' 
    , l.follow_up_details as 'Follow Up Details' 
    , g.gender as 'Gender' 
    , y.year as 'Year of Birth' 
    , ms.marital_status_name as 'Marital Status' 
    , la.language as 'Language' 
    , l.do_not_call as "Don't Call" 
    , l.do_not_email as "Don't Email" 
    , l.do_not_mail as "Don't Mail" 
    , l.special_instructions as 'Special Instructions' 
    , l.notes as 'Notes' 
    , v.veteran_status as 'Veteran Status' 
    , l.is_deceased as 'Is Deceased' 
FROM 
    leads as l 
     
INNER JOIN 
    lead_sources as ls 
    ON 
        l.lead_source_id = ls.lead_source_id 
INNER JOIN 
    users as u 
    ON 
        l.counselor_id = u.uid_number 
INNER JOIN 
    markets as m 
    ON 
        l.market_id = m.market_id 
INNER JOIN 
    locations as lc 
    ON 
        l.location_id = lc.location_id 
INNER JOIN 
    lk_genders as g 
    ON 
        l.gender_id = g.gender_id 
LEFT OUTER JOIN 
    years as y 
    ON 
        l.year_of_birth_id = y.year_id 
INNER JOIN 
    marital_status as ms 
    ON 
        l.married_status_id = ms.marital_status_id 
INNER JOIN 
    lk_languages as la 
    ON 
        l.language_id = la.language_id 
LEFT OUTER JOIN 
    lk_veterans as v 
    ON 
        l.veteran_status_id = v.veterans_id 
INNER JOIN 
    sales_status as ss 
    ON 
        l.sales_type_id = ss.sale_status_id 
LEFT OUTER JOIN 
    statuses as sta 
    ON 
        l.status_id = sta.status_id 
LEFT OUTER JOIN 
    leads2addresses as l2a 
    ON 
        l.lead_id = l2a.lead_id 
LEFT OUTER JOIN 
    addresses as a 
    ON 
        l2a.address_id = a.address_id     
LEFT OUTER JOIN 
    postal_codes as p 
    ON 
        a.post_code = p.postal_code 
LEFT OUTER JOIN 
    countries as cnt 
    ON 
        p.country_iso_code = cnt.country_iso_code 
LEFT OUTER JOIN 
    provs_states as pst 
    ON 
        p.state_prov_abbr = pst.abbreviation 
    AND 
        a.post_code = p.postal_code 
LEFT OUTER  
  JOIN ( SELECT l2c.lead_id  
              , GROUP_CONCAT( 
                    CONCAT_WS(':',com.phone_number) 
                    SEPARATOR ';'  
                            ) AS cat_coms 
           FROM leads2communications as l2c 
                 INNER  
           JOIN communications as com 
             ON com.communication_id = l2c.communication_id 
         GROUP 
             BY l2c.lead_id ) AS coms    
    ON coms.lead_id = l.lead_id 
 
WHERE 
    1 = 1 
AND  
         COALESCE(coms.cat_coms,'') 
         LIKE  
                '%9050000000%'

I don’t know what to say other than ‘THANK YOU’ as you saved me lots of time! You must have seen this one before (given your example of the movies) but for me it is unlikely that until a few years from now I would have been able to figure this out on my own.

Many Regards!

Steve

wait! wait!

searching via the concatenated string will not scale… it’s a table scan

sigh

oh… um… your GROUP_CONCAT has only one column in it, whereas you would want all the columns, which is why the nested CONCAT_WS for non-null columns

each row would be a type and then one non-null column (the nested CONCAT_WS) and these row values would be aggregated with ‘:’ as the separator (the outer GROUP_CONCAT)

it’s actually quite elegant but it still won’t scale :slight_smile:

…your GROUP_CONCAT has only one column in it, whereas you would want all the columns, which is why the nested CONCAT_WS for non-null columns

each row would be a type and then one non-null column (the nested CONCAT_WS) and these row values would be aggregated with ‘:’ as the separator (the outer GROUP_CONCAT)
I see.

I even thought about this from the application point of view and it is difficult too as a user can choose to search by Country and Phone Number. This means I would have to use a very big set of criteria to say if this and this then this…YUCK!

it’s actually quite elegant but it still won’t scale
This sucks - I did notice that adding this to the query slowed it down a bunch with just a small set of sample data :frowning: I do have complete control over the database design (other than having to have the customers desired fields) Can a db redesign make it easier?

Steve

deleted this post as a glitch caused a duplicate post

Does this do a full table scan and could it be used as a sub-query:


SELECT 
    distinct(l.lead_id) 
FROM 
    leads as l 
LEFT OUTER JOIN 
    leads2communications as l2c 
    ON 
        l.lead_id = l2c.lead_id 
LEFT OUTER JOIN 
    communications as com 
    ON 
        l2c.communication_id = com.communication_id 
WHERE 
    1 = 1 
AND  
    com.phone_number 
         LIKE  
                '%9050000000%'

From this code I can get the unique id associate with this phone number. If I remove the AND then I get all records with their distinct lead_id’s.

I think that this is inefficient as it appears to be quite slow. However the idea is to get the unique lead_ids when doing the search then do a separate query to get all the phone numbers for this lead and then join the data together.

Does this make sense?

Steve

perhaps :slight_smile:

a query redesign would be better, though, because, hey, you don’t have to bring in the demolition crew for the old database

which is good in a lot of places

:smiley:

you are on the right track with a customized query to retrieve the lead_ids of interest, then retrieving those leads in all there information retrieval glory, this latter query of which can be optimized well

and it turns out that breaking the task into two separate queries is a very good approach

which side do you want to handle first?

whoa, you’re going too fast for the old guy

i need a beer :cool:

probably when i get back you’ll have completed the whole thing, but if not, i do promise to get back, unless, you know, i fall in the ravine…

I look forward to your stumbling return… and a night in the ravine may teach you a lesson (If not already learned) :wink:

Working too much… frustrated… going to bed… back at it a 5:00 am tomorrow, if anything good I will post.

Thanks for all your help!
Steve

On the many to many tables and forcing a distinct(l.lead_id) do cross joins still occur as it takes only the first instance or does it query the cross-joins first and then report distinct?

i’m not sure i understand this question

Hi Rudy,

Hope your had a good time last evening!

When I run this (very similar query from yesterday):


SELECT 
    distinct(l.lead_id) as 'Registrant Number' 
    , l.salutation as 'Salutation' 
    , l.first_name as 'First Name' 
    , l.middle_names as 'Middle Names' 
    , l.last_name as 'Last Name' 
    , l.language as 'Preferred Lanaguage' 
    , l.written_greeting as 'Written Greeting' 
    , l.verbal_greeting as 'Verbal Greeting' 
    , l.degree as 'Degree' 
    , l.date_entered as 'Date Entered' 
    , l.date_registered as 'Date Registered' 
    , l.date_modified as 'Date Modified' 
    , ls.lead_source as 'Lead Source' 
    , u.uid as 'Counselor' 
    , m.market_name as 'Market' 
    , lc.location_name as 'Location' 
    , sta.name as 'Status Name' 
    , l.appointment_set as 'Appointment Set' 
    , l.appointment_details as 'Appointment Details' 
    , ss.sale_status as 'Sales Type' 
    , l.follow_up_details as 'Follow Up Details' 
    , g.gender as 'Gender' 
    , y.year as 'Year of Birth' 
    , ms.marital_status_name as 'Marital Status' 
    , la.language as 'Language' 
    , l.do_not_call as "Don't Call" 
    , l.do_not_email as "Don't Email" 
    , l.do_not_mail as "Don't Mail" 
    , l.special_instructions as 'Special Instructions' 
    , l.notes as 'Notes' 
    , v.veteran_status as 'Veteran Status' 
    , l.is_deceased as 'Is Deceased' 
FROM 
    leads as l 
     
INNER JOIN 
    lead_sources as ls 
    ON 
        l.lead_source_id = ls.lead_source_id 
INNER JOIN 
    users as u 
    ON 
        l.counselor_id = u.uid_number 
INNER JOIN 
    markets as m 
    ON 
        l.market_id = m.market_id 
INNER JOIN 
    locations as lc 
    ON 
        l.location_id = lc.location_id 
INNER JOIN 
    lk_genders as g 
    ON 
        l.gender_id = g.gender_id 
LEFT OUTER JOIN 
    years as y 
    ON 
        l.year_of_birth_id = y.year_id 
INNER JOIN 
    marital_status as ms 
    ON 
        l.married_status_id = ms.marital_status_id 
INNER JOIN 
    lk_languages as la 
    ON 
        l.language_id = la.language_id 
LEFT OUTER JOIN 
    lk_veterans as v 
    ON 
        l.veteran_status_id = v.veterans_id 
INNER JOIN 
    sales_status as ss 
    ON 
        l.sales_type_id = ss.sale_status_id 
LEFT OUTER JOIN 
    statuses as sta 
    ON 
        l.status_id = sta.status_id 
LEFT OUTER JOIN 
    leads2addresses as l2a 
    ON 
        l.lead_id = l2a.lead_id 
LEFT OUTER JOIN 
    addresses as a 
    ON 
        l2a.address_id = a.address_id 
LEFT OUTER JOIN 
    postal_codes as p 
    ON 
        a.post_code = p.postal_code 
LEFT OUTER JOIN 
    countries as cnt 
    ON 
        p.country_iso_code = cnt.country_iso_code 
LEFT OUTER JOIN 
    provs_states as pst 
    ON 
        p.state_prov_abbr = pst.abbreviation 
    AND 
        a.post_code = p.postal_code 
LEFT OUTER JOIN  
    leads2communications as l2c  
    ON  
        l.lead_id = l2c.lead_id  
LEFT OUTER JOIN  
    communications as com  
    ON  
        l2c.communication_id = com.communication_id  
WHERE  
    1 = 1  
AND   
    com.phone_number  
         LIKE   
                '%905%'

It returns only one instance of each lead. Now this might sound like it whammys the should be output of this query; although it potentially lets me take the results from this query, get the lead_id and then run a simpler query for the many to many tables and then join these two results programmatically.

You mentioned yesterday:

so the big challenge here is how to do the information retrieval display design – because as soon as you have more than one many-to-many (actually, more than one one-to-many) in the query, you’ll get cross join effects if you let the query return detail rows

So this query does have the address and phones many to many tables that would cause cross joins if run without the distinct(l.lead_id) so when including the distinct(l.lead_id), and as only one record is returned per lead, does the internal query first query the results using the cross join and then filter for the distinct or does the distinct(l.lead_id) stop the cross join as it causes the SELECT to enforce the distinct? (Sorry hard to describe :()

Thanks,
Steve

hold it right there, amigo :slight_smile:

DISTINCT is ~not~ a function

putting the first column that comes after DISTINCT into parentheses does nothing to change the fundamental purpose of DISTINCT, to ensure that the query returns unique rows – i.e. no two result rows are identical in every column value

if your query returned only one row per lead, then that would be because there was only one row per lead in the joined data that you retrieved – each lead had only one address, only one communication, etc.

you need to drop this approach and come at the problem from another direction

the other direction involves GROUP BY but i assure you that the outer query is not the place to be applying it (see post #3 for an example of applying GROUP BY in a subquery in the FROM clause)

besides, you need to concentrate on separating the search query from the data retrieval query

I trust you but when I remove the distinct(lead_id) I get three results and when I don’t have it I get all the results?

I do however understand what you saying

besides, you need to concentrate on separating the search query from the data retrieval query
so I will focus on doing that instead.

Thank you.