Hello my friends. Im glad you are reading this)
Im not an expert in SQL OR MySQL (which I'm using for this task) so Im interested and excited and hope to get answers. I have already solved current problem, but I would like to hear better way to do this.
Im not an owner of the original db scheme and architecture.
There is a list of submissions. A submission is a user submitted form that includes different fields (components) - textfields, select boxes, date fields.
Each different field(component) stores its data in the same table submissions_data. Before saving data to table script breaks field submitted value into parts and saves it as rows in the same table using incremented index (see no).
sid - INTEGER submission id
cid - INTEGERcomponent ID
type - STRING type of component select or textfield or else
parameters - TEXT serialized array of filed configuration data like field default values, display options, radio options for select, else
sid - INTEGER submission id to which data belongs
cid - INTEGET component id to which submited data belongs
no - INTEGER number of the data
data - TEXT data itself
Submitted form containing 1 textfield, and 1 select box will result similar table data. Sorry if any mistaked I could make.
cid type parameters
1 'textfield' default = 'Nothing here'
2 'select' options = 'Option1', 'Option2', 'option3', 'option4', 'Option5'
sid cid no data
1 1 0 'Hello World textfield value'
1 2 0 'Option1'
1 2 1 'Option4'
1 2 3 'Option5'
2 1 0 'From Adam'
2 2 0 'Option1'
2 2 1 'Option6'
Exerything is clear for textfield. It uses 1 row. And for select box as you can see script saves selected options as separate rows using incremented 'no' column.
I cant change the schema of database. Administrator can go in and look and search all submissions as table.
There will be a filter for each of the fields.
I have to write a select query which will filter submissions and will select only those that match all the filter conditions. More user can make a sort on one of the columns after search.
So, If i have 2 components of type select box I will have 2 filters which can be applied the same search. User can select multiple options in filter and can select multiple options in select box when filling out submission
The problem to write a such query is that the selected options is stored as separate row and additionally I have to order resulting set of rows by number of total options submitted by user.
SO I wrote query, for 2 select boxes:
SELECT * FROM table_submissions AS s
LEFT JOIN table_submissions_data AS sd1 ON sd1.sid=s.sid AND sd1.cid = 1
LEFT JOIN table_submissions_data AS sd2 ON sd2.sid=s.sid AND sd2.cid = 2
WHERE sd1.data != NULL AND sd2.data != NULL
GROUP BY sd1.cid, sd2.cid, s.sid
HAVING COUNT(DISTINCT IF(sd1.data IN ('search opt1', 'search opt2'), sd1.no, 0)) = 2 AND COUNT(DISTINCT IF(sd2.data IN ('search opt1', 'search opt2', 'search opt3'), sd2.no, 0)) = 3
ORDER BY COUNT(DISTINCT sd1.no), sd1.data
That will select all user submitted optinos for each submission, and will only select submissions in which component selected options match the filter options.
Expression COUNT(DISTINCT IF(sd1.data IN ('search opt1', 'search opt2'), sd1.no, 0)) = 2 selects number of unique selected option keys, and compare the number of them to number of filter options. So if filter has more number means not all the options were chosen in submission. And also because I know the total number of selected options I can make order on them. So even if 2 submissions will have select box matching filter options, the submission in which user selected more options will go top.
What I would like to understand is what will be the better way of doing this task? I thought about other several cases:
using subquery ... s.sid IN SELECT ... to find submissions matching filtered options and then in main query make order by total number of selected options.
make a separate join for each of the component option like SELECT * FROM table_submissions AS s
LEFT JOIN table_submissions_data AS sd1_1 ON sd1_1.sid=s.sid AND sd1_1.cid = 1 AND sd1_1.data = 'option1'
LEFT JOIN table_submissions_data AS sd2_1 ON sd2_1.sid=s.sid AND sd2_1.cid = 1 AND sd2_1.data = 'option2'
that will allow to replace having by -> where and make query writing easier and cleaner but what about performance?
What best practice here.
I would like to hear from you what do you thing regarding your experience. Performance for me is not critical.
kind regards, Ladislav