SitePoint Sponsor |
|
User Tag List
Results 1 to 17 of 17
-
May 1, 2008, 04:11 #1
- Join Date
- Jun 2006
- Posts
- 145
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Running large insert queries in my web app
Hi there
Here is a simplified explanation of something I want to do in my app:
A user enters a sentence into a webform.
I split the string into separate words.
Create a query to see if any of the words match words in my members_keywords table (can be multiple instances of any one word).
Query returns an array of member id's.
I create a new insert query with one row for each member id returned in the previous select.
The thing is, this one process could potentially return thousands or member id's and so the insert query would equally need to insert thousands of rows.
I'm really concerned about the impact of this on the server as this process could potentially be running many times a day along with everything else in the web app.
Am I right to be concerned? Is there a way to make it less intensive? Do it in batches? Timeouts would be a nightmare. As it is, the query would be ok, but I need to make things future proof.
Any thoughts much appreciated.
Many thanks
-
May 1, 2008, 04:27 #2
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
why do you need to return the IDs? just insert them from the original query
Code:INSERT INTO sometable ( list of columns ) SELECT list of columns FROM complexquery
-
May 1, 2008, 06:36 #3
- Join Date
- Jun 2006
- Posts
- 145
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Hi there,
Ah yes, that helps, but still involves inserting possibly thousands or rows. Is this potentially an issue or am I being paranoid? Should the whole thing be executed pretty quick?
Also, there may be more than one matching keyword for each member which would return more than one instance of each member_id. Should I use DISTINCT? But again, this would slow things down.
Many thanks
-
May 1, 2008, 06:51 #4
- Join Date
- Feb 2004
- Location
- Tampa, FL (US)
- Posts
- 9,854
- Mentioned
- 1 Post(s)
- Tagged
- 0 Thread(s)
rudy's suggestion is very fast. it should be able to handle thousands of rows with no problem.
DISTINCT might be appropriate, but without more detail it's impossible to say. if DISTINCT is needed, you should probably also have a UNIQUE index or possible a compound PRIMARY key.Check out our new Industry News forum!
Keep up-to-date with the latest SP news in the Community Crier
I edit the SitePoint Podcast
-
May 1, 2008, 10:07 #5
- Join Date
- Jun 2006
- Posts
- 145
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Hello
Thanks for the replies.
Ok, here is a bit more info as I'm not sure the insert and select in one query will work....
1. A user enters a 'sentence' into a form which will be up to 64 characters long.
2. The given 'sentence' is split into single words to give a list of keywords.
3. I need to query the database to get a list of member id's of members that have a keyword(s) matching any of the keywords given from the 'sentence'.
4. Each member will have a max of 10 keywords stored against their business.
5. Each members buiness keyword is one row in a keyword table
6. The keyword table consists of a composite key made up of members business id and keyword
7. There maybe more than 1 keyword match for each user so a member could be selected more than once (but needs to only be selected once)
8. For each member id returned I need to insert a row into another table. Each row will consist of the members id that entered the 'sentence', the member_id that has a keyword match, a formatted version of the 'sentence' and the current datetime.
So, I can't just insert whatever has been returned from the select as other data also needs to be entered.
What would you suggest?
Any help much appreciated as always
-
May 1, 2008, 12:52 #6
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
-
May 1, 2008, 13:11 #7
- Join Date
- Feb 2004
- Location
- Tampa, FL (US)
- Posts
- 9,854
- Mentioned
- 1 Post(s)
- Tagged
- 0 Thread(s)
your description is still a bit sketchy, especially since you use the term "member" to refer to multiple, different participants and you didn't provide table or column names. so what i have here is just a loose example.
Code:insert ignore result_table ( searcher_member_id , result_member_id , formatted_sentence , ts) select distinct $member_id_who_did_search , member_id , $formatted_sentence , now() from keywords where keyword in ($keywords)
Check out our new Industry News forum!
Keep up-to-date with the latest SP news in the Community Crier
I edit the SitePoint Podcast
-
May 1, 2008, 14:52 #8
- Join Date
- Jun 2006
- Posts
- 145
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Hello again
Thanks for the replies. Sorry for being vague. Hopefully this will clear it up.
Here is a summary of the tables involved. I've trimmed out all columns which are irrelevant for this task:
MEMBERS
member_id (PK)
MEMBERS_BUSINESS
business_id (PK)
member_id (FK)
MEMBERS_BUSINESS_KEYWORD
business_id (PK)
keyword (PK)
MEMBERS_NOTICES
notice_id (PK auto increment)
member_id (FK member id who entered the sentence)
receiver_id (FK member id of notice recipient)
notice (the formatted sentence entered by the user)
date_added (date)
No additional indexes have been added to any of these tables yet
- A user enters a sentence into a form field.
- I split the sentence into individual words
- I now have the id of the user that entered the sentence, the full sentence and a list of keywords taken from the sentence
So, I need a single efficient query that will give me all the id's of members that have at least one business keywords that matches at least one keyword from the list of keywords I took from the sentence. Clearly the returned members id needs to come from the MEMBERS_BUSINESS table as only the members business id is stored with each keyword.
I need to make sure that only one entry is added to MEMBERS_NOTICES for each matching member.
Many thanks for your time
-
May 1, 2008, 15:23 #9
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
first, do this to make the query efficient:
Code:ALTER TABLE members_business_keyword ADD INDEX reversi (keyword,business_id)
Code:SELECT DISTINCT mb.member_id FROM members_business_keyword AS mbk INNER JOIN members_business AS mb ON mb.business_id = mbk.business_id WHERE mbk.keyword IN ( 'keyword1' , 'keyword2' , 'keywordn' )
-
May 2, 2008, 05:07 #10
- Join Date
- Jun 2006
- Posts
- 145
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Hi Rudy,
Yep, that all looks goodI've added the reverse index and the select looks like it will work just fine. However, why have you used an INNER JOIN? Is tha better practice than to add members_business to the FROM and put the condition in the WHERE?
Many thanks
-
May 2, 2008, 05:10 #11
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
yes, using explicit JOIN syntax is much better
-
May 2, 2008, 05:22 #12
- Join Date
- Jun 2006
- Posts
- 145
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Good good. How can that be adapted to incorporate the INSERT?
Thanks again!
-
May 2, 2008, 05:44 #13
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
Code:INSERT INTO member_notices ( member_id , receiver_id , notice , date_added ) SELECT DISTINCT , $member_id , mb.member_id , '$notice' , CURRENT_DATE FROM members_business_keyword AS mbk INNER JOIN members_business AS mb ON mb.business_id = mbk.business_id WHERE mbk.keyword IN ( 'keyword1' , 'keyword2' , 'keywordn' )
-
May 2, 2008, 06:50 #14
- Join Date
- Jun 2006
- Posts
- 145
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Fantastic. Works like a charm! And with only a 3 lines to fully validate and prepare the given sentence it has to be my favourite function ever
Just one more thing though... I've just remembered that the member can set a preference as to whether they receive these notices. So 'notice' in the members_prefs table needs to be set to 1. Therefore, should the query be adapted to this ? :
Code:INSERT INTO member_notices ( member_id , receiver_id , notice , date_added ) SELECT DISTINCT , $member_id , mb.member_id , '$notice' , CURRENT_DATE FROM members_business_keyword AS mbk INNER JOIN members_business AS mb ON mb.business_id = mbk.business_id INNER JOIN members_prefs AS mp ON mp.member_id = mb.member_id AND mp.notice = 1 WHERE mbk.keyword IN ( 'keyword1' , 'keyword2' , 'keywordn' )
Many thanks
-
May 2, 2008, 07:28 #15
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
yes, that's correct
PKs have an index by default, but you should make sure all FKs have an index too (except for the members_business_keyword table, which we've already covered)
-
May 2, 2008, 10:22 #16
- Join Date
- Jun 2006
- Posts
- 145
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Me again
If I change the first INNER JOIN to this:
Code:INNER JOIN members_business AS mb ON mb.business_id = mbk.business_id AND mb.member_id NOT IN ( $contactids )
I would expect $contactids to contain between 1 and 100 different numbers. Typically, it would probably only be a list of less than 50 numbers.
Thanks again
-
May 2, 2008, 10:41 #17
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
Bookmarks