SitePoint Sponsor

User Tag List

Results 1 to 17 of 17
  1. #1
    SitePoint Zealot
    Join Date
    Jun 2006
    Posts
    145
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Running large insert queries in my web app

    Hi there

    Here is a simplified explanation of something I want to do in my app:

    A user enters a sentence into a webform.
    I split the string into separate words.
    Create a query to see if any of the words match words in my members_keywords table (can be multiple instances of any one word).
    Query returns an array of member id's.
    I create a new insert query with one row for each member id returned in the previous select.

    The thing is, this one process could potentially return thousands or member id's and so the insert query would equally need to insert thousands of rows.

    I'm really concerned about the impact of this on the server as this process could potentially be running many times a day along with everything else in the web app.

    Am I right to be concerned? Is there a way to make it less intensive? Do it in batches? Timeouts would be a nightmare. As it is, the query would be ok, but I need to make things future proof.

    Any thoughts much appreciated.

    Many thanks

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,267
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    why do you need to return the IDs? just insert them from the original query
    Code:
    INSERT INTO sometable ( list of columns ) SELECT list of columns FROM complexquery
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Zealot
    Join Date
    Jun 2006
    Posts
    145
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi there,

    Ah yes, that helps, but still involves inserting possibly thousands or rows. Is this potentially an issue or am I being paranoid? Should the whole thing be executed pretty quick?

    Also, there may be more than one matching keyword for each member which would return more than one instance of each member_id. Should I use DISTINCT? But again, this would slow things down.

    Many thanks

  4. #4
    reads the ********* Crier silver trophybronze trophy longneck's Avatar
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    rudy's suggestion is very fast. it should be able to handle thousands of rows with no problem.

    DISTINCT might be appropriate, but without more detail it's impossible to say. if DISTINCT is needed, you should probably also have a UNIQUE index or possible a compound PRIMARY key.
    Check out our new Industry News forum!
    Keep up-to-date with the latest SP news in the Community Crier

    I edit the SitePoint Podcast

  5. #5
    SitePoint Zealot
    Join Date
    Jun 2006
    Posts
    145
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hello

    Thanks for the replies.

    Ok, here is a bit more info as I'm not sure the insert and select in one query will work....

    1. A user enters a 'sentence' into a form which will be up to 64 characters long.
    2. The given 'sentence' is split into single words to give a list of keywords.
    3. I need to query the database to get a list of member id's of members that have a keyword(s) matching any of the keywords given from the 'sentence'.
    4. Each member will have a max of 10 keywords stored against their business.
    5. Each members buiness keyword is one row in a keyword table
    6. The keyword table consists of a composite key made up of members business id and keyword
    7. There maybe more than 1 keyword match for each user so a member could be selected more than once (but needs to only be selected once)
    8. For each member id returned I need to insert a row into another table. Each row will consist of the members id that entered the 'sentence', the member_id that has a keyword match, a formatted version of the 'sentence' and the current datetime.

    So, I can't just insert whatever has been returned from the select as other data also needs to be entered.

    What would you suggest?

    Any help much appreciated as always

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,267
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by sxtrail View Post
    So, I can't just insert whatever has been returned from the select as other data also needs to be entered.
    other than what you already mentioned?

    - the members id that entered the 'sentence'
    - the member_id that has a keyword match
    - a formatted version of the 'sentence'
    - the current datetime.

    what else?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    reads the ********* Crier silver trophybronze trophy longneck's Avatar
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    your description is still a bit sketchy, especially since you use the term "member" to refer to multiple, different participants and you didn't provide table or column names. so what i have here is just a loose example.
    Code:
    insert ignore result_table
         ( searcher_member_id
         , result_member_id
         , formatted_sentence
         , ts)
    select distinct
           $member_id_who_did_search
         , member_id
         , $formatted_sentence
         , now()
      from keywords
     where keyword in ($keywords)
    Check out our new Industry News forum!
    Keep up-to-date with the latest SP news in the Community Crier

    I edit the SitePoint Podcast

  8. #8
    SitePoint Zealot
    Join Date
    Jun 2006
    Posts
    145
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hello again

    Thanks for the replies. Sorry for being vague. Hopefully this will clear it up.

    Here is a summary of the tables involved. I've trimmed out all columns which are irrelevant for this task:

    MEMBERS
    member_id (PK)

    MEMBERS_BUSINESS
    business_id (PK)
    member_id (FK)

    MEMBERS_BUSINESS_KEYWORD
    business_id (PK)
    keyword (PK)

    MEMBERS_NOTICES
    notice_id (PK auto increment)
    member_id (FK member id who entered the sentence)
    receiver_id (FK member id of notice recipient)
    notice (the formatted sentence entered by the user)
    date_added (date)

    No additional indexes have been added to any of these tables yet

    - A user enters a sentence into a form field.
    - I split the sentence into individual words
    - I now have the id of the user that entered the sentence, the full sentence and a list of keywords taken from the sentence

    So, I need a single efficient query that will give me all the id's of members that have at least one business keywords that matches at least one keyword from the list of keywords I took from the sentence. Clearly the returned members id needs to come from the MEMBERS_BUSINESS table as only the members business id is stored with each keyword.

    I need to make sure that only one entry is added to MEMBERS_NOTICES for each matching member.

    Many thanks for your time

  9. #9
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,267
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    first, do this to make the query efficient:
    Code:
    ALTER TABLE members_business_keyword
    ADD INDEX reversi (keyword,business_id)
    then try this query:
    Code:
    SELECT DISTINCT
           mb.member_id
      FROM members_business_keyword AS mbk
    INNER
      JOIN members_business AS mb
        ON mb.business_id = mbk.business_id
     WHERE mbk.keyword IN ( 'keyword1'
                          , 'keyword2'
                          , 'keywordn' )
    if that look right, i'll show you how to write the INSERT SELECT
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  10. #10
    SitePoint Zealot
    Join Date
    Jun 2006
    Posts
    145
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi Rudy,

    Yep, that all looks good I've added the reverse index and the select looks like it will work just fine. However, why have you used an INNER JOIN? Is tha better practice than to add members_business to the FROM and put the condition in the WHERE?

    Many thanks

  11. #11
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,267
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    yes, using explicit JOIN syntax is much better
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  12. #12
    SitePoint Zealot
    Join Date
    Jun 2006
    Posts
    145
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Good good. How can that be adapted to incorporate the INSERT?

    Thanks again!

  13. #13
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,267
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    Code:
    INSERT
      INTO member_notices
         ( member_id 
         , receiver_id 
         , notice 
         , date_added )
    SELECT DISTINCT
         , $member_id
         , mb.member_id
         , '$notice'
         , CURRENT_DATE
      FROM members_business_keyword AS mbk
    INNER
      JOIN members_business AS mb
        ON mb.business_id = mbk.business_id
     WHERE mbk.keyword IN ( 'keyword1'
                          , 'keyword2'
                          , 'keywordn' )
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  14. #14
    SitePoint Zealot
    Join Date
    Jun 2006
    Posts
    145
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Fantastic. Works like a charm! And with only a 3 lines to fully validate and prepare the given sentence it has to be my favourite function ever

    Just one more thing though ... I've just remembered that the member can set a preference as to whether they receive these notices. So 'notice' in the members_prefs table needs to be set to 1. Therefore, should the query be adapted to this ? :
    Code:
    INSERT
      INTO member_notices
         ( member_id 
         , receiver_id 
         , notice 
         , date_added )
    SELECT DISTINCT
         , $member_id
         , mb.member_id
         , '$notice'
         , CURRENT_DATE
      FROM members_business_keyword AS mbk
    INNER
      JOIN members_business AS mb
        ON mb.business_id = mbk.business_id
    INNER
      JOIN members_prefs AS mp
        ON mp.member_id = mb.member_id
           AND mp.notice = 1
     WHERE mbk.keyword IN ( 'keyword1'
                          , 'keyword2'
                          , 'keywordn' )
    Member_id in member_prefs in the primary key. Is this ok? Can you tell if there are any other indexes I am likely to need?

    Many thanks

  15. #15
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,267
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    yes, that's correct

    PKs have an index by default, but you should make sure all FKs have an index too (except for the members_business_keyword table, which we've already covered)
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  16. #16
    SitePoint Zealot
    Join Date
    Jun 2006
    Posts
    145
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Me again

    If I change the first INNER JOIN to this:
    Code:
    INNER
       JOIN members_business AS mb
       ON mb.business_id = mbk.business_id
       AND mb.member_id NOT IN ( $contactids )
    .... would that affect the performance of the query much?
    I would expect $contactids to contain between 1 and 100 different numbers. Typically, it would probably only be a list of less than 50 numbers.

    Thanks again

  17. #17
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,267
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by sxtrail View Post
    .... would that affect the performance of the query much?
    dunno

    what happened when you tried it?

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •