SitePoint Sponsor

User Tag List

Results 1 to 7 of 7
  1. #1
    SitePoint Wizard bronze trophy
    Join Date
    Oct 2001
    Location
    Vancouver BC Canada
    Posts
    2,030
    Mentioned
    5 Post(s)
    Tagged
    0 Thread(s)

    Looking to refine/optimize a complex multijoin query

    Hello all,

    This is a query I've been thinking about for quite some time that I would like to optimize to run efficiently in MySQL. It began life in an ASP/Access app about 12 years ago that ran a straight (no joins) query against an Access database of contacts returning about 600 rows. That was saved in an array which was then looped and each row ran another query against a couple of tables that pulled out yearly grade information. It's not the most efficient way of doing it but it worked.

    Now we're using a more sophisticated CRM on PHP/MySQL called CiviCRM and I'm rebuilding this part of the system as a custom plugin.

    In order to achieve the same results as the old system, I've duplicated what we had originally done to get a baseline to start with. It's dead slow compared to the old ASP/Access version so I have some work to do.

    Here's how I'm approaching it:

    I have a main query to pull in the contact information as follows:

    PHP Code:

    $sql 
    "SELECT contact.id
            , contact.last_name
            , contact.first_name
            , contact.external_identifier
            , contact.user_unique_id
            , membership.membership_type_id
            , membership.id AS membership_id
            , membership.join_date AS member_since
            , membership_type.name AS member_type

            FROM contact 
            
            INNER JOIN membership
                    ON contact.id = membership.contact_id
            
            INNER JOIN membership_type
                    ON membership.membership_type_id = membership_type.id
            
            WHERE contact.first_name IS NOT NULL 
            
            AND contact.last_name IS NOT NULL
            
            ORDER BY contact.last_name"

    The CRM has a database abstraction layer that allows me to get a data access object and I can either loop in into an array or loop and execute a query for each row in my main query. I've found it is slightly more efficient looping directly from the dataset so here's how that part works (I've skipped a bunch of the html cruft involved to keep the concept from getting muddled up):

    Anything in the code below that is prefixed with $dao-> is a value from the results of the query above and specific to the row that is in focus during that iteration of the While loop.

    PHP Code:

    // Execute the query above and return a dataset object 
    $dao CRM_Core_DAO::executeQuery($sql);

    // START HTML string variable to hold a table of results $report_table = "<table>, etc..";


    while( $dao->fetch( ) ) {   

            
    $sql "SELECT professional_development_categories.id AS id
                     , professional_development_categories.category AS category
                     , SUM(professional_development_activities.credits) AS credits
                     , professional_development_categories.minimum
                     , professional_development_categories.maximum
                     , professional_development_categories.description
                     FROM professional_development_categories
                     LEFT OUTER JOIN professional_development_activities
                     ON professional_development_activities.category_id = professional_development_categories.id
                     AND professional_development_activities.contact_id = " 
    $dao->id "
                     AND EXTRACT(YEAR FROM professional_development_activities.credit_date) = " 
    $_SESSION["report_year"] . "
                     GROUP BY professional_development_categories.id"
    ;

            
    // Populate the variable that holds the HTML table of results
            
    $report_table .= '<tr>';
            
    $report_table .= '<td>' $dao->last_name '</td>';
            
    $report_table .= '<td>' $dao->first_name '</td>';
            
    $report_table .= '<td>' $dao->member_type '</td>';
            
    $report_table .= '<td>' $dao->member_since '</td>';

            
    // Execute the sub query for each row in the main query to get this years scores for that individual
            
    $subdao CRM_Core_DAO::executeQuery($sql);
                
            
    $total_credits 0;
            
    $sub_cells "";
                    
            while( 
    $subdao->fetch( ) ) { 
                     
    $total_credits += abs($subdao->credits);             
                     
    $sub_cells .= '<td>' abs($subdao->credits) . '</td>';
            }
                
            
    $report_table .= '<td>' $total_credits '</td>';
            
    $report_table .= $sub_cells;

            
    $report_table .= '</tr>';

    }


    // END HTML to create a table of results $report_table .= "</table>, etc.."; 
    So hopefully that isn't too convoluted and makes some sense. I'm looking to make some improvements to it and speed it up. I'd like to combine the queries but I'm not sure how I can do that.

    Thanks,
    Andrew
    Andrew Wasson | www.lunadesign.org
    Principal / Internet Development

  2. #2
    SitePoint Wizard bronze trophy
    Join Date
    Oct 2001
    Location
    Vancouver BC Canada
    Posts
    2,030
    Mentioned
    5 Post(s)
    Tagged
    0 Thread(s)
    Ok, so to try and improve this query, I've been playing in PHPMyAdmin running some queries. I figured the first thing I could try is to add the sum of all of the credits from the activities table. So I added that field and LEFT JOIN-ed the table as follows:

    PHP Code:

    SELECT civicrm_contact
    .id
            
    civicrm_contact.last_name
            
    civicrm_contact.first_name
            
    civicrm_contact.external_identifier
            
    civicrm_contact.user_unique_id
            
    civicrm_membership.membership_type_id
            
    civicrm_membership.id AS membership_id
            
    civicrm_membership.join_date AS member_since
            
    civicrm_membership_type.name AS member_type

            
    # ADD Credits
            
    civi_cpd_activities.credits AS total_credits

            FROM civicrm_contact 

            
    # Left Join the table with some criteria
            
    LEFT JOIN civi_cpd_activities
            ON  civi_cpd_activities
    .contact_id civicrm_contact.id AND EXTRACT(YEAR FROM civi_cpd_activities.credit_date) = " . $_SESSION["report_year"] . "

            
    INNER JOIN civicrm_membership
            ON civicrm_contact
    .id civicrm_membership.contact_id
            INNER JOIN civicrm_membership_type
            ON civicrm_membership
    .membership_type_id civicrm_membership_type.id
            WHERE civicrm_contact
    .first_name IS NOT NULL 
            
    AND civicrm_contact.last_name IS NOT NULL
            ORDER BY civicrm_contact
    .last_name 
    Now the problem with this is that it gives me a new row for each civi_cpd_activities.credits and duplicates all the other info about the individual. I would like to SUM(civi_cpd_activities.credits) but for some reason that just gives me a single result. I don't quite understand why it's doing that so I need to figure that out.

    It's still a very slow query to run. PHPMyAdmin responded with the following results: Showing rows 0 - 29 ( 684 total, Query took 8.0574 sec). I've written some pretty ugly queries but this one is really slow and I haven't even gotten to the part where I have to pull out the sum of each score category and list that out.
    Andrew Wasson | www.lunadesign.org
    Principal / Internet Development

  3. #3
    SitePoint Wizard bronze trophy
    Join Date
    Oct 2001
    Location
    Vancouver BC Canada
    Posts
    2,030
    Mentioned
    5 Post(s)
    Tagged
    0 Thread(s)
    Just for fun, I thought I'd try a sub-query for that summed field and it seems to be 1 second faster than the field with the left join. Still pretty slow for such a seemingly simple query:

    Code:
    SELECT civicrm_contact.id
            , civicrm_contact.last_name
            , civicrm_contact.first_name
            , civicrm_contact.external_identifier
            , civicrm_contact.user_unique_id
            , civicrm_membership.membership_type_id
            , civicrm_membership.id AS membership_id
            , civicrm_membership.join_date AS member_since
            , civicrm_membership_type.name AS member_type
            , (SELECT SUM(civi_cpd_activities.credits) FROM civi_cpd_activities WHERE civi_cpd_activities.contact_id = civicrm_contact.id AND EXTRACT(YEAR FROM civi_cpd_activities.credit_date) = " . $_SESSION['report_year'] . ") As total_credits    
            FROM civicrm_contact 
    
            INNER JOIN civicrm_membership
            ON civicrm_contact.id = civicrm_membership.contact_id
    
            INNER JOIN civicrm_membership_type
            ON civicrm_membership.membership_type_id = civicrm_membership_type.id
    
            WHERE civicrm_contact.first_name IS NOT NULL 
            AND civicrm_contact.last_name IS NOT NULL
            ORDER BY civicrm_contact.last_name
    By the way, the activities table conatians 17,000 rows of data from the last 12 years so there is some work for the database. The query when run without looking at the activities table takes well under a second to run:

    Showing rows 0 - 29 ( 634 total, Query took 0.0075 sec)

    With the sub-query:

    Showing rows 0 - 29 ( 634 total, Query took 7.3675 sec)

    I just had an idea... What about creating a temp table of only the activities results from the year in question and then running the query against that table.
    Andrew Wasson | www.lunadesign.org
    Principal / Internet Development

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,247
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    Code:
    SELECT c.id
         , c.last_name
         , c.first_name
         , c.external_identifier
         , c.user_unique_id
         , m.membership_type_id
         , m.id AS membership_id
         , m.join_date AS member_since
         , mt.name AS member_type
         , a.sum_credits 
      FROM civicrm_contact AS c
    INNER 
      JOIN civicrm_membership AS m
        ON m.contact_id = c.id
    INNER 
      JOIN civicrm_membership_type AS mt
        ON mt.id = m.membership_type_id
     WHERE c.first_name IS NOT NULL 
       AND c.last_name IS NOT NULL
    LEFT OUTER
      JOIN ( SELECT pda.contact_id
                  , SUM(pda.credits) AS sum_credits
               FROM professional_development_activities AS pda
              WHERE pda.credit_date >= '2014-01-01'
                AND pda.credit_date  < '2015-01-01'
             GROUP 
                 BY pda.contact_id ) AS a
        ON a.contact_id = c.id
    ORDER 
        BY c.last_name
    notice the two dates -- you would replace these with run-time parameters just like you used to have a parameter for "report_year"

    the idea is that pda.credit_date would have an index, and this range test (starting with first day of year, up to but not including first day of next year) will allow the index to be utilized, whereas your EXTRACT function would not
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Wizard bronze trophy
    Join Date
    Oct 2001
    Location
    Vancouver BC Canada
    Posts
    2,030
    Mentioned
    5 Post(s)
    Tagged
    0 Thread(s)
    Fantastic! Thanks Rudy. I'll run this and see how it improves the speed of things.

    Cheers,
    Andrew
    Andrew Wasson | www.lunadesign.org
    Principal / Internet Development

  6. #6
    SitePoint Wizard bronze trophy
    Join Date
    Oct 2001
    Location
    Vancouver BC Canada
    Posts
    2,030
    Mentioned
    5 Post(s)
    Tagged
    0 Thread(s)
    Holy smokes, what a difference! I had to move the WHERE clause to the end of the query just before the ORDER BY but the query runs lightning fast:

    Showing rows 570 - 599 ( 634 total, Query took 0.0282 sec)

    So if I understand correctly the EXTRACT function prevented my query from being able to create or use an index. Is that right? If so, that is definitely a function to be weary of.

    My next trick will be to loop out the total credits for each category in the activities table for this time period. I'll give that a go on my own. Hopefully I won't have to come running for help and I'll post my results when I have something. This is amazingly fast now!
    Andrew Wasson | www.lunadesign.org
    Principal / Internet Development

  7. #7
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,247
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by awasson View Post
    I had to move the WHERE clause to the end of the query
    doh!!

    Quote Originally Posted by awasson View Post
    So if I understand correctly the EXTRACT function prevented my query from being able to create or use an index. Is that right?
    queries do not create indexes, you have to do that yourself, using ALTER TABLE ADD INDEX or CREATE INDEX

    using ~any~ function on a column will negate the use of any index on that column

    did your credit_date column already have an index, or did you just create one now?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •