Too many queries and search load time is mega slow

Will JOIN improve performance?

The RAND() was put in so that when someone does a search they don't get the same order of results back every time, which would get boring if they come back to the site (at this speed it's doubtful they will!). It would be good to be able to randomise the results, but no, I guess it's not really needed.

I did remove that section of code the other day, though, and it didn't look like it sped things up much.


$base_sql = "SELECT DISTINCT rugs.id as id, 
              (SELECT $agg((IF(price IS NULL OR discount_price IS NULL, 
        COALESCE(price, discount_price), LEAST(price, discount_price)))) FROM rugs_stock WHERE rugs_stock.rug_id = rugs.id) as actual_price
             FROM rugs, rug_types, rug_fabrics, rug_patterns, rug_subtypes, 
                  rug_shape, rug_designer, rug_designnames, 
                  rug_colours as colours1, rug_colours as colours2, rug_colours as colours3, rugs_stock as stock ";

$count_sql = "SELECT COUNT(DISTINCT rugs.id, 
              (SELECT $agg((IF(price IS NULL OR discount_price IS NULL, 
        COALESCE(price, discount_price), LEAST(price, discount_price)))) FROM rugs_stock WHERE rugs_stock.rug_id = rugs.id)) as actual_price
             FROM rugs, rug_types, rug_fabrics, rug_patterns, rug_subtypes, 
                  rug_shape, rug_designer, rug_designnames, 
                  rug_colours as colours1, rug_colours as colours2, rug_colours as colours3, rugs_stock as stock ";
$sql .= "WHERE ";  // $sql presumably holds $base_sql or $count_sql at this point
if ( !isset($_GET['admin-search']) )  // the empty() check was redundant: !isset() already implies empty()
  $sql .= " rugs.active = '1' AND ";
$sql .= "          rugs.type        = rug_types.id 
               AND rugs.subtype     = rug_subtypes.id 
               AND rugs.fabric      = rug_fabrics.id 
               AND rugs.pattern     = rug_patterns.id 
               AND rugs.shape       = rug_shape.id 
               AND rugs.designer    = rug_designer.id 
               AND rugs.design_name = rug_designnames.id 
               AND rugs.colour1     = colours1.id                                    
               AND rugs.colour2     = colours2.id                                    
               AND rugs.colour3     = colours3.id
               AND rugs.id          = stock.rug_id                                    
                ";


If it's the JOIN/WHERE issue, then the problem is located here? ^^

Ok… pulling my hair out here.

I know it's not 100% possible to tell, but roughly how long do you think it would take to fix this to an acceptable level? And what would you recommend as the best way to go about finding a freelance programmer? I'd really prefer to be able to sit in the same room as them while they work, so there are no communication issues, and because we could help make sense of the messy code, as we know the website and its functions. So ideally it would be Glasgow-based.

@aaarrrggh hit the nail on the head… twice. The main culprit is the ORDER BY RAND(). I re-created your schema on my machine and filled it with dummy data so that I could actually test changes to the query. Just deleting the ORDER BY RAND() yielded a 97% speed boost.

If you need x random rows from the result set, run the queries as normal but without the ORDER BY RAND(). Instead, once you've got the result set into an array, use array_rand() to pick the number of random entries needed. If you need all the results but in a random order, use shuffle() on the result set.
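A minimal PHP sketch of that idea (assumes `$rows` already holds the full result set as an array, e.g. from `mysqli_fetch_all($result, MYSQLI_ASSOC)`; the variable names are illustrative):

```php
<?php
// $rows = mysqli_fetch_all($result, MYSQLI_ASSOC);  // full result set, fetched once

// Case 1: need $x random rows. array_rand() returns random keys from the array.
$x = 5;
$keys = array_rand($rows, min($x, count($rows)));
$random_rows = array();
foreach ((array) $keys as $key) {   // cast: array_rand() returns a scalar key when picking 1
    $random_rows[] = $rows[$key];
}

// Case 2: need all rows, just in a random order. shuffle() randomises in place.
shuffle($rows);
```

Either way the database does no sorting at all, which is the whole point of dropping ORDER BY RAND().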

Removed the RAND() section altogether for now and it's sped the search up by about half, which is good, but it's still a bit slow.

I haven't been able to try JOIN yet; should that give a speed increase?

I would need a database person's expertise to back me up, but in my head it SHOULD work like this:

With an implicit join (SELECT … FROM tablea, tableb, tablec …), the database conceptually has to inner join EVERY row of table A with table B, then inner join that combined table with table C. Then it takes the entire thing, and only THEN pares down the results based on the WHERE.

If instead I explicitly define my join conditions (SELECT … FROM tablea INNER JOIN tableb ON tablea.fileid = 10 AND tablea.bjoin = tableb.ajoin INNER JOIN tablec ON tablea.cjoin = tablec.ajoin …), my first join only holds rows where fileid is 10. This should make the second (and subsequent) joins faster, because there are fewer records to join to.

Now, I could be completely talking out of my rear, but that's how it works in my head :smiley:
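For what it's worth, applied to the query posted earlier the explicit-join version would look roughly like this (a sketch only; the ON conditions are lifted straight from the WHERE clause above, and the SELECT list is trimmed for readability):

```sql
SELECT DISTINCT rugs.id AS id
FROM rugs
INNER JOIN rug_types                ON rugs.type        = rug_types.id
INNER JOIN rug_subtypes             ON rugs.subtype     = rug_subtypes.id
INNER JOIN rug_fabrics              ON rugs.fabric      = rug_fabrics.id
INNER JOIN rug_patterns             ON rugs.pattern     = rug_patterns.id
INNER JOIN rug_shape                ON rugs.shape       = rug_shape.id
INNER JOIN rug_designer             ON rugs.designer    = rug_designer.id
INNER JOIN rug_designnames          ON rugs.design_name = rug_designnames.id
INNER JOIN rug_colours AS colours1  ON rugs.colour1     = colours1.id
INNER JOIN rug_colours AS colours2  ON rugs.colour2     = colours2.id
INNER JOIN rug_colours AS colours3  ON rugs.colour3     = colours3.id
INNER JOIN rugs_stock  AS stock     ON rugs.id          = stock.rug_id
WHERE rugs.active = '1'
```

MySQL should produce the same execution plan for both forms, but this version makes the join conditions impossible to forget, which is exactly the accidental-cartesian-product trap the implicit syntax invites.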

Again, would I see significant speed improvements?

The websites are on a shared server at the moment and we're thinking of switching to a dedicated server. My issue is that our other (original) website works fine and searches are almost instant (there are differences in the database tables, and therefore in the way searches are carried out). The new site is really slow, BUT they're both on the same server, so I don't know how much benefit moving to a dedicated server would provide.

Can you please post the whole, entire “monster query” as it currently stands and the output of a SHOW CREATE TABLE for each of the tables that the query uses?

What information is required to be output?

Approximately how many records are there in the database you’re testing it against?

Yeah… is it at all possible for you to even dump some of the tables complete with their data so we can do some testing?

If the tables are massive (which they probably are if we're dealing with a query that takes this long), or if the tables contain any sensitive data, then scrap that idea and just give us the structure of the tables and an idea of how much data each one contains…

Cheers

I had to look it up to ensure research hadn’t come out with some different results, but I can’t find any…

There's no appreciable performance difference between the two syntaxes. The only real difference is that the DBMS will convert the implicit join syntax to the explicit form before executing; otherwise it's exactly the same.

The saving is in having someone look at your SQL code: it's faster to establish the PK/FK combinations, PLUS it's easier to ensure the join actually occurs and you're not returning ALL rows from both tables (that happens a lot more often than you would think…).

Looking at your generated SQL, that correlated subquery looks expensive: it runs once per row of the outer result. I have a feeling that might be what's causing your issues.

Generating the result set FIRST, then joining it back in as a derived table to group by and get the price, might be a better approach. But I'd need data to know for sure.
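One way to express that (a sketch only, untested against real data): compute the price once per rug in a derived table and join it in, instead of running the correlated subquery for every row. `MIN` here stands in for whatever `$agg` resolves to:

```sql
SELECT DISTINCT rugs.id AS id, prices.actual_price
FROM rugs
INNER JOIN (
    -- one pass over rugs_stock, aggregated per rug, instead of one subquery per row
    SELECT rug_id,
           MIN(IF(price IS NULL OR discount_price IS NULL,
                  COALESCE(price, discount_price),
                  LEAST(price, discount_price))) AS actual_price
    FROM rugs_stock
    GROUP BY rug_id
) AS prices ON prices.rug_id = rugs.id
-- ... remaining joins and WHERE conditions as before
```

Whether this actually wins depends on the data and indexes (an index on `rugs_stock.rug_id` matters either way), which is why I'd want a dump to test against.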