Testing efficiency of code, MySQL queries, etc.

I’m trying to evaluate a couple of different ways of pulling data from the database and displaying it using PHP, and want to see which is the most efficient.

To test which is most efficient, I’ve put both pieces of code in an array, then run a loop 500 times, executing both pieces in a random order and saving each execution time to a CSV file.

However, I think there must be something wrong with my testing method, as the times saved indicate that the SQL query with an extra JOIN is fastest, and the rest of the code for one method is consistently faster than the other, even though the code is almost identical between the two methods.

Here’s the code:

<?php
include('globalFuncs.inc.php');

$sql = array(array(),array());

$sql[0]['standard'] = "SELECT images_2.id, images_2.Headline, images_2.fileext, images_2.Rating, images_2.url_name,
        SUBSTRING_INDEX(imageData.ImageDescription, '\n', 1) AS summary, imageData.GPSLatitude, imageData.GPSLongitude, imageData.GPSAltitude, DATE_FORMAT(imageData.Last_updated,'%Y-%m-%dT%TZ') AS Last_updated,
        CONCAT('<category term=\\"', GROUP_CONCAT(keywords.Subject SEPARATOR '\\" />\n<category term=\\"'), '\\" />') AS tags
        FROM images_2
        LEFT JOIN imageData ON images_2.id = imageData.id
        LEFT JOIN img_keywords ON images_2.id = img_keywords.img_id
        LEFT JOIN keywords ON keywords.id = img_keywords.keywords_id
        WHERE images_2.id IN(SELECT DISTINCT img_categories.img_id
        FROM img_categories
        LEFT JOIN categories AS node ON node.id = img_categories.categories_id,
        categories AS parent
        WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND parent.id = 35)
        GROUP BY images_2.id
        ORDER BY images_2.id DESC
        LIMIT 30,30";
$sql[1]['standard'] = "SELECT images_3.id, images_3.Headline, fileType_2.Ext, images_3.Rating, images_3.url_name,
        SUBSTRING_INDEX(imageData.ImageDescription, '\n', 1) AS summary, imageData.GPSLatitude, imageData.GPSLongitude, imageData.GPSAltitude, DATE_FORMAT(imageData.Last_updated,'%Y-%m-%dT%TZ') AS Last_updated,
        CONCAT('<category term=\\"', GROUP_CONCAT(keywords.Subject SEPARATOR '\\" />\n<category term=\\"'), '\\" />') AS tags
        FROM images_3
        LEFT JOIN imageData ON images_3.id = imageData.id
        LEFT JOIN img_keywords ON images_3.id = img_keywords.img_id
        LEFT JOIN keywords ON keywords.id = img_keywords.keywords_id
        LEFT JOIN fileType_2 ON fileType_2.id = images_3.fileext
        WHERE images_3.id IN(SELECT DISTINCT img_categories.img_id
        FROM img_categories
        LEFT JOIN categories AS node ON node.id = img_categories.categories_id,
        categories AS parent
        WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND parent.id = 35)
        GROUP BY images_3.id
        ORDER BY images_3.id DESC
        LIMIT 30,30";

$sql[0]['createImgTag'] = function($data){
        $str = '';
        foreach($data as $img){
            $str .= '<img src="'.STATIC2.'/Img/thumbs/'.$img['url_name'].$img['fileext'].'" />';
        }
        return $str;
    };
$sql[1]['createImgTag'] = function($data){
        $str = '';
        foreach($data as $img){
            $str .= '<img src="'.STATIC2.'/Img/thumbs/'.$img['url_name'].$img['Ext'].'" />';
        }
        return $str;
    };

$keys=array(0,1);
$runs=500;
$csv = array();
//Test standard query and creating img tag
for($i=0;$i<$runs;$i++){
    shuffle($keys);
    foreach($keys as $key){
    //Test standard query
        $start = microtime(true);
        if($result = $conn->query($sql[$key]['standard'])){
            $data = array();
            while($row = $result->fetch_assoc()){
                $data[] = $row;
            }
            $result->close();
            $end = microtime(true);
        }
        else{echo $conn->error; exit();}
        
        $csv[$i][$key] = number_format($end - $start, 20);
        
        //Test creating img tag
        $start = microtime(true);
        /*echo*/ $sql[$key]['createImgTag']($data);
        $end = microtime(true);
        $csv[$i][$key+2] = number_format($end - $start, 20);
        //echo '<br />Query for '.$key.' method took '.$csv[$i][$key].'s + creating tags took '.$csv[$i][$key+2].'<br /><br />';
    }
}

//Dump data
array_unshift($csv, array('standard 0', 'standard 1', 'createImgTag 0', 'createImgTag 1'));
$file= fopen('./resultsv3.csv', 'w');
foreach($csv as $fields){
    fputcsv($file, $fields);
}
fclose($file);

And it’s giving me results like:

"standard 0","standard 1","createImgTag 0","createImgTag 1"
0.02683115005493164062,0.00004792213439941406,0.01996994018554687500,0.00003910064697265625
0.02115297317504882812,0.00003600120544433594,0.01080298423767089844,0.00003600120544433594

Can anyone see anything wrong with my testing method or suggest a better way of comparing efficiency of code snippets / queries?

Thanks

Dave

The better thing to do would be to insert a lot of filler data into the database to see how your queries scale. Generally, with a small amount of data the speed is trivial, as MySQL is pretty smart in itself.

Thanks for the reply, but I’m not quite sure what you mean. Do you mean that the test shouldn’t be running the exact same query each time, but should return a different result set on each run for a proper test (e.g. changing the LIMIT for each run)?

Any ideas why the PHP string concatenation is running significantly faster for one method than the other as well?

I checked the outputs of both queries using var_dump and they are both exactly the same except for an extra newline in the <category> tags string. E.g. one query gives

["tags"]=>
    string(732) "<category term="Biota" />
<category term="Life" />
<category term="Vitae" />
<category term="Eukaryota" />
<category term="Animalia" />
<category term="Animals" />
<category term="Arthropoda" />
<category term="Arthropods" />
<category term="Insecta" />
<category term="Insects" />
<category term="Lepidoptera" />
<category term="Moths" />
<category term="Butterflies" />
<category term="Skippers" />
<category term="Pieridae" />
<category term="Sulphurs &amp; Whites" />

<category term="Muscari" />
<category term="Grape Hyacinth" />
<category term="Anthocharis" />
<category term="Orangetips" />
<category term="Anthocharis cardamines" />
<category term="Orange Tip" />
<category term="Wings closed" />
<category term="Resting" />"

while the other gives

["tags"]=>
    string(732) "<category term="Biota" />
<category term="Life" />
<category term="Vitae" />
<category term="Eukaryota" />
<category term="Animalia" />

<category term="Animals" />
<category term="Arthropoda" />
<category term="Arthropods" />
<category term="Insecta" />
<category term="Insects" />
<category term="Lepidoptera" />
<category term="Moths" />
<category term="Butterflies" />
<category term="Skippers" />
<category term="Pieridae" />
<category term="Sulphurs &amp; Whites" />
<category term="Muscari" />
<category term="Grape Hyacinth" />
<category term="Anthocharis" />
<category term="Orangetips" />
<category term="Anthocharis cardamines" />
<category term="Orange Tip" />

<category term="Wings closed" />
<category term="Resting" />"

I’m not sure where these extra newlines are coming from, and var_dump lists both strings as being the same length anyway, so I wouldn’t have thought it would affect anything.

I believe that wonshikee meant that you should test your queries on a database which has a couple of million rows instead of on a database which has a few thousand rows.

Okay, I have filled my db with >2,000,000 dummy data rows now and tested again, but I still get the same results.

I will post the full code of the test below, but here is what I would expect:
Simple query: Methods 0 and 1 should be roughly the same speed. Method 2 should be slower, as it has an extra JOIN.
Standard query: Same as simple.
Create img tags: Method 0 should be fastest, as there is less concatenation for PHP to do; Methods 1 and 2 should be slower than Method 0, and about the same speed as each other.

What the results actually show:
Simple query: All three are roughly the same.
Standard query: Methods 0 and 2 are roughly the same, with Method 1 significantly faster (see the averages below).
Create img tags: Method 0 is fastest, followed by Method 2, with Method 1 significantly slower.

Avg results:

simple 0                simple 1                  simple 2
0.000507132053375244    0.00057588529586792       0.000539646148681641

standard 0              standard 1                standard 2
0.00796413707733154     0.0000394973754882812     0.00803724241256714

createImgTag 0          createImgTag 1            createImgTag 2
0.0000391683578491211   0.00804863357543945       0.0000496020317077637

I find this quite unlikely, so it seems there is probably something wrong in my test script / testing methodology. Can anyone spot anything wrong with my test?

<?php
include('globalFuncs.inc.php');

$sql = array(array(),array(),array());
$sql[0]['simple'] = 'SELECT id, Headline, Rating, filename, url_name FROM images ORDER BY Rating DESC LIMIT 0,30';
$sql[1]['simple'] = 'SELECT id, Headline, Rating, filename, url_name FROM images_2 ORDER BY Rating DESC LIMIT 0,30';
$sql[2]['simple'] = 'SELECT images_3.id, images_3.Headline, images_3.Rating, fileType_2.Ext, images_3.url_name FROM images_3
        LEFT JOIN fileType_2 ON fileType_2.id = images_3.filename ORDER BY Rating DESC LIMIT 0,30';
        
$sql[0]['standard'] = "SELECT images.id, images.Headline, images.filename, images.Rating, images.url_name,
        SUBSTRING_INDEX(imageData.ImageDescription, '\n', 1) AS summary, imageData.GPSLatitude, imageData.GPSLongitude, imageData.GPSAltitude, DATE_FORMAT(imageData.Last_updated,'%Y-%m-%dT%TZ') AS Last_updated,
        CONCAT('<category term=\\"', GROUP_CONCAT(keywords.Subject SEPARATOR '\\" />\n<category term=\\"'), '\\" />') AS tags
        FROM images
        LEFT JOIN imageData ON images.id = imageData.id
        LEFT JOIN img_keywords ON images.id = img_keywords.img_id
        LEFT JOIN keywords ON keywords.id = img_keywords.keywords_id
        WHERE images.id IN(SELECT DISTINCT img_categories.img_id
        FROM img_categories
        LEFT JOIN categories AS node ON node.id = img_categories.categories_id,
        categories AS parent
        WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND parent.id = 35)
        GROUP BY images.id
        ORDER BY images.id DESC
        LIMIT 30,30";
$sql[1]['standard'] = "SELECT images_2.id, images_2.Headline, images_2.filename, images_2.Rating, images_2.url_name,
        SUBSTRING_INDEX(imageData.ImageDescription, '\n', 1) AS summary, imageData.GPSLatitude, imageData.GPSLongitude, imageData.GPSAltitude, DATE_FORMAT(imageData.Last_updated,'%Y-%m-%dT%TZ') AS Last_updated,
        CONCAT('<category term=\\"', GROUP_CONCAT(keywords.Subject SEPARATOR '\\" />\n<category term=\\"'), '\\" />') AS tags
        FROM images_2
        LEFT JOIN imageData ON images_2.id = imageData.id
        LEFT JOIN img_keywords ON images_2.id = img_keywords.img_id
        LEFT JOIN keywords ON keywords.id = img_keywords.keywords_id
        WHERE images_2.id IN(SELECT DISTINCT img_categories.img_id
        FROM img_categories
        LEFT JOIN categories AS node ON node.id = img_categories.categories_id,
        categories AS parent
        WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND parent.id = 35)
        GROUP BY images_2.id
        ORDER BY images_2.id DESC
        LIMIT 30,30";
$sql[2]['standard'] = "SELECT images_3.id, images_3.Headline, fileType_2.Ext, images_3.Rating, images_3.url_name,
        SUBSTRING_INDEX(imageData.ImageDescription, '\n', 1) AS summary, imageData.GPSLatitude, imageData.GPSLongitude, imageData.GPSAltitude, DATE_FORMAT(imageData.Last_updated,'%Y-%m-%dT%TZ') AS Last_updated,
        CONCAT('<category term=\\"', GROUP_CONCAT(keywords.Subject SEPARATOR '\\" />\n<category term=\\"'), '\\" />') AS tags
        FROM images_3
        LEFT JOIN imageData ON images_3.id = imageData.id
        LEFT JOIN img_keywords ON images_3.id = img_keywords.img_id
        LEFT JOIN keywords ON keywords.id = img_keywords.keywords_id
        LEFT JOIN fileType_2 ON fileType_2.id = images_3.filename
        WHERE images_3.id IN(SELECT DISTINCT img_categories.img_id
        FROM img_categories
        LEFT JOIN categories AS node ON node.id = img_categories.categories_id,
        categories AS parent
        WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND parent.id = 35)
        GROUP BY images_3.id
        ORDER BY images_3.id DESC
        LIMIT 30,30";

$sql[0]['createImgTag'] = function($data){
        $str = '';
        foreach($data as $img){
            $str .= '<img src="'.STATIC2.'/Img/thumbs/'.$img['filename'].'" />';
        }
        return $str;
    };
$sql[1]['createImgTag'] = function($data){
        $str = '';
        foreach($data as $img){
            $str .= '<img src="'.STATIC2.'/Img/thumbs/'.$img['url_name'].$img['filename'].'" />';
        }
        return $str;
    };
$sql[2]['createImgTag'] = function($data){
        $str = '';
        foreach($data as $img){
            $str .= '<img src="'.STATIC2.'/Img/thumbs/'.$img['url_name'].$img['Ext'].'" />';
        }
        return $str;
    };

$keys=array(0,1,2);
$runs=500;
$csv = array();

//Test simple query
for($i=0;$i<$runs;$i++){
    shuffle($keys);
    $csv[$i]=array();
    foreach($keys as $key){
        $start = microtime(true);
        if($result = $conn->query($sql[$key]['simple'])){
            $data = array();
            while($row = $result->fetch_assoc()){
                $data[] = $row;
            }
            $result->close();
        }
        else{echo $conn->error; exit();}
        $end = microtime(true);
        $csv[$i][$key] = number_format($end - $start, 50);
    }
    
}

//Test standard query and creating img tag
for($i=0;$i<$runs;$i++){
    shuffle($keys);
    foreach($keys as $key){
    //Test standard query
        $start = microtime(true);
        if($result = $conn->query($sql[$key]['standard'])){
            $data = array();
            while($row = $result->fetch_assoc()){
                $data[] = $row;
            }
            $result->close();
        }
        else{echo $conn->error; exit();}
        $end = microtime(true);
        $csv[$i][$key+3] = number_format($end - $start, 50);
        
        //Test creating img tag
        $start = microtime(true);
        $sql[$key]['createImgTag']($data);
        $end = microtime(true);
        $csv[$i][$key+6] = number_format($end - $start, 50);
        echo '<br />Query for '.$key.' method took '.$csv[$i][$key+3].'s + creating tags took '.$csv[$i][$key+6].'<br /><br />';
    }
}

//Dump data
array_unshift($csv, array('simple 0', 'simple 1', 'simple 2', 'standard 0', 'standard 1', 'standard 2', 'createImgTag 0', 'createImgTag 1', 'createImgTag 2'));
$file= fopen('./resultsv2.csv', 'w');
foreach($csv as $fields){
    fputcsv($file, $fields);
}
fclose($file);

How about you show us what indexes you have on these tables?

Good point, I didn’t think of checking the indexes.

A bit of background to the test - the table images contains a field url_name and another field filename. The filename is just the url_name with a file extension added, so I want to test to see what is most efficient:
Method 0 - storing the filename and url_name in the db
Method 1 - storing the file extension and url_name in the db and then concatenating them in php to create the filename
Method 2 - as method 1 except storing the file extension in a separate table

So for the test I took the table images used for Method 0, then did CREATE TABLE ... LIKE to create the other two tables, copied the data from images into them, and modified them as needed for the test.
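In other words, roughly this (a sketch; the ALTER steps for Methods 1 and 2 are elided):

```sql
-- Clone the structure, then copy the rows.
-- Note: CREATE TABLE ... LIKE copies indexes but not foreign key
-- constraints, which is why images_2 and images_3 below lack the FK.
CREATE TABLE images_2 LIKE images;
INSERT INTO images_2 SELECT * FROM images;
```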

CREATE TABLE `images` (
 `id` int(10) unsigned NOT NULL,
 `Headline` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `Rating` float DEFAULT NULL,
 `filename` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `url_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 PRIMARY KEY (`id`),
 KEY `fk_images_imageData` (`id`),
 KEY `Rating` (`Rating`),
 CONSTRAINT `fk_images_imageData` FOREIGN KEY (`id`) REFERENCES `imageData` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

CREATE TABLE `images_2` (
 `id` int(10) unsigned NOT NULL,
 `Headline` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `Rating` float DEFAULT NULL,
 `filename` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `url_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 PRIMARY KEY (`id`),
 KEY `fk_images_imageData` (`id`),
 KEY `Rating` (`Rating`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

CREATE TABLE `images_3` (
 `id` int(10) unsigned NOT NULL,
 `Headline` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `Rating` float DEFAULT NULL,
 `filename` tinyint(4) NOT NULL,
 `url_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 PRIMARY KEY (`id`),
 KEY `fk_images_imageData` (`id`),
 KEY `Rating` (`Rating`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

So the keys are the same on each table, though the table used for Method 0 has a foreign key constraint on it. I don’t think this should make any difference to SELECT queries, though?

The table fileType_2 used in Method 2 looks like this:

CREATE TABLE `fileType_2` (
 `id` int(2) unsigned NOT NULL AUTO_INCREMENT,
 `FileType` varchar(45) COLLATE utf8_unicode_ci NOT NULL,
 `MIMEType` varchar(45) COLLATE utf8_unicode_ci NOT NULL,
 `Ext` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `UNIQUE` (`FileType`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

All other tables are the same in all queries, so I don’t think indexes on them would cause any differences in speed between the queries:
- imageData has PRIMARY KEY (id) (and various other keys)
- img_categories has PRIMARY KEY (img_id,categories_id), KEY fk_img_categories_images (img_id), KEY fk_img_categories_categories (categories_id)
- categories has PRIMARY KEY (id), KEY lft (lft), KEY rgt (rgt)
- img_keywords has PRIMARY KEY (img_id,keywords_id), KEY fk_img_keywords_imageData (img_id), KEY fk_img_keywords_keywords (keywords_id)
- keywords has PRIMARY KEY (id), UNIQUE KEY UNIQUE (Subject)

After examining the indexes, I can’t see anything there to account for the speed differences between the different queries.

I find this quite unlikely, so it seems there is probably something wrong in my test script / testing methodology. Can anyone spot anything wrong with my test?

Like most databases, MySQL does a pretty good job of caching. What you are trying to do is probably not overly useful; it’s premature optimization.

How responsive is your site in production? If performance is acceptable then move on to something else.

When working with a complex query, it is good practice to develop the query outside of PHP and use EXPLAIN to analyze potential opportunities for optimization. This workflow eliminates all the variables introduced by application-side code but, more importantly, takes scalability into account without a bunch of fake data. The only problem is that you need to know what to look for in the EXPLAIN output. Once you know that (which is well worth learning anyway), it is a relatively straightforward task for most operations. Most things, like spotting full table scans and temporary tables, are pretty easy, and those account for most potential scaling problems. Whenever I write a query that has more than a couple of joins or needs sub-queries, I always develop it outside of PHP until I am satisfied with the EXPLAIN analysis. I suggest you learn to do the same if you are concerned about query performance. Of course, this becomes a lot trickier if you’re working with some type of ORM, but you’re not, from the looks of it, so we won’t go there.
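For example, taking the simple query from earlier in the thread, you just prefix it with EXPLAIN and read the plan:

```sql
-- Look at the type, key and rows columns, and watch for
-- "Using temporary" / "Using filesort" in the Extra column.
EXPLAIN SELECT id, Headline, Rating, filename, url_name
FROM images
ORDER BY Rating DESC
LIMIT 0, 30;
```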

You are right to focus on the SQL, because problems introduced by application-side code are a mere fraction of the cost of unoptimized queries. The majority of the bottleneck for any application almost always lies in external resources such as MySQL. Application-side optimizations, unless you’re doing something terribly wrong, aren’t going to make or break anything, not even double quotes (not that I condone them). If you’re going to spend time optimizing an application, you’re better off looking at queries than converting double-quoted strings to single quotes. I manage several Drupal sites with over 200 modules, and site outages are just about always caused by poorly performing queries or excessive database hits.

Great replies.

To nullify the effects of caching, I guess there must be a setting somewhere in my.ini that allows you to turn it off, though I am not sure how helpful that would be, especially if your live server has (MySQL) caching on.

If you cannot easily find a way of running your SQL queries outside your application, get used to looking in your MySQL log file (you may need to enable this, again in my.ini). I have a shortcut to it on my desktop.

Pick up the query from there and, as oddz says, tack EXPLAIN in front of it. That will let you know what indexes are being used; somehow these never match what my mind tells me should be used.

Thanks for the advice everyone.

I have found what was wrong with my test now. I don’t think it was that the db needed 2,000,000 records, or that queries were being cached (according to the MySQL manual, query caching is turned off by default). As I originally suspected, it was my testing methodology: I had assumed that fputcsv would write the array values in order of key, but it actually writes them in the order the keys were added to the array. So

$myarray = array();
$myarray[1]='Second record';
$myarray[0]='First record';
$mycsv= fopen('./mycsv.csv', 'w');
fputcsv($mycsv,$myarray);
fclose($mycsv);

Will result in the csv

"Second record","First record"

I have now re-run the test, using ksort($fields); before writing each line of the CSV. It still doesn’t give quite the results I was expecting (there is not much difference between any of the methods), but the results are at least believable now.
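One other tweak worth considering: individual timings are noisy, so summarising each column with the median rather than the mean resists one-off spikes. A quick sketch (the median() helper is hypothetical, not part of the script above):

```php
<?php
// Hypothetical helper: the median is less sensitive to one-off
// timing spikes (GC pauses, disk flushes) than the mean.
function median(array $times) {
    sort($times);
    $n = count($times);
    $mid = (int) ($n / 2);
    return ($n % 2) ? $times[$mid]
                    : ($times[$mid - 1] + $times[$mid]) / 2;
}

// Example: timings for two methods; method 0 has one outlier spike.
$timings = array(
    'method 0' => array(0.021, 0.027, 0.020, 0.190, 0.022),
    'method 1' => array(0.019, 0.018, 0.020, 0.021, 0.019),
);
foreach ($timings as $name => $times) {
    echo $name . ' median: ' . median($times) . "\n";
}
```

Here the outlier barely moves method 0’s median, whereas it would roughly double its mean.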

I will certainly have a look at EXPLAIN for optimising my queries as well, thanks for the suggestions.