
Originally Posted by
TomB
How many queries did you run? You should have run 1 query to fetch the products and one for the manufacturer of each product. The result being N+1 queries.
Yes, in my benchmark I did N+1 queires, which was 1001 queries. 1001 simple queries were 2.5x slower than a single LEFT JOIN to fetch the same data from two tables. This was on my local installation so there were no network latencies.

Originally Posted by
TomB
Imagine you want to sort the initial product list. Add ORDER BY product.name to that or ORDER BY RAND() and your single query will be immensely slower because it's sorting the entire data set. Move the sort to array_shuffle($products) and it's suddenly much faster.
Yes, ordering by product name was slightly slower but that's understandable because it's a text field and not a number (price). In this case the separate 1001 were 2x slower than 1 LEFT JOIN. Still JOIN was much better.

Originally Posted by
TomB
This is scaled by the number of tables you are joining.
Okay, just out of curiosity I added another LEFT JOIN that fetched product's availability - a row from a small lookup table joined by a numeric availability_id:
Code:
SELECT p.*, m.*, a.*
FROM products p
LEFT JOIN manufacturer m
ON m.id=p.id
LEFT JOIN availability a
ON a.availability_id=p.availability_id
WHERE p.hidden=0
ORDER BY p.name
LIMIT 1000;
The time for the single JOIN query became very marginally slower (almost the same as with 1 JOIN) but fetching the same data in separate queries became much longer - understandable since now there are 2001 queries to run. This time the single query performed 3.5x faster than the separate queries.
Okay, your scaling argument holds true for INNER JOINs, though. But still the single query was 2x faster. However, with a little bit of tweaking I was able to run the INNER JOINs as fast as LEFT JOINs - by adding index to product.name and adding FORCE INDEX(prod_name) to the query.
I don't see that joins are as slow as you're making them to be. Sure, there will be cases where a join will be inferior but it's not that often, and if that happens there's still room for improvement by optimising, adding indexes, etc. And when a large query takes too much time it's often enough to take 1 or 2 joins into separate tables to get rid of performance problems. I'm not trying to be argumentative to prove joins are the way to go - I was curious myself how they perform, that's why I did the above benchmarks. Another thing - recent versions of Mysql have had many performance improvements and the same joins you found slow many years ago might run much faster now.

Originally Posted by
TomB
That's fair enough, but we recently had an interesting topic on premature optimisation and that's exactly what it sounds like to me. The problem is you're sacrificing flexibility in the name of performance. Fixing performance problems can easily be done after the fact with a host of different methods. Fixing flexibility problems generally involves re-writing a lot of code because the code isn't flexible enough to handle anything outside what it was initially designed to do.
It's very subjective where premature optimisation starts. Personally I prefer not to optimise too much but still to keep performance in mind when developing a web site. In other words use the convenience of objects whenever possible but still try to implement solutions that will perform fairy fast. Depending on a project the flexibility of being able to replace the whole data source may not be needed at all.
If you develop the whole system using perfect objects oriented approach with an ORM or a mapper for data access then how are you going to optimise it if you stumble accross a problem like cpradio described - the network latency is too high and you need to reduce the number of queries to 1 or 2? Your mapper can't do it because it hasn't enough knowledge for that, then you need to get out of your way and introduce ugly hacks giving up consistency and flexibility that your objects provide - because now you need to run a single big query instead of hundreds of small ones. If the system is prepared for this from the ground up then it's very easy task.
It looks like you have one kind of flexibility at the expense of another one - the flexibility to optimise database access. There's nothing wrong with it but it only proves that there's no ideal system that can be 100% flexible in every aspect.

Originally Posted by
TomB
Yes. Any time an individual record is fetched, it's stored in the mapper. Individual mappers can turn this off but it's something I rarely need to use. Instead I have it so that certain fields (e.g. longblob) can be lazy loaded.
I did play around with caching results from queries that returned multiple results but I found that in the real world, 99% of the time these are only used once whereas fetching by ID, in addition to being a lot less data to store in memory, is often something that ends up being used multiple times. If I want a product manufacturer name, it's likely I'll also want that product's manufacturer logo. So in that example, the user would be cached in memory for the duration of the script.
Ok, thanks for the description!
Bookmarks