that’s a highly contrived example, and i am at a loss to understand why they would use that example to illustrate a benchmark
ask yourself what the query is supposed to return, whether the subquery version or the join version (hint: they produce the same result)
i can easily explain why the subquery version is faster – it’s because an EXISTS subquery can be optimized, whereas the join, coupled with the use of DISTINCT, retrieves tons of extraneous rows, only to throw away most of them
still wondering what the purpose of the query is?
another type of subquery is called a correlated subquery, and it is notoriously inefficient as compared to a join, but this example with the films and the actors doesn’t use a correlated subquery
you should not draw any general conclusions about subqueries versus joins, as you did in your first post, such as “preferential to use a subquery over join queries that will create a temporay table” (aside: there is no temporary table involved here)
okay, let’s get back to the purpose of the query
there are 1000 films in the films table, and 5642 actors in the film_actors table, right?
have you figured out what the purpose of the query is yet?
find all films that have at least one actor
how useless is that?? we are told that the query returns 997 rows, which isn’t at all surprising (it means there are only 3 films that have no actor at all)
however, i’m sure you will agree that this is quite the contrived example – in real life, it would be much more likely that you would want to find the films that don’t have an actor, rather than to list all the ones that do
please, don’t draw any conclusions from it – other than, yes, an EXISTS subquery is quite efficient
bear in mind, though, that EXISTS subqueries are rare, and not always this contrived
