I have a question how to have two JOINs using the same table.
The previous sentence probably doesn’t make sense, so let me try to explain using an example. Let say, I have two tables
select buyer.name as buyer,
seller.name as seller
from users as buyer
join transactions
on buyer.userid = transactions.buyerid
join users as seller
on transaction.sellerid = seller.userid
where transactions.price > 10
i prefer to write the tables in the FROM clause in a logical sequence
SELECT buyer.name AS buyer
, seller.name AS seller
FROM transactions
INNER
JOIN users AS buyer
ON buyer.userid = transactions.buyerid
INNER
JOIN users AS seller
ON seller.userid = transactions.sellerid
WHERE transactions.price > 10
i always try to start the FROM clause with the “driving” table, i.e. the one mentioned in the WHERE clause which has the main restriction, the one which the other tables are joined to
I actually would like to ask a second question and that is about the efficiency of JOIN statements.
Let say, the transaction table has many records. And I would like to get the buyer and seller name for a specific transaction (eg transaction 123456). As I have been explained, I can do this in one query:
SELECT
buyers.name AS buyer,
sellers.name AS seller
FROM
transactions
INNER JOIN users AS buyers
ON transactions.buyerid = buyers.userid
INNER JOIN users AS sellers
ON transactions.sellerid = sellers.userid
WHERE
transactions.transactionid = 123456
Now, I would “visualize” this query as a large table being constructed (with buyer and seller names for all transactions) to which the WHERE clause is being applied.
If the database would indeed process the query like that, it would be very inefficient. In that case it would be better to obtain the buyerid and sellerid, and get the names in separate query.
Does anyone know if one should take these kind of considerations into account? Or can you expect the database to optimize the query?
the reason i asked which database you’re using is because we have a mysql forum (for mysql, eh) and a databases forum (for everything else), and you didn’t post in the mysql forum and you also didn’t say which database you’re using
in your query, put the word EXPLAIN in front of the word SELECT, run it, and show us the results
Ok, so I have created the following php script to populate the tables with some random data
$seed = 100;
srand($seed);
// Fill users table with 1000 users
for ($i = 0; $i < 1000; $i++) {
$name = rand(0, 100000);
$sql = "INSERT INTO users (name) VALUES ('$name')";
mysql_query($sql, $con);
}
// Fill transactions table with 200k transactions
for ($i = 0; $i < 200000; $i++) {
$buyerid = rand(0, 999);
while (($sellerid = rand(0, 999)) == $buyerid);
$price = rand(1, 100);
$sql = "INSERT INTO transactions (buyerid, sellerid, price) ";
$sql .= "VALUES ($buyerid, $sellerid, $price)";
mysql_query($sql, $con);
}
Now executing the SELECT statement with two JOINs returns buyer=72396 (corresponding to userid=148) and seller=88733 (corresponding to userid=184) pretty instantaneously.
And EXPLAIN gives an output as per screenshot.
Based on the speed of returning the result I would think MySQL is optimizing.
And, then I would assume any other respectable database system would do the same.