3 More Joins You Should Be Familiar With
There are many ways to JOIN data from two database tables and filter the information you require. Craig Buckler wrote a popular piece on understanding JOINs; namely INNER, LEFT, RIGHT, and FULL OUTER. This article is an extension of that one.
Let’s recap these real quick. Picture two tables, one for customers
and one for books
to establish a book loan database.
customers
table
id | firstname | lastname | book_id |
---|---|---|---|
1 | Joe | Blow | 1 |
2 | Jane | Doe | 2 |
3 | Harry | Crow | 2 |
4 | Jeffrey | Snow | 0 |
books
table
id | Title |
---|---|
1 | Star Wars |
2 | Jurassic Park |
3 | Little Women |
4 | Tom Sawyer |
The books
table has one row for every book.
The customers
table has one row for each customer who can only have one book on loan at a time. If they have no book on loan, the book_id
would be 0
or an empty string.
This is a very simple example to make the JOINs as clear as possible to understand!
A LEFT JOIN here would be in the case you want to ask a question such as “show me all customers including any books on loan.”
You can see in the image that ALL data in the left circle, or table, is included in the result set. Only data that overlaps from the books table is included from the right table. This means that with a LEFT JOIN
, some data in the right table may be excluded.
A RIGHT JOIN would be like asking “show me all the books in my library, along with any customers that have borrowed them.”
In this image you see that ALL data in the right table is included in the result set. Only data that overlaps from the customers
table is included. This means that with a RIGHT JOIN
, some data in the left table may be excluded.
An OUTER JOIN would be like asking “show me all loans and all books, regardless of connections between them.”
In this image, you can see that ALL data from both tables will be included, regardless of whether some data overlaps. When a query like this is done, you will have non-overlapping data in the result, and those fields will be set to NULL.
An INNER JOIN would be like asking “show only customers with a loan.”
Here you can see that data may be excluded from both the left and right tables. You won’t see any customers if they DON’T have a book out, and you won’t see any books if they are NOT loaned out!
This is the most common type of data and is the default behavior when using the keyword JOIN
by itself. The added word “INNER” is not usually required.
What Does “Left” And “Right” Mean Anyway?
You can think of “left” and “right” as nothing more than source order. Look at this query:
SELECT *
FROM customers
LEFT JOIN books
ON customers.book_id = books.id
Notice that I mentioned the table customers
before I used the JOIN
keyword. This means customers
is my “left” table. Another way of thinking about it is to ask which table is left of the JOIN
keyword, and which is to the right of it.
A couple caveats:
- Simply because one table is the “left” one, does not mean you are selecting ALL its records. This is the function of
LEFT JOIN
orRIGHT JOIN
, not source order! - The table that is the left one will have its columns listed first in the result set unless you specifically choose columns in the
SELECT
section. - The order of tables after the
ON
keyword doesn’t matter, it would return the same results if swapped.
Enough Reviewing. What Are These New JOINs?
Well, they are not “new”, they are just a way to query additional questions. How would you find answers to questions with the words “don’t” or “not” in the query?
Show me all the customers who DON’T have books on loan.
Show me all the books that are NOT loaned out.
LEFT JOIN with Exclusion
Take a look at this graphic. Notice the difference from the above JOINs in the shaded area.
This looks like a LEFT JOIN
, but no longer has the “overlap” data included. Why? This is the same as asking “show me customers who have no books on loan.” Perhaps you want to select all customers without a book and send them a newsletter with a special discount?
If you are clever you might think you can just search the customers
table for a book_id
of 0 to do the same thing. That would work in this example but most of the time it won’t; it depends how your tables are designed.
The query looks like this:
SELECT *
FROM customers
LEFT JOIN books
ON customers.book_id = books.id
WHERE books.id IS NULL
We’ve now included a WHERE
clause. Any time you have a WHERE clause, you are excluding data, or filtering it, almost like a search. So why are we searching for books.id
being NULL? Why would that be NULL? Let’s run the standard LEFT JOIN
and see the data it returns, we’ll have the answer:
LEFT JOIN
return data
id | firstname | lastname | book_id | id1 | title |
---|---|---|---|---|---|
1 | Joe | Blow | 1 | 1 | Star Wars |
2 | Jane | Doe | 2 | 2 | Jurassic Park |
3 | Harry | Crow | 2 | 2 | Jurassic Park |
4 | Jeffrey | Snow | 0 | null | null |
Can you see the issue? This is a LEFT JOIN
which means ALL data from the customers
table is included regardless of overlap with books. Jeffrey Snow is included but he does not have a book on loan, this is why the “id1” and “title” columns are set to NULL. When the tables are JOINed, there would not be a book title or book ID linked to him.
If we ask “show me all customers including which books they have”, you would want the above data because Jeffrey is a customer regardless if he has a book on loan or not. That would be the function of a LEFT JOIN
in this case.
If we ask the question “show me customers with no books on loan”, it now makes perfect sense what to look for. We only need to select customers where we see NULL for the books.id
column (when JOINed, would be labeled id1
since there are two columns named “id”). We do this with a standard WHERE
clause if we add WHERE books.id IS NULL
. Now the result is filtered to just this:
id | firstname | lastname | book_id | id1 | title |
---|---|---|---|---|---|
4 | Jeffrey | Snow | 0 | null | null |
You now have all customers who do not have books on loan.
RIGHT JOIN with Exclusion
Let’s do the same with a RIGHT JOIN
. Let’s find all the books which are NOT loaned out to anybody.
A normal RIGHT JOIN
would return every book regardless of whether it is loaned, the result set would look like this:
id | firstname | lastname | book_id | id1 | title |
---|---|---|---|---|---|
1 | Joe | Blow | 1 | 1 | Star Wars |
2 | Jane | Doe | 2 | 2 | Jurassic Park |
3 | Harry | Crow | 2 | 2 | Jurassic Park |
null | null | null | null | 3 | Little Women |
null | null | null | null | 4 | Tom Sawyer |
This looks a bit different. First, you might notice that Jurassic Park is listed twice. This is because two people have the book on loan, and the database is returning a row for each match.
Notice all the corresponding columns from the customers
table are NULL for Little Women and Tom Sawyer because no one borrowed those titles so there is no overlapping data.
If we want to select all the books that are not loaned out, we just use the WHERE
clause to find “NULL” in the customers.id
column.
SELECT *
FROM customers
RIGHT JOIN books
ON customers.book_id = books.id
WHERE customers.id IS NULL
The result should be predictable. We get only books that are not loaned out.
id | firstname | lastname | book_id | id1 | title |
---|---|---|---|---|---|
null | null | null | null | 3 | Little Women |
null | null | null | null | 4 | Tom Sawyer |
OUTER JOIN with Exclusions
The last JOIN
looks like this.
This JOIN
is not very useful, but will essentially give you a list of both customers with no loan, and books which are not loaned, at the same time.
A strange JOIN
like this may be useful in cases where you literally need to select data with no connection between tables. Perhaps you are hunting for orphan data or looking for inconsistencies in some old database you converted.
In fact, this kind of JOIN
is so weird that you can’t even do it in MySQL, it doesn’t support OUTER JOIN
. Regular SQL does, and the query would look like this (MSSQL not MySQL):
SELECT *
FROM customers
FULL OUTER JOIN books
ON customers.id = books.id
WHERE customers.id IS NULL
OR books.id IS NULL
The result of this query would return data looking something like this:
id | firstname | lastname | book_id | id1 | title |
---|---|---|---|---|---|
1 | Jeffrey | Snow | 0 | null | null |
null | null | null | null | 3 | Little Women |
null | null | null | null | 4 | Tom Sawyer |
You can achieve an OUTER JOIN
using the method Craig Buckler explained with a UNION
, but that is not without potential issues. The top Google result for simulating a FULL OUTER JOIN
in MySQL is from 2006 and can be found here. It can get somewhat complicated.
The main thing to notice in the above code is checking for NULL on both sides of the JOIN, because we want to exclude from both tables. Without checking both sides, we’d simply end up with one of the other JOINs just talked about.
Regardless of how bizarre or supported a query like this one is, I wanted to include it because it is a valid type of JOIN
, if you can think of a reason to use it.
Other Thoughts
You should always use your WHERE
clause against fields which cannot have NULL as their actual value! We always tested against the ID fields which cannot have NULL as a value. Imagine if our books table had an ISBN field that allowed NULL. If we tested for NULL using that field, it would include rows we may not want!
There is another JOIN
called a CROSS JOIN
which is also strange and unique. Imagine instead of just matching one user to one book, EVERY user were matched against EVERY book! Yes, this means if you have 20 books and 30 customers, a CROSS JOIN
would result in 30*20 rows of data! For an example of how this could be useful, check out this article.
Note that in MySQL, the JOIN
, INNER JOIN
, and CROSS JOIN
are syntactical equivalents and can replace each other. This is because JOIN
and INNER JOIN
do the same thing and must use the ON
keyword to match columns. When using a CROSS JOIN
, there is no ON
keyword, as it is matching every row in table A to every row in table B.
Conclusion
I hope these extra few JOINs made sense to you. Think about using them any time you are asking for data between tables where something “doesn’t” match the other.
- “Find all customers who have NOT ordered before”.
- “Find all customers NOT in the blacklist”.
- “Find all products that have NOT sold”.
- “Find all customers NOT borrowing books”.
- “Find all dogs that HAVEN’T been walked lately”.
- “Find employees who have NEVER sent in a support ticket”.
In summary, depending on how your database and tables are defined, you may need to use the WHERE
clause to check for NULL values to exclude the matches rather than include them as with normal JOIN behavior.
So… have you ever needed a cross join? Any other specific use cases you’d like to tell us about or want to us to cover? Let us know!