There are many ways to JOIN data from two database tables and filter the information you require. Craig Buckler wrote a popular piece on understanding JOINs; namely INNER, LEFT, RIGHT, and FULL OUTER. This article is an extension of that one.
Let’s recap these real quick. Picture two tables, one for
customers and one for
books to establish a book loan database.
books table has one row for every book.
customers table has one row for each customer who can only have one book on loan at a time. If they have no book on loan, the
book_id would be
0 or an empty string.
This is a very simple example to make the JOINs as clear as possible to understand!
A LEFT JOIN here would be in the case you want to ask a question such as “show me all customers including any books on loan.”
You can see in the image that ALL data in the left circle, or table, is included in the result set. Only data that overlaps from the books table is included from the right table. This means that with a
LEFT JOIN, some data in the right table may be excluded.
A RIGHT JOIN would be like asking “show me all the books in my library, along with any customers that have borrowed them.”
In this image you see that ALL data in the right table is included in the result set. Only data that overlaps from the
customers table is included. This means that with a
RIGHT JOIN, some data in the left table may be excluded.
An OUTER JOIN would be like asking “show me all loans and all books, regardless of connections between them.”
In this image, you can see that ALL data from both tables will be included, regardless of whether some data overlaps. When a query like this is done, you will have non-overlapping data in the result, and those fields will be set to NULL.
An INNER JOIN would be like asking “show only customers with a loan.”
Here you can see that data may be excluded from both the left and right tables. You won’t see any customers if they DON’T have a book out, and you won’t see any books if they are NOT loaned out!
This is the most common type of data and is the default behavior when using the keyword
JOIN by itself. The added word “INNER” is not usually required.
What Does “Left” And “Right” Mean Anyway?
You can think of “left” and “right” as nothing more than source order. Look at this query:
SELECT * FROM customers LEFT JOIN books ON customers.book_id = books.id
Notice that I mentioned the table
customers before I used the
JOIN keyword. This means
customers is my “left” table. Another way of thinking about it is to ask which table is left of the
JOIN keyword, and which is to the right of it.
A couple caveats:
- Simply because one table is the “left” one, does not mean you are selecting ALL its records. This is the function of
RIGHT JOIN, not source order!
- The table that is the left one will have its columns listed first in the result set unless you specifically choose columns in the
- The order of tables after the
ONkeyword doesn’t matter, it would return the same results if swapped.
Enough Reviewing. What Are These New JOINs?
Well, they are not “new”, they are just a way to query additional questions. How would you find answers to questions with the words “don’t” or “not” in the query?
Show me all the customers who DON’T have books on loan.
Show me all the books that are NOT loaned out.
LEFT JOIN with Exclusion
Take a look at this graphic. Notice the difference from the above JOINs in the shaded area.
This looks like a
LEFT JOIN, but no longer has the “overlap” data included. Why? This is the same as asking “show me customers who have no books on loan.” Perhaps you want to select all customers without a book and send them a newsletter with a special discount?
If you are clever you might think you can just search the
customers table for a
book_id of 0 to do the same thing. That would work in this example but most of the time it won’t; it depends how your tables are designed.
The query looks like this:
SELECT * FROM customers LEFT JOIN books ON customers.book_id = books.id WHERE books.id IS NULL
We’ve now included a
WHERE clause. Any time you have a WHERE clause, you are excluding data, or filtering it, almost like a search. So why are we searching for
books.id being NULL? Why would that be NULL? Let’s run the standard
LEFT JOIN and see the data it returns, we’ll have the answer:
LEFT JOIN return data
Can you see the issue? This is a
LEFT JOIN which means ALL data from the
customers table is included regardless of overlap with books. Jeffrey Snow is included but he does not have a book on loan, this is why the “id1” and “title” columns are set to NULL. When the tables are JOINed, there would not be a book title or book ID linked to him.
If we ask “show me all customers including which books they have”, you would want the above data because Jeffrey is a customer regardless if he has a book on loan or not. That would be the function of a
LEFT JOIN in this case.
If we ask the question “show me customers with no books on loan”, it now makes perfect sense what to look for. We only need to select customers where we see NULL for the
books.id column (when JOINed, would be labeled
id1 since there are two columns named “id”). We do this with a standard
WHERE clause if we add
WHERE books.id IS NULL. Now the result is filtered to just this:
You now have all customers who do not have books on loan.
RIGHT JOIN with Exclusion
Let’s do the same with a
RIGHT JOIN. Let’s find all the books which are NOT loaned out to anybody.
RIGHT JOIN would return every book regardless of whether it is loaned, the result set would look like this:
This looks a bit different. First, you might notice that Jurassic Park is listed twice. This is because two people have the book on loan, and the database is returning a row for each match.
Notice all the corresponding columns from the
customers table are NULL for Little Women and Tom Sawyer because no one borrowed those titles so there is no overlapping data.
If we want to select all the books that are not loaned out, we just use the
WHERE clause to find “NULL” in the
SELECT * FROM customers RIGHT JOIN books ON customers.book_id = books.id WHERE customers.id IS NULL
The result should be predictable. We get only books that are not loaned out.
OUTER JOIN with Exclusions
JOIN looks like this.
JOIN is not very useful, but will essentially give you a list of both customers with no loan, and books which are not loaned, at the same time.
JOIN like this may be useful in cases where you literally need to select data with no connection between tables. Perhaps you are hunting for orphan data or looking for inconsistencies in some old database you converted.
In fact, this kind of
JOIN is so weird that you can’t even do it in MySQL, it doesn’t support
OUTER JOIN. Regular SQL does, and the query would look like this (MSSQL not MySQL):
SELECT * FROM customers FULL OUTER JOIN books ON customers.id = books.id WHERE customers.id IS NULL OR books.id IS NULL
The result of this query would return data looking something like this:
You can achieve an
OUTER JOIN using the method Craig Buckler explained with a
UNION, but that is not without potential issues. The top Google result for simulating a
FULL OUTER JOIN in MySQL is from 2006 and can be found here. It can get somewhat complicated.
The main thing to notice in the above code is checking for NULL on both sides of the JOIN, because we want to exclude from both tables. Without checking both sides, we’d simply end up with one of the other JOINs just talked about.
Regardless of how bizarre or supported a query like this one is, I wanted to include it because it is a valid type of
JOIN, if you can think of a reason to use it.
You should always use your
WHERE clause against fields which cannot have NULL as their actual value! We always tested against the ID fields which cannot have NULL as a value. Imagine if our books table had an ISBN field that allowed NULL. If we tested for NULL using that field, it would include rows we may not want!
There is another
JOIN called a
CROSS JOIN which is also strange and unique. Imagine instead of just matching one user to one book, EVERY user were matched against EVERY book! Yes, this means if you have 20 books and 30 customers, a
CROSS JOIN would result in 30*20 rows of data! For an example of how this could be useful, check out this article.
Note that in MySQL, the
INNER JOIN, and
CROSS JOIN are syntactical equivalents and can replace each other. This is because
INNER JOIN do the same thing and must use the
ON keyword to match columns. When using a
CROSS JOIN, there is no
ON keyword, as it is matching every row in table A to every row in table B.
I hope these extra few JOINs made sense to you. Think about using them any time you are asking for data between tables where something “doesn’t” match the other.
- “Find all customers who have NOT ordered before”.
- “Find all customers NOT in the blacklist”.
- “Find all products that have NOT sold”.
- “Find all customers NOT borrowing books”.
- “Find all dogs that HAVEN’T been walked lately”.
- “Find employees who have NEVER sent in a support ticket”.
In summary, depending on how your database and tables are defined, you may need to use the
WHERE clause to check for NULL values to exclude the matches rather than include them as with normal JOIN behavior.
So… have you ever needed a cross join? Any other specific use cases you’d like to tell us about or want to us to cover? Let us know!
Zack Wallace is a hobbyist programmer and private web developer in Northern Arizona and has been in the IT field for over 12 years.