3 More Joins You Should Be Familiar With

Zack Wallace
Share

There are many ways to JOIN data from two database tables and filter the information you require. Craig Buckler wrote a popular piece on understanding JOINs; namely INNER, LEFT, RIGHT, and FULL OUTER. This article is an extension of that one.

Let’s recap these real quick. Picture two tables, one for customers and one for books to establish a book loan database.


customers table

id firstname lastname book_id
1 Joe Blow 1
2 Jane Doe 2
3 Harry Crow 2
4 Jeffrey Snow 0

books table

id Title
1 Star Wars
2 Jurassic Park
3 Little Women
4 Tom Sawyer

The books table has one row for every book.

The customers table has one row for each customer who can only have one book on loan at a time. If they have no book on loan, the book_id would be 0 or an empty string.

This is a very simple example to make the JOINs as clear as possible to understand!

A LEFT JOIN here would be in the case you want to ask a question such as “show me all customers including any books on loan.

Left Join Venn Diagram

You can see in the image that ALL data in the left circle, or table, is included in the result set. Only data that overlaps from the books table is included from the right table. This means that with a LEFT JOIN, some data in the right table may be excluded.

A RIGHT JOIN would be like asking “show me all the books in my library, along with any customers that have borrowed them.

Right Join Venn Diagram

In this image you see that ALL data in the right table is included in the result set. Only data that overlaps from the customers table is included. This means that with a RIGHT JOIN, some data in the left table may be excluded.

An OUTER JOIN would be like asking “show me all loans and all books, regardless of connections between them.

Full Outer Join Venn Diagram

In this image, you can see that ALL data from both tables will be included, regardless of whether some data overlaps. When a query like this is done, you will have non-overlapping data in the result, and those fields will be set to NULL.

An INNER JOIN would be like asking “show only customers with a loan.

Inner Join Venn Diagram

Here you can see that data may be excluded from both the left and right tables. You won’t see any customers if they DON’T have a book out, and you won’t see any books if they are NOT loaned out!

This is the most common type of data and is the default behavior when using the keyword JOIN by itself. The added word “INNER” is not usually required.

What Does “Left” And “Right” Mean Anyway?

You can think of “left” and “right” as nothing more than source order. Look at this query:

SELECT *
FROM customers
LEFT JOIN books
ON customers.book_id = books.id

Notice that I mentioned the table customers before I used the JOIN keyword. This means customers is my “left” table. Another way of thinking about it is to ask which table is left of the JOIN keyword, and which is to the right of it.

A couple caveats:

  • Simply because one table is the “left” one, does not mean you are selecting ALL its records. This is the function of LEFT JOIN or RIGHT JOIN, not source order!
  • The table that is the left one will have its columns listed first in the result set unless you specifically choose columns in the SELECT section.
  • The order of tables after the ON keyword doesn’t matter, it would return the same results if swapped.

Enough Reviewing. What Are These New JOINs?

Well, they are not “new”, they are just a way to query additional questions. How would you find answers to questions with the words “don’t” or “not” in the query?

Show me all the customers who DON’T have books on loan.

Show me all the books that are NOT loaned out.

LEFT JOIN with Exclusion

Take a look at this graphic. Notice the difference from the above JOINs in the shaded area.

Left join with exclusion Venn Diagram

This looks like a LEFT JOIN, but no longer has the “overlap” data included. Why? This is the same as asking “show me customers who have no books on loan.” Perhaps you want to select all customers without a book and send them a newsletter with a special discount?

If you are clever you might think you can just search the customers table for a book_id of 0 to do the same thing. That would work in this example but most of the time it won’t; it depends how your tables are designed.

The query looks like this:

SELECT *
FROM customers
LEFT JOIN books
ON customers.book_id = books.id
WHERE books.id IS NULL

We’ve now included a WHERE clause. Any time you have a WHERE clause, you are excluding data, or filtering it, almost like a search. So why are we searching for books.id being NULL? Why would that be NULL? Let’s run the standard LEFT JOIN and see the data it returns, we’ll have the answer:

LEFT JOIN return data

id firstname lastname book_id id1 title
1 Joe Blow 1 1 Star Wars
2 Jane Doe 2 2 Jurassic Park
3 Harry Crow 2 2 Jurassic Park
4 Jeffrey Snow 0 null null

Can you see the issue? This is a LEFT JOIN which means ALL data from the customers table is included regardless of overlap with books. Jeffrey Snow is included but he does not have a book on loan, this is why the “id1” and “title” columns are set to NULL. When the tables are JOINed, there would not be a book title or book ID linked to him.

If we ask “show me all customers including which books they have”, you would want the above data because Jeffrey is a customer regardless if he has a book on loan or not. That would be the function of a LEFT JOIN in this case.

If we ask the question “show me customers with no books on loan”, it now makes perfect sense what to look for. We only need to select customers where we see NULL for the books.id column (when JOINed, would be labeled id1 since there are two columns named “id”). We do this with a standard WHERE clause if we add WHERE books.id IS NULL. Now the result is filtered to just this:

id firstname lastname book_id id1 title
4 Jeffrey Snow 0 null null

You now have all customers who do not have books on loan.

RIGHT JOIN with Exclusion

Let’s do the same with a RIGHT JOIN. Let’s find all the books which are NOT loaned out to anybody.

A normal RIGHT JOIN would return every book regardless of whether it is loaned, the result set would look like this:

id firstname lastname book_id id1 title
1 Joe Blow 1 1 Star Wars
2 Jane Doe 2 2 Jurassic Park
3 Harry Crow 2 2 Jurassic Park
null null null null 3 Little Women
null null null null 4 Tom Sawyer

This looks a bit different. First, you might notice that Jurassic Park is listed twice. This is because two people have the book on loan, and the database is returning a row for each match.

Notice all the corresponding columns from the customers table are NULL for Little Women and Tom Sawyer because no one borrowed those titles so there is no overlapping data.

If we want to select all the books that are not loaned out, we just use the WHERE clause to find “NULL” in the customers.id column.

SELECT *
FROM customers
RIGHT JOIN books
ON customers.book_id = books.id
WHERE customers.id IS NULL

The result should be predictable. We get only books that are not loaned out.

id firstname lastname book_id id1 title
null null null null 3 Little Women
null null null null 4 Tom Sawyer

OUTER JOIN with Exclusions

The last JOIN looks like this.

This JOIN is not very useful, but will essentially give you a list of both customers with no loan, and books which are not loaned, at the same time.

A strange JOIN like this may be useful in cases where you literally need to select data with no connection between tables. Perhaps you are hunting for orphan data or looking for inconsistencies in some old database you converted.

In fact, this kind of JOIN is so weird that you can’t even do it in MySQL, it doesn’t support OUTER JOIN. Regular SQL does, and the query would look like this (MSSQL not MySQL):

SELECT *
FROM customers
FULL OUTER JOIN books
ON customers.id = books.id
WHERE customers.id IS NULL
OR books.id IS NULL

The result of this query would return data looking something like this:

id firstname lastname book_id id1 title
1 Jeffrey Snow 0 null null
null null null null 3 Little Women
null null null null 4 Tom Sawyer

You can achieve an OUTER JOIN using the method Craig Buckler explained with a UNION, but that is not without potential issues. The top Google result for simulating a FULL OUTER JOIN in MySQL is from 2006 and can be found here. It can get somewhat complicated.

The main thing to notice in the above code is checking for NULL on both sides of the JOIN, because we want to exclude from both tables. Without checking both sides, we’d simply end up with one of the other JOINs just talked about.

Regardless of how bizarre or supported a query like this one is, I wanted to include it because it is a valid type of JOIN, if you can think of a reason to use it.

Other Thoughts

You should always use your WHERE clause against fields which cannot have NULL as their actual value! We always tested against the ID fields which cannot have NULL as a value. Imagine if our books table had an ISBN field that allowed NULL. If we tested for NULL using that field, it would include rows we may not want!

There is another JOIN called a CROSS JOIN which is also strange and unique. Imagine instead of just matching one user to one book, EVERY user were matched against EVERY book! Yes, this means if you have 20 books and 30 customers, a CROSS JOIN would result in 30*20 rows of data! For an example of how this could be useful, check out this article.

Note that in MySQL, the JOIN, INNER JOIN, and CROSS JOIN are syntactical equivalents and can replace each other. This is because JOIN and INNER JOIN do the same thing and must use the ON keyword to match columns. When using a CROSS JOIN, there is no ON keyword, as it is matching every row in table A to every row in table B.

Conclusion

I hope these extra few JOINs made sense to you. Think about using them any time you are asking for data between tables where something “doesn’t” match the other.

  • “Find all customers who have NOT ordered before”.
  • “Find all customers NOT in the blacklist”.
  • “Find all products that have NOT sold”.
  • “Find all customers NOT borrowing books”.
  • “Find all dogs that HAVEN’T been walked lately”.
  • “Find employees who have NEVER sent in a support ticket”.

In summary, depending on how your database and tables are defined, you may need to use the WHERE clause to check for NULL values to exclude the matches rather than include them as with normal JOIN behavior.

So… have you ever needed a cross join? Any other specific use cases you’d like to tell us about or want to us to cover? Let us know!