Questions on JOINs and foreign keys

I’m new to using JOINs and foreign keys in MySQL. I think I understand but have a few questions:

  1. Are less queries always better? E.g. You can query a product and then query the category it belongs to in two queries or you can use one in a single JOIN query? Is the latter better? I’m guessing it means less trips to the server if you’re using PHP.
  2. Should you always index both columns involved in a foreign key relationship? What if the cardinality on a column is really low?
  3. Are joins faster with a foreign key in place? Or it is for data integrity only?
  4. Is adding ‘On INSERT CASCADE’ really worth it on an auto-increment ID field? In practice they never really change do they? So is having an index here a waste of resources?
  5. If you have ‘ON DELETE RESTRICT’ set up the query fails if you try to delete the parent of a child. Other than PHP returning false when you run a DELETE query, is there any other way to check if there’s a dependency? From your web application point-of-view a query returning false is ambiguous. Or is there a specific error code for it?
  6. Am I right in saying most FK relations are one-to-many. When you set the key up, do you always do it from the ‘one’ to the ‘many’?
  7. If you want a zero-to-many do you allow NULL on the column on the left table—and if you want a one-to-one you make both columns unique?

Thanks.

in the example you cited, yes, and in general, yes

there are exceptions, but as a beginner, you won’t run across many

you might not have to

i believe mysql now automatically creates foreign key indexes for you

technically speaking, it’s the indexes that make joins faster, not the logical restriction of values

yes, foreign keys are for data integrity (the logical restriction of values)

i’m not gonna reply to this one because it mixes too many concepts to untangle with a single straight answer

i think so, yes

testing it outside of php will surely reveal what error number it is

not most – all

nope, the other way around

no, on the many side

this question is beyond the scope of a beginner’s concern

short answer: you may need application code to enforce this one

I’ll jump in with my opinions for the few questions that Rudy left unanswered:

I don’t think ON INSERT CASCADE is available in MySQL (unless emulated by triggers) - or are you talking about ON UPDATE CASCADE? If you have a relation then you always need an index no matter if it’s CASCADE, RESTRICT or SET NULL. CASCADE will not use up resources unless you perform the update. Personally, I tend to use ON UPDATE CASCADE even on relationships with auto-increment fields - yes, they normally don’t change but they may change in some unusual circumstances like me doing some db maintenance, data cleanup or data corrections - then I prefer the db to allow me to change the PK instead of me going throught the hoops to change all the keys that are bound by the ON UPDATE RESTRICT relations. Of course, this will vary depending on the situation but if in doubt I simply allow CASCADE even if I’m not intending to use it. I think about it in another way: do I need the db to forbid changing the PK? If not, then I use CASCADE.

Yes, both unique or primary keys. You can do this but must remember that ON UPDATE/DELETE actions will be one-way only. To make them work in both directions you would need to employ some other mechanisms - I suspect you could do it with triggers.

Thanks to you both and sorry I made some mistakes:

  1. I got the many to one the wrong way round.
  2. Yes, I mean ON UPDATE CASCADE.

Let’s say you create a FK index like in SitePoint’s very own tutorial:

ALTER TABLE `user`
ADD CONSTRAINT `FK_course`
FOREIGN KEY (`course`) REFERENCES `course` (`id`)
ON UPDATE CASCADE;

In this example do you refer to user as the child and course as the parent? With regards to this:

i believe mysql now automatically creates foreign key indexes for you

Usually the parent table’s column will be a primary index anyway. Would it index the user.course automatically? I’ve been doing it manually.

yes

yes it would

do a SHOW CREATE TABLE and you might see two indexes on the same column

Thanks for that.

I can confirm if there is no index it does indeed create one but if there is already one there it just uses that.

Quick further question on joins.

Am I right in saying in this situation you wouldn’t use joins?

You have a products table (e.g. item Apples) and a products unit table (e.g. items Granny Smith, Pink Lady, Golden Delicious). The products unit table has a belongsto column that is the parent of products id. Normally I would get the current product details from the database (1 query) and then SELECT name, price FROM productunits WHERE belongsto = productid ORDER BY name ASC (1 query).

That’s better with two queries, right? You could join them but to me it seems neater to do it in two queries rather than fetching the main product ID each time (which would be the same in each row).

no, you are not, sorry – i definitely would use a join in a single query rather than two separate queries

I wouldn’t use joins :). The product table may have many other fields apart from ID like name, descriptions, etc. and I wouldn’t want to fetch the same data with each item of the unit table. I don’t know why Rudy would use a join since it also seems counter-intuitive to me - suppose I get 7 rows from such a join query and want to simply get data from the product table - then I get this data from the first returned row (or second, or third, etc.) and discard the rest because it is duplicated in every row - this doesn’t look elegant, does it?

Thanks. Well, that was my thinking. In my mind it just seems odd to get the product name and description with each row.

How does MySQL work here? If I query the productunits table and join the products and it gets exactly the same name and description each time does MySQL query products per row or just once?

lemon juice, you’re trolling me, aren’t you

:smiley:

my dear doctor quincy, let’s not use made up tables, let’s use your actual scenario

please show your two tables, and the two queries you would run

I just couldn’t help myself… :D.

I thought I could at least provoke you to offer a short explanation of why you would use a join or why two queries is a bad idea :shifty:

overall performance

short enough?

:slight_smile:

Short enough but not convincing enough :). I don’t think this kind of SELECT with JOIN will be faster (by any significant margin) than two plain SELECTs given the fact that the join will fetch n-times more data unnecessarily because of duplication of the same product fields in each row. At best I’d consider both methods more or less equal in performance.

I’m also thinking about the issue of fetching the data by a scripting language, which is common on the web. Getting the results from a simple select is easy and logical while trying to extract the product item from the unit items being combined in every single row is slightly convoluted. Not to mention the mess if someone uses a database abstraction layer where each row goes to a separate object of given type…

only if you are a very junior programmer

tail wagging the dog, dude

feel free to run comprehensive benchmarks

as a general rule of thumb, my advice is rock solid – a join query is always* better than two separate queries

Just out at the moment so will reply in more detail later but that links seems to suggest the problem is with number of queries rather than joins per se.

I wonder whether the difference is negligible and the gain of more logical results is worth it. (To use union the column numbers need to match so not always practical.)

If extra queries are so bad what about mysqli multi query? Only one server trip…

Useful discussion here for me either way, thanks.

Taking application programming out of the mix yes. However, once application logic is introduced there are several areas where performance and maintainability can both be gained using multiple queries. With that in mind how rarely as a software engineer would one write a program with only SQL. It would be naive to consider only the performance of a query when building any application. I know this is not a discussion about applications but it is difficult to talk about performance without considering the entire gamut of technology used to deliver an end solution/service.

Just as with your example when application logic is introduced that leaves room for caching and other application side techniques. Techniques that can significantly reduce load and increase performance over a large span of requests. Perhaps the initial request to fetch data from the service layer might be a little slower when using multiple queries but requests following are much faster due to caching with the added benefit of producing a much more stable, maintainable, and flexible infrastructure overall.

Though I would tend to agree that as a beginner when writing plain jane queries try to do things a single query. For example, don’t select from one table than loop through the result and for each row run a query to select data. In just about all cases when that is done a single query can be created instead. In more cases than not unless you are using or building your own ORM w/ caching the fewer queries the better.