So far I have been using one query (e.g. get_users()) to get the users from the users table, then one query for each user (e.g. get_tags(user_id)) in a foreach loop to get their tags.
For 100 users, this causes 1+100 queries. Can I get the users and tags like shown in the last table using just one query? Or, what would be the optimal way to get the data like that?
Partly the answer to this may depend on which database solution youāre using, but in general:
The fact that youāve got a user_tags table should be nagging at your head that thereās a better way to do it
Join the tables, Group the resultant jointable by user_id, and concatenate the group.
In MySQL, I could do this as follows:
SELECT user_name,GROUP_CONCAT(tag_name SEPARATOR ',') AS Tags FROM users LEFT JOIN user_tags ON users.user_id = user_tags.user_id JOIN tags ON user_tags.tag_id = tags.tag_id GROUP BY users.user_id;
Incidentally, you may find people suggest you not use an id column on user_tags, and instead use the natural composite key (user_id,tag_id).
I added MySQL to my question now, forgot to mention it.
Thank you very much for the code sample. It works perfectly.
Sorry if I misunderstood, did you mean thereās a better way to setting up the DB tables in a case like this?
Also, could you please elaborate how to use the natural composite key? I am using phpMyAdmin and HeidiSQL as my DB interface. I know the id column in the user_tags table is totally useless, so if I can get rid of it in such tables, that would be a nice plus.
No the table layout is correct, I was saying that your instinct about doing it in fewer queries was correct, and the fact that there is a join table (user_tags) should be the clue that is ringing in your head to say there was the possibility of doing it in a single query.
As far as keysā¦ it gets a lot more complex than this, but I shall try and be brief:
A natural key is a/some existing field(s) in the table that uniquely identifies the row.
An artificial key is an ID field added to identify the row.
A composite key is a key that is comprised of multiple fields.
So, in user_tagās case:
A user can have multiple tags, so user_id does not uniquely identify the row.
A tag can belong to multiple users, so tag_id does not uniquely identify the row.
However, a given user can only have a given tag once; so the combination of user_id and tag_id DOES uniquely identify the row. Therefore a natural composite key exists in the tableās fields: the set (tuple) of fields (user_id,tag_id).
an immediate benefit of this is that you need only a 2-table join for your query, instead of a 3-table join
relational integrity is still maintained through the foreign key, if your data model requires that all tags used in the application are registered (allowed) tags as recorded in the tags table
if, however, you allow users to freely generate their own tags, you can simply do away with the tags table (and the foreign key to tag_name) and your query stays the same
Thank you for the detailed explanations about keys.
Thereās one thing about the displayed Tags. As you can also see in your above example, the Tags are not ordered alphabetically (by tag_name). For User3, it is āTag3, Tag1ā, whereas I would want it to display as āTag1, Tag3ā.
Is there a way to order the tags by tag_name? I have āORDER BY user_name ASCā at the end of my query but I couldnāt figure out how to order the tags.
ORDER BY user_name,tag_name
EDIT: Sorry, forgot this was a group by.
You can stick ORDER BYās into the group_concat function.
SELECT user_name,GROUP_CONCAT(tag_name ORDER BY tag_name SEPARATOR ',') AS Tags
FROM users LEFT JOIN user_tags ON users.user_id = user_tags.user_id JOIN tags ON user_tags.tag_id = tags.tag_id
GROUP BY users.user_id
ORDER BY user_name;
Iām confused by this. What does it produce? A list of users in alphabetical order. Are you trying to say I should be selecting the user_id as well? Certainly possible, but not strictly relevant to the query formation.
Because user_name is valid unqualified because itās a field in the resultant table (which has no name, and therefore cant be qualified), not the joining tables.
My bet would be that if you run an EXPLAIN on the query the optimizer changes
āfieldnameā
to
ādatabasename.tablename.fieldnameā
True, this happens ābehind the scenesā and it may not be a requirement to write qualified identifiers. But doing so does improve readability and when queries start to get complex it can be a help troubleshooting bugs to know at a glance what is what.
I confess I have never written a query using a database name qualifier. Mainly because I have never written a query involving more than one database, and having the name in the query is redundant information that makes it a bit less readable for me.
which isnt a ādisciplineā item, its a use case. There could eventually be 10 other fields in the user table that could uniquely identify it. Currently, user_name does. If you want to argue that user_id should be eliminated as a field the same way you could eliminate tag_id, fine.
and which table does Tags belong to? The same one that user_name belongs to, when the query resolves. the temporary result table that has no name.
If there were a conflict on what table user_name belonged to, it wouldnt let the query execute because of the ambiguity.
But sure, we could be precise, as SQL is āsacredā. Lets see if I can get it right this timeā¦
SELECT `databasename`.`users`.`user_id`,`databasename`.`users`.`user_name`,GROUP_CONCAT(`databasename`.`tags`.`tag_name` ORDER BY `databasename`.`tags`.`tag_name` ASC SEPARATOR ',') AS Tags
FROM `databasename`.`users`
LEFT OUTER JOIN `databasename`.`user_tags` ON `databasename`.`users`.`user_id` = `databasename`.`user_tags`.`user_id`
CROSS JOIN `databasename`.`tags` ON `databasename`.`user_tags`.`tag_id` = `databasename`.`tags`.`tag_id`
GROUP BY `databasename`.`users`.`user_id`
ORDER BY `databasename`.`users`.`user_name` ASC
Yup. Thatās definitely clearer./s
EDIT: Nope, didnāt get it right. forgot the declaration of sort order.
When I ask a question, I try to keep only relevant parts in my question, so it is quicker to type, read and think on and run tests etc. I definitely get user_id as well as user_name and user_tags in my actual query, based on @m_hutleyās example, because multiple users can have the same name; the unique identifier is user_id as natural.