Getting data from relational DB tables

Hi,

I have three MySQL DB tables as follows:

users

+---------+-----------+
| user_id | user_name |
+---------+-----------+
| 1       | User1     |
+---------+-----------+
| 2       | User2     |
+---------+-----------+

tags

+---------+-----------+
| tag_id  | tag_name  |
+---------+-----------+
| 1       | Tag1      |
+---------+-----------+
| 2       | Tag2      |
+---------+-----------+
| 3       | Tag3      |
+---------+-----------+

user_tags

+---------+-----------+--------+
| id      | user_id   | tag_id |
+---------+-----------+--------+
| 1       | 1         | 1      |
+---------+-----------+--------+
| 2       | 1         | 2      |
+---------+-----------+--------+
| 3       | 1         | 3      |
+---------+-----------+--------+
| 4       | 2         | 1      |
+---------+-----------+--------+
| 5       | 3         | 1      |
+---------+-----------+--------+
| 6       | 3         | 3      |
+---------+-----------+--------+

Each user can have multiple tags assigned to them.

I want to get/display my data as follows:

+---------+------------------+
| User    | Tags             |
+---------+------------------+
| User1   | Tag1, Tag2, Tag3 |
+---------+-----------+------+
| User2   | Tag1             |
+---------+-----------+------+
| User3   | Tag1, Tag3       |
+---------+-----------+------+

So far I have been using one query (e.g. get_users()) to get the users from the users table, then one query for each user (e.g. get_tags(user_id)) in a foreach loop to get their tags.

For 100 users, this causes 1+100 queries. Can I get the users and tags like shown in the last table using just one query? Or, what would be the optimal way to get the data like that?

Thanks for any ideas.

Partly the answer to this may depend on which database solution youā€™re using, but in general:
The fact that youā€™ve got a user_tags table should be nagging at your head that thereā€™s a better way to do it :wink:
Join the tables, Group the resultant jointable by user_id, and concatenate the group.

In MySQL, I could do this as follows:

SELECT user_name,GROUP_CONCAT(tag_name SEPARATOR ',') AS Tags FROM users LEFT JOIN user_tags ON users.user_id = user_tags.user_id JOIN tags ON user_tags.tag_id = tags.tag_id GROUP BY users.user_id;

Incidentally, you may find people suggest you not use an id column on user_tags, and instead use the natural composite key (user_id,tag_id).

1 Like

I added MySQL to my question now, forgot to mention it.

Thank you very much for the code sample. It works perfectly.

Sorry if I misunderstood, did you mean thereā€™s a better way to setting up the DB tables in a case like this?

Also, could you please elaborate how to use the natural composite key? I am using phpMyAdmin and HeidiSQL as my DB interface. I know the id column in the user_tags table is totally useless, so if I can get rid of it in such tables, that would be a nice plus.

No the table layout is correct, I was saying that your instinct about doing it in fewer queries was correct, and the fact that there is a join table (user_tags) should be the clue that is ringing in your head to say there was the possibility of doing it in a single query.

As far as keysā€¦ it gets a lot more complex than this, but I shall try and be brief:

A natural key is a/some existing field(s) in the table that uniquely identifies the row.
An artificial key is an ID field added to identify the row.

A composite key is a key that is comprised of multiple fields.

So, in user_tagā€™s case:
A user can have multiple tags, so user_id does not uniquely identify the row.
A tag can belong to multiple users, so tag_id does not uniquely identify the row.
However, a given user can only have a given tag once; so the combination of user_id and tag_id DOES uniquely identify the row. Therefore a natural composite key exists in the tableā€™s fields: the set (tuple) of fields (user_id,tag_id).

1 Like

this is excellent, and i would go one step further ā€“ i would eliminate the tags id

instead of this ā€“

CREATE TABLE tags ( tag_id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT , tag_name VARCHAR(37) NOT NULL );

i would do this ā€“

CREATE TABLE tags ( tag_name VARCHAR(37) NOT NULL PRIMARY KEY );

i would then change the user_tags table to this ā€“

CREATE TABLE user_tags ( user_id INTEGER NOT NULL , CONSTRAINT user_tags_user FOREIGN KEY ( user_id ) REFERENCES users ( user_id ) , tag_name VARCHAR(37) NOT NULL , CONSTRAINT user_tags_tag FOREIGN KEY ( tag_name ) REFERENCES tags ( tag_name ) , PRIMARY KEY ( user_id , tag_name ) );

an immediate benefit of this is that you need only a 2-table join for your query, instead of a 3-table join

relational integrity is still maintained through the foreign key, if your data model requires that all tags used in the application are registered (allowed) tags as recorded in the tags table

if, however, you allow users to freely generate their own tags, you can simply do away with the tags table (and the foreign key to tag_name) and your query stays the same

3 Likes

Thank you for the detailed explanations about keys.

Thereā€™s one thing about the displayed Tags. As you can also see in your above example, the Tags are not ordered alphabetically (by tag_name). For User3, it is ā€œTag3, Tag1ā€, whereas I would want it to display as ā€œTag1, Tag3ā€.

Is there a way to order the tags by tag_name? I have ā€œORDER BY user_name ASCā€ at the end of my query but I couldnā€™t figure out how to order the tags.

ORDER BY user_name,tag_name
EDIT: Sorry, forgot this was a group by.

You can stick ORDER BYā€™s into the group_concat function.

SELECT user_name,GROUP_CONCAT(tag_name ORDER BY tag_name SEPARATOR ',') AS Tags 
FROM users LEFT JOIN user_tags ON users.user_id = user_tags.user_id JOIN tags ON user_tags.tag_id = tags.tag_id 
GROUP BY users.user_id 
ORDER BY user_name;

(ASC is the default and implied.)

1 Like

Ahhā€¦ why couldnā€™t I think of :slight_smile: Thank you.

this annoys the hell out of me, even though itā€™s such a small detail

are user names unique? if not, what does this produce?

also, why is user_id qualified and user_name isnā€™t?

sorry, canā€™t help myself ā€“ sql is sacred and i hate to see it misused, even trivially

please exercise discipline, peopleā€¦ itā€™s a good habit to get into and will save you again and again

1 Like

Iā€™m confused by this. What does it produce? A list of users in alphabetical order. Are you trying to say I should be selecting the user_id as well? Certainly possible, but not strictly relevant to the query formation.

Because user_name is valid unqualified because itā€™s a field in the resultant table (which has no name, and therefore cant be qualified), not the joining tables.

iā€™m saying if user_name isnā€™t unique, youā€™ll get more than one result row with the same name, and you wonā€™t know which is which

it ~is~ a column in one of the joined tables, except you wouldnā€™t know which one just from reading the sql

SELECT * FROM ... is also valid, but also not a good idea

My bet would be that if you run an EXPLAIN on the query the optimizer changes
ā€˜fieldnameā€™
to
ā€˜databasename.tablename.fieldnameā€™

True, this happens ā€œbehind the scenesā€ and it may not be a requirement to write qualified identifiers. But doing so does improve readability and when queries start to get complex it can be a help troubleshooting bugs to know at a glance what is what.

I confess I have never written a query using a database name qualifier. Mainly because I have never written a query involving more than one database, and having the name in the query is redundant information that makes it a bit less readable for me.

which isnt a ā€˜disciplineā€™ item, its a use case. There could eventually be 10 other fields in the user table that could uniquely identify it. Currently, user_name does. If you want to argue that user_id should be eliminated as a field the same way you could eliminate tag_id, fine.

and which table does Tags belong to? The same one that user_name belongs to, when the query resolves. the temporary result table that has no name.
If there were a conflict on what table user_name belonged to, it wouldnt let the query execute because of the ambiguity.

But sure, we could be precise, as SQL is ā€˜sacredā€™. Lets see if I can get it right this timeā€¦

SELECT `databasename`.`users`.`user_id`,`databasename`.`users`.`user_name`,GROUP_CONCAT(`databasename`.`tags`.`tag_name` ORDER BY `databasename`.`tags`.`tag_name` ASC SEPARATOR ',') AS Tags 
FROM `databasename`.`users` 
LEFT OUTER JOIN `databasename`.`user_tags` ON `databasename`.`users`.`user_id` = `databasename`.`user_tags`.`user_id` 
CROSS JOIN `databasename`.`tags` ON `databasename`.`user_tags`.`tag_id` = `databasename`.`tags`.`tag_id` 
GROUP BY `databasename`.`users`.`user_id` 
ORDER BY `databasename`.`users`.`user_name` ASC

Yup. Thatā€™s definitely clearer./s

EDIT: Nope, didnā€™t get it right. forgot the declaration of sort order.

my point remains that youā€™re only showing the user_name in that query

And youā€™re assuming a use case that relies upon unique identification. The OP wanted name and tags. I gave it.

Thank you for all your contribution.

When I ask a question, I try to keep only relevant parts in my question, so it is quicker to type, read and think on and run tests etc. I definitely get user_id as well as user_name and user_tags in my actual query, based on @m_hutleyā€™s example, because multiple users can have the same name; the unique identifier is user_id as natural.

I appreciate your comprehensive approach.

1 Like

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.