SitePoint Sponsor |
|
User Tag List
Results 1 to 16 of 16
Thread: Related Tags
-
Jan 28, 2006, 11:31 #1
- Join Date
- Apr 2004
- Location
- Michigan
- Posts
- 96
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Related Tags
I have a site where I have Items and they can have and belong to many Tags.
I have the following tables:
Items
id title
Tags
id name
items_tags
item_id
tag_id
When an item is posted a string of tags is given and they are broken down and inserted into the database one at a time.
What I'd like to do is find tags that are related to a certain tag. I dont know how to best define related but I would guess it would be best to count how many times tag x was also specified with the currently selected tag and then sort by the count of each other day.
I just don't know how to do this in SQL.
Thanks.
-
Jan 28, 2006, 11:33 #2
-
Jan 28, 2006, 11:37 #3
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
i'm not in a position to test this on my own first, so would you please give this a try
it is a query for the 20 most popular tag combinations:Code:select t1.name as tag1_name , t2.name as tag2_name , count(*) as howmany from item_tags as one inner join item_tags as two on one.item_id = two.item_id and one.tag_id > two.tag_id inner join tags as t1 on t1.id = one.tag_id inner join tags as t2 on t2.id = two.tag_id group by t1.name , t2.name order by howmany desc limit 20
-
Jan 28, 2006, 12:11 #4
- Join Date
- Apr 2004
- Location
- Michigan
- Posts
- 96
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
The inspiration of this comes from a page such as http://del.icio.us/popular/blog where if you look on the right is shows related tags.
Thanks.
-
Jan 28, 2006, 12:28 #5
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
Originally Posted by Pests
did you try my query?
-
Jan 28, 2006, 12:55 #6
- Join Date
- Apr 2004
- Location
- Michigan
- Posts
- 96
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Trying it as I write this. Let me set up a few more relationships and modify the query a bit to see how its working.
-
Jan 28, 2006, 12:58 #7
- Join Date
- Apr 2004
- Location
- Michigan
- Posts
- 96
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
That query alone works pretty good. How would I modify it to only return tags related to a certain tag? Is it possible?
The only problem I see with that is going backwards. Lets say you have two tags named ouch and pain. If pain is related to ouch then ouch should also be related to pain but there seems to only be a one way relation in the results returned. Heres what I'm getting on a set of example tags I made up for the test:
Code:tag1_name tag2_name howmany pain ouch 3 useful php 3 car pain 1 magic ouch 1 magic pain 1 useful car 1 car ouch 1
Sorry for all the trouble and thank you.
-
Jan 28, 2006, 13:25 #8
- Join Date
- Apr 2004
- Location
- Michigan
- Posts
- 96
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
I might have figured it out on my own.
I added: WHERE t1.name = 'pain' OR t2.name = 'pain' to get:
Code:tag1_name tag2_name howmany pain ouch 3 magic pain 1 car pain 1
Code:tag1_name tag2_name howmany pain ouch 3 magic ouch 1 car ouch 1
-
Jan 28, 2006, 14:06 #9
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
if you would quickly peruse my first query, it includes a self-join
each tag is matched to every other tag for each item -- this establishes the "related tags" relationship, as both are present for the same item
now, matching ouch with all related tags is going to find ouch and pain
but matching pain with all related tags is going to find pain and ouch
in order that we do not count these stats twice, an arbitrary choice is made to collect stats only for the pairs where the first tag collates before the second
there is actually no "direction" involved, they are co-related
so my "top 20" query is supposed to just show the most popular combinations, and it merely presents the pairs of tags always with the lower collating tag before the higher
once you understand this, does the information make more sense?
okay, now that we understand the information, here is your query to find all the tags related to a given tag --Code:select t2.name as tag2_name , count(*) as howmany from item_tags as one inner join item_tags as two on one.item_id = two.item_id inner join tags as t1 on t1.id = one.tag_id inner join tags as t2 on t2.id = two.tag_id where t1.name = 'ouch' group by t2.name order by howmany desc limit 20
-
Jan 28, 2006, 14:46 #10
- Join Date
- Apr 2004
- Location
- Michigan
- Posts
- 96
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Thanks for the help and for putting up with the simple questions. Theres only one slight problem I can notice: the query also returns the tag your searching for. It would be simple to remove this in code but I dislike having to do special checks like this in code when im just asking for related tags. I hope its not an issue and thanks for your patience.
I honestly didn't understand most of the first query.. the greater than line especially had me confused.
Thanks for explaining it to me and I think I just learned a bit more about SQL.
Quick question though. When there are alot of items or tags will this query slow down much? Should I have to worry about something like that at this time?
-
Jan 28, 2006, 15:43 #11
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
no, don't worry about the performance, you have a WHERE clause to initiate the query, so if there's an index on that column, you should scale easily to fantastic volumes
-
Jan 28, 2006, 15:45 #12
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
Originally Posted by Pests
Code:select t2.name as tag2_name , count(*) as howmany from item_tags as one inner join item_tags as two on one.item_id = two.item_id and one.tag_id <> two.tag_id inner join tags as t1 on t1.id = one.tag_id inner join tags as t2 on t2.id = two.tag_id where t1.name = 'ouch' group by t2.name order by howmany desc limit 20
-
Jan 28, 2006, 16:19 #13
- Join Date
- Apr 2004
- Location
- Michigan
- Posts
- 96
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Thanks, works like a charm.
-
Jan 28, 2006, 17:17 #14
- Join Date
- Apr 2004
- Location
- Michigan
- Posts
- 96
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
I have a few more questions on SQL relating to to this topic but not exactly the same (ie: Each item has a user_id column, id like to get a list of all tags that a user has submitted without duplicates). Should I repost a new topic?
Sorry for the trouble!
-
Jan 28, 2006, 18:55 #15
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
Originally Posted by Pests
Code:select distinct Tags.name from Items inner join item_tags on item_tags.item_id = Items.id inner join Tags on Tags.id = item_tags.tag_id where Items.user_id = 937
-
Jan 28, 2006, 19:04 #16
- Join Date
- Apr 2004
- Location
- Michigan
- Posts
- 96
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
There seems to be a problem with that one. Not sure what it is but some tags are missing. One that I noticed missing only appears in two items.. by two different users.
Edit: Nevermind, it works as expected. One of my test items didn't have a user id related to it.
Thanks.
Bookmarks