SitePoint Sponsor

User Tag List

Results 1 to 16 of 16

Thread: Related Tags

  1. #1
    SitePoint Enthusiast
    Join Date
    Apr 2004
    Location
    Michigan
    Posts
    96
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Related Tags

    I have a site where I have Items and they can have and belong to many Tags.

    I have the following tables:

    Items
    id title

    Tags
    id name

    items_tags
    item_id
    tag_id

    When an item is posted a string of tags is given and they are broken down and inserted into the database one at a time.

    What I'd like to do is find tags that are related to a certain tag. I dont know how to best define related but I would guess it would be best to count how many times tag x was also specified with the currently selected tag and then sort by the count of each other day.

    I just don't know how to do this in SQL.

    Thanks.

  2. #2
    SitePoint Wizard guelphdad's Avatar
    Join Date
    Oct 2003
    Location
    St. Catharines, ON Canada
    Posts
    1,708
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    show some sample data and sample output so we have a clearer idea of what you are trying to do.

  3. #3
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,251
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    i'm not in a position to test this on my own first, so would you please give this a try

    it is a query for the 20 most popular tag combinations:
    Code:
    select t1.name as tag1_name
         , t2.name as tag2_name
         , count(*) as howmany
      from item_tags as one
    inner
      join item_tags as two
        on one.item_id = two.item_id
       and one.tag_id  > two.tag_id  
    inner
      join tags as t1
        on t1.id = one.tag_id   
    inner
      join tags as t2
        on t2.id = two.tag_id   
    group
        by t1.name
         , t2.name
    order 
        by howmany desc limit 20
    please let me know if this is useful (or even works)
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  4. #4
    SitePoint Enthusiast
    Join Date
    Apr 2004
    Location
    Michigan
    Posts
    96
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The inspiration of this comes from a page such as http://del.icio.us/popular/blog where if you look on the right is shows related tags.

    Thanks.

  5. #5
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,251
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Pests
    Thanks.
    you're welcome

    did you try my query?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  6. #6
    SitePoint Enthusiast
    Join Date
    Apr 2004
    Location
    Michigan
    Posts
    96
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Trying it as I write this. Let me set up a few more relationships and modify the query a bit to see how its working.

  7. #7
    SitePoint Enthusiast
    Join Date
    Apr 2004
    Location
    Michigan
    Posts
    96
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    That query alone works pretty good. How would I modify it to only return tags related to a certain tag? Is it possible?

    The only problem I see with that is going backwards. Lets say you have two tags named ouch and pain. If pain is related to ouch then ouch should also be related to pain but there seems to only be a one way relation in the results returned. Heres what I'm getting on a set of example tags I made up for the test:

    Code:
    tag1_name tag2_name howmany
    pain 	  ouch 	    3
    useful 	  php 	    3  
    car 	  pain 	    1
    magic 	  ouch      1
    magic 	  pain      1
    useful    car 	    1
    car 	  ouch 	    1
    What I was planning on doing was a where selecting the tag I wanted from tag1_name. If you notice though pain would be related to ouch (the first row) but then when searching for ouch its not related to anything.

    Sorry for all the trouble and thank you.

  8. #8
    SitePoint Enthusiast
    Join Date
    Apr 2004
    Location
    Michigan
    Posts
    96
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I might have figured it out on my own.

    I added: WHERE t1.name = 'pain' OR t2.name = 'pain' to get:

    Code:
    tag1_name tag2_name howmany
    pain      ouch      3
    magic 	  pain      1
    car 	  pain 	    1
    and: WHERE t1.name = 'ouch' OR t2.name = 'ouch' to get:

    Code:
    tag1_name tag2_name howmany
    pain      ouch      3
    magic     ouch      1
    car       ouch      1
    They both work as expected but I'll have to do some checking to see which row is the related tag (which I'm thinking will just be simple if tag1_name = tag I'm looking for, use tag2 and vise virsa) unless theres a way to narrow this down in sql even further to something like two columns: related and howmany. Or something similar.

  9. #9
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,251
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    if you would quickly peruse my first query, it includes a self-join

    each tag is matched to every other tag for each item -- this establishes the "related tags" relationship, as both are present for the same item

    now, matching ouch with all related tags is going to find ouch and pain

    but matching pain with all related tags is going to find pain and ouch

    in order that we do not count these stats twice, an arbitrary choice is made to collect stats only for the pairs where the first tag collates before the second

    there is actually no "direction" involved, they are co-related

    so my "top 20" query is supposed to just show the most popular combinations, and it merely presents the pairs of tags always with the lower collating tag before the higher

    once you understand this, does the information make more sense?

    okay, now that we understand the information, here is your query to find all the tags related to a given tag --
    Code:
    select t2.name as tag2_name
         , count(*) as howmany
      from item_tags as one
    inner
      join item_tags as two
        on one.item_id = two.item_id
    inner
      join tags as t1
        on t1.id = one.tag_id   
    inner
      join tags as t2
        on t2.id = two.tag_id   
     where t1.name = 'ouch' 
    group
        by t2.name
    order 
        by howmany desc limit 20
    notice how there's no "greater than" condition here
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  10. #10
    SitePoint Enthusiast
    Join Date
    Apr 2004
    Location
    Michigan
    Posts
    96
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks for the help and for putting up with the simple questions. Theres only one slight problem I can notice: the query also returns the tag your searching for. It would be simple to remove this in code but I dislike having to do special checks like this in code when im just asking for related tags. I hope its not an issue and thanks for your patience.

    I honestly didn't understand most of the first query.. the greater than line especially had me confused.

    Thanks for explaining it to me and I think I just learned a bit more about SQL.

    Quick question though. When there are alot of items or tags will this query slow down much? Should I have to worry about something like that at this time?

  11. #11
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,251
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    no, don't worry about the performance, you have a WHERE clause to initiate the query, so if there's an index on that column, you should scale easily to fantastic volumes
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  12. #12
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,251
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Pests
    Theres only one slight problem I can notice: the query also returns the tag your searching for.
    ah, yes, sorry
    Code:
    select t2.name as tag2_name
         , count(*) as howmany
      from item_tags as one
    inner
      join item_tags as two
        on one.item_id = two.item_id
       and one.tag_id  <> two.tag_id
    inner
      join tags as t1
        on t1.id = one.tag_id   
    inner
      join tags as t2
        on t2.id = two.tag_id   
     where t1.name = 'ouch' 
    group
        by t2.name
    order 
        by howmany desc limit 20
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  13. #13
    SitePoint Enthusiast
    Join Date
    Apr 2004
    Location
    Michigan
    Posts
    96
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks, works like a charm.

  14. #14
    SitePoint Enthusiast
    Join Date
    Apr 2004
    Location
    Michigan
    Posts
    96
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I have a few more questions on SQL relating to to this topic but not exactly the same (ie: Each item has a user_id column, id like to get a list of all tags that a user has submitted without duplicates). Should I repost a new topic?

    Sorry for the trouble!

  15. #15
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,251
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Pests
    Should I repost a new topic?
    no, go ahead and ask your question in this thread
    Code:
    select distinct Tags.name
      from Items
    inner
      join item_tags
        on item_tags.item_id = Items.id
    inner
      join Tags
        on Tags.id = item_tags.tag_id
     where Items.user_id = 937
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  16. #16
    SitePoint Enthusiast
    Join Date
    Apr 2004
    Location
    Michigan
    Posts
    96
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    There seems to be a problem with that one. Not sure what it is but some tags are missing. One that I noticed missing only appears in two items.. by two different users.

    Edit: Nevermind, it works as expected. One of my test items didn't have a user id related to it.

    Thanks.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •