SitePoint Sponsor

User Tag List

Results 1 to 12 of 12
  1. #1
    SitePoint Member
    Join Date
    Jan 2010
    Posts
    8
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Question Tag table structure

    Hi,

    I want to have the tags feature in my PHP application, just like this forum ("Separate tags using a comma." when posting).

    Should I have a separate table or just a column? How it's ideally done?

    Please advise. Thanks.

  2. #2
    Follow Me On Twitter: @djg gold trophysilver trophybronze trophy Dan Grossman's Avatar
    Join Date
    Aug 2000
    Location
    Philadephia, PA
    Posts
    20,580
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Separate table, a one-to-many relationship to the object being tagged. That allows you to efficiently query the tags, so that you can do things like create a tag cloud, a tag page, top tags list, tag suggestions, etc.

  3. #3
    SitePoint Member
    Join Date
    Jan 2010
    Posts
    8
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Question

    Hi Dan,

    You mean like this, right?

    Code:
    video
    
    -> videoid
    -> title
    -> url
    
    tag
    
    -> tagid
    -> videoid
    -> word

  4. #4
    Follow Me On Twitter: @djg gold trophysilver trophybronze trophy Dan Grossman's Avatar
    Join Date
    Aug 2000
    Location
    Philadephia, PA
    Posts
    20,580
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Yes. Though you don't need a tagid, the (videoid, word) pairs are unique already.

  5. #5
    SitePoint Member
    Join Date
    Jan 2010
    Posts
    8
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks, much appreciated.

  6. #6
    SitePoint Wizard Mike Borozdin's Avatar
    Join Date
    Oct 2002
    Location
    Edinburgh, UK
    Posts
    1,743
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Dan Grossman View Post
    Separate table, a one-to-many relationship to the object being tagged. That allows you to efficiently query the tags, so that you can do things like create a tag cloud, a tag page, top tags list, tag suggestions, etc.
    Have you thought about data redundancy this schema creates, as well as updates anomalities?

    Here is an example of data stored in the suggested schema?

    Code:
    post#1 | tag#1
    post#2 | tag#2
    post#3 | tag#1
    post#1 | tag#1
    See a problem?

    I would create two tables. One will contain tags (tagId, tagName) and the other one will link posts with tags (postId, tagId) and yes the latter uses many-to-many.

  7. #7
    Follow Me On Twitter: @djg gold trophysilver trophybronze trophy Dan Grossman's Avatar
    Join Date
    Aug 2000
    Location
    Philadephia, PA
    Posts
    20,580
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    The primary key (post_id, tag) must be unique, so row #1 and row #4 could not both exist. Nor does the schema create update anomalies. If the author of post #1 wants to change his tags, that doesn't mean the tag post #3 previously had in common should change. As long as tags are user defined and not a prewritten safe list maintained by someone else, your schema requires more queries and more maintenance with no practical benefit.

  8. #8
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,334
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Mike Borozdin View Post
    Code:
    post#1 | tag#1
    post#2 | tag#2
    post#3 | tag#1
    post#1 | tag#1
    See a problem?
    yes, you've repeated post#1|tag#1 and that wouldn't be allowed



    but i think i know what you were trying to suggest, and i think it's wrong

    the issue of "redundancy" is not going to go away if you substitute a numeric id for the tag name

    as often as the tag name would have repeated, the numeric id will also repeat just as much

    "redundancy" is a poor excuse to use a numeric id

    furthermore, note that with a numeric id, an extra join is required
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  9. #9
    SitePoint Wizard Mike Borozdin's Avatar
    Join Date
    Oct 2002
    Location
    Edinburgh, UK
    Posts
    1,743
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Dan Grossman View Post
    The primary key (post_id, tag) must be unique, so row #1 and row #4 could not both exist. Nor does the schema create update anomalies. If the author of post #1 wants to change his tags, that doesn't mean the tag post #3 previously had in common should change. As long as tags are user defined and not a prewritten safe list maintained by someone else, your schema requires more queries and more maintenance with no practical benefit.
    I made a type, of course they couldn't both exist.

    As for update anomalies, I mean if you have many posts tagged with "tag A" and suddenly you want to rename this tag to "tag AA" you have to update a whole bunch of entries.

    Moreover, if we have a schema similar to this one: tags (tagId, tagTitle, numPosts) and providing we're updating numPosts after tagging a post with a corresponding tag, we are pretty much better off when we need to generate a tag cloud, because we don't have to count the number of posts tagged with all the tags.

  10. #10
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,334
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Mike Borozdin View Post
    As for update anomalies, I mean if you have many posts tagged with "tag A" and suddenly you want to rename this tag to "tag AA" you have to update a whole bunch of entries.
    this is a rational argument for other many-to-many structures, but it hardly applies to tags

    'ms jones' might have to become 'mrs smith', but when does the 'mysql' tag have to change to some other name? more to the point, would it still be the mysql tag? this question goes to the root of identity, and of course identity is what candidate and primary keys are all about

    Quote Originally Posted by Mike Borozdin View Post
    Moreover, if we have a schema similar to this one: tags (tagId, tagTitle, numPosts) and providing we're updating numPosts after tagging a post with a corresponding tag, we are pretty much better off when we need to generate a tag cloud, because we don't have to count the number of posts tagged with all the tags.
    this too is a rational argument, but again, it hardly supports the need for a numeric id, it also works just as well for a tags table where the numeric id is missing

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  11. #11
    SitePoint Wizard Mike Borozdin's Avatar
    Join Date
    Oct 2002
    Location
    Edinburgh, UK
    Posts
    1,743
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    this is a rational argument for other many-to-many structures, but it hardly applies to tags

    'ms jones' might have to become 'mrs smith', but when does the 'mysql' tag have to change to some other name? more to the point, would it still be the mysql tag? this question goes to the root of identity, and of course identity is what candidate and primary keys are all about
    There can be a type in a tag that you only notice having published a few posts with it. Also, I remember making a huge tag revamp and I think I changed the names of some tags.

    Quote Originally Posted by r937 View Post
    this too is a rational argument, but again, it hardly supports the need for a numeric id, it also works just as well for a tags table where the numeric id is missing
    Okay, we can actually go without a numeric id, I just think it's better to join tables rather than perform counting when displaying a tag cloud. Although, it's just words, I didn't evaluate the performance in both cases.

  12. #12
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,334
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Mike Borozdin View Post
    There can be a type in a tag that you only notice having published a few posts with it. Also, I remember making a huge tag revamp and I think I changed the names of some tags.
    fixing typos and making mass changes can both be accomplished easily with simple UPDATE statements

    you do not need a numeric id to make occasional updates

    and there's the key -- you don't update tags regularly, just occasionally

    meanwhile, during normal operations, you are saved the unnecessary overhead that the extra join requires
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •