Creating a database to utilize hashtags

Hi all,

I was trying to do research on Twitter’s use of hashtags, as well as Flickr’s. However, all I found were support articles for people to place them in their posts, not the actual development side of how to store it.

I know the # will be used to “scrape” it out of a post and store it in a table if it hadn’t already been entered, but I was curious of that table was one BIG table - or segmented into categories such as “Comments”, “Tags” (for a profile or YouTube video even). Then I know they probably have a relational table that binds the hashtag one to whatever content was posted.

Does anyone know of a data model that coinsides with best practices? I just want to make sure I do it right.

Thanks in advance! Feel free to ask me other questions if I hadn’t really described it well.

There are database architecture efficiency standards called database normalization. I would advise you to take a read through these. Don’t think you need to follow every one strictly, but they’re good to keep back-pocket for situations like this where you’re trying to decide on a data structure.

Further Reading: http://databases.about.com/od/specificproducts/a/normalization.htm

Some of the database guys here might disagree with me, but I as long as each hashtag entry had the same fields for storage, I’d probably store them in a single table. If a comment hashtag differed in data fields than a tag hashtag, then it would make sense to create separate tables for them with the appropriate fields.

i’m having a real hard time understanding what this means

just for fun, pretend that your friendly database designer had never heard of hashtags – please describe what it is that you want to store in your database

@JeffWalden - Thank you sir! :slight_smile: I will definitely give that a look over.

Just to give an example. Let’s say someone put the hashtag - #LosAngeles for a posting of theirs, let’s say in this case a cityscape photo.

The table to store the hashtag:
hashtags

  • id (1)
  • name (LosAngeles)

The table of photos:
photos

  • id (2)
  • URL (…)
  • comment (View of Los Angeles)

The relational table of tags
hashtag_relations

  • id (3)
  • type (photo)
  • hashtag_id (1)
  • content_id (2)

Now I am not a database architect at all but just by what I know, that’s how I would lay that out. Not sure if that is a good system or not but just to give an example.

Does that help some?

i’m not sure why you even need to extract the hashtag from the comment…

… but assuming there’s a reason, then yes, what you have for the 3-table design is fine

except for id(3), that’s totally useless – the PK for that table should be all three of the other columns

further, i don’t agree that you need an id as a surrogate key for the hashtag itself – just use the hashtag

the id for the photo is fine, because using the url is problematic because of its length

but the hashtag presumably is a single string of alphanumerics without any embedded special characters (e.g. spaces, commas, etc.) so the hashtag itself functions perfectly well as its own primary key

Very good to know. I am a designer for sure. This stuff I am trying to teach myself but all of this is very insightful. Do you recommend any courses or books that delve into those concepts? Like the use of the columns to make up the Primary key, or just using the hashtag as its PK for the hashtag table.

Grab the book in r937’s signature. it is a great starting point.