I’ve developed a bookmarking application, and it’s become apparent that I need to deal with duplicate bookmarks in a sensible way.
The long-term aim is to have a way of “scoring” bookmarks, so that when someone adds a bookmark that already exists, the score for the original is incremented.
The reasons for doing this are, I no doubt, obvious.
I have a few ideas in my mind how I might build a scoring system, but I’d much prefer to run this by you guys first, before committing to something that’s either not practical or scales poorly over time.
first thing you have to do is remove everything from that table that relates to which user(s) bookmarked a particular bookmark, and put that in a separate table
That’s one of the options I’m thinking of going with — placing the URL itself into a separate table, along with the score and the ID of the originating bookmark.
I was also thinking of chaining the bookmark URLs together, so that all other bookmarks would then fall into a parent-child relationship, along the lines of:
id, parent_id, child_id, url, score
That way, when an originating parent bookmark is viewed, all of the child bookmarks can be retrieved.
Is that a viable method, or are there better ways of doing this?
That’s what I’m trying to figure out. But then there’s the issue of determining exact matching URLs, which becomes a parameter cleansing issue.
Rather than me guessing — which is exactly what I’m doing right now — I was hoping someone would actually explain the most appropriate way of doing this.
I’m using INSERT ON DUPLICATE UPDATE elsewhere in the application, so I’m at least familiar with that. In fact, I’m doing almost exactly as you’re recommending for the analytics side of things (user visits to bookmarks).
Right, I can see how this would work now.
I’m assuming I’m right to keep the parent-child method, so I can keep track of all those who added the same bookmark afterwards, yes?
No, you don’t need a parent-child method at all, the link isn’t a child of the same link, it is a duplicate of the link. You would use parent-child if you were drilling down:
Electronics -> Televisions - Sanyo -> Model123
If you want to know who added the bookmarks then you allow all inserts into the table with duplicate URLs and those who entered them. You would use a GROUP BY clause to group the URLs together and get a count of how many you had.
You would be using three tables. The one you have above (bookmarks) with the user_id column removed entirely, a user table defining information about your users and a users_bookmarks table with two columns userid, url_id and a new row for each person who enters a bookmark. You may have 50 people with different userid(s) with the same url_id in the table.
This is actually the one option I thought there might be a way of avoiding.
As I said, I’d thought through a few ideas, and this was the only one I could think of that would work fully. But I thought it was too complex and there was, perhaps, a better way.
OK, if that’s the way, so be it.
And thanks for your time, here. Always appreciated.