SitePoint Sponsor

User Tag List

Results 1 to 8 of 8
  1. #1
    SitePoint Zealot
    Join Date
    Feb 2008
    Posts
    109
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    How to design a position tracking DB?

    Hi all,

    Whats the best practice for designing a system to monitor and track changes in an ordered list?

    That probably sounds a little obscure, but I'm thinking of the scenario where you monitor over time changes in league rank positions for example - i.e. storing top 100 products over time. For now its best to assume this is a seperate DB entity, storing only this 'league'.

    It obviously makes no sense to store the entire league over and over and then compare across different dates, thus I assume you just want to store changes as they occur? New entrants, change in position/rank, changes in composition maybe (new title?).

    I've attempted searching but not found anything, any suggestions on how to describe this problem?

    I'm a little out of my depth with this. Any advice would be much appreciated.

    Cheers,

    DM

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    don't store the entire league, just the top 100 products, along with their rank, and a date of course
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Zealot
    Join Date
    Feb 2008
    Posts
    109
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks for getting back to me.

    So something like;

    - Items Table - lists unique items
    - Rank Table - stores position and date against Item foreign key

    By implication this would mean that for every date the ranks were recorded a reference to however many items that exist would be added to the Rank Table? I assumed this would be inefficient and therefore thought there must be a better model.

    The example I mentioned above was just an example, in reality I'd be tracking the position of thousands of items, some of which might never change rank. I kind of envisaged a system where only changes were recorded, but maybe this isnt possible?

    Cheers,

    DM

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    of course it is possible to record only the changes, but it's more complicated

    so you're tracking the position of all items? or just the top 1000?
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Zealot
    Join Date
    Feb 2008
    Posts
    109
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    so you're tracking the position of all items? or just the top 1000?
    I dont actually have a database of items specifically, I'm planning on mashing up a few webservices based around alexa rankings - http://programmableweb.com/api/alexa-top-sites.

    I've yet to really decide how many results to track, as many as is feasible. really. Some of the data returned by the webservice will change from week to week, others will be more constant - I'm not sure whether it makes sense to setup a system to log just the changes (in both content and rank position).

    DM

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    if you're obtaining data from external sources, the best approach is to store as much detail as you can, even stuff that is available but you're not sure if you can use it or not yet

    after that, running queries to track stuff can take any number of directions

    but if you pre-calculate stats from incoming sources, and store only that, then you can never get the missing stuff out of it

    another consideration is complexity

    take for example your items table -- what if an incoming source contains a new item you don't have yet? you have to have logic to add it, right? that can get in the way of efficient data capture or storage

    the concepts here are sometimes called ETL -- extract, transform, load

    in your case, extract means to get the data you want from the web, transform is where you decide which items to track, and load (which normally refers to loading clean data into a well designed database) would be your queries to display the results of your tracking
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    SitePoint Zealot
    Join Date
    Feb 2008
    Posts
    109
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    the concepts here are sometimes called ETL -- extract, transform, load
    Just did a quick search on this, pulled up some really interesting articles thanks.

    Quote Originally Posted by r937 View Post
    in your case, extract means to get the data you want from the web, transform is where you decide which items to track, and load (which normally refers to loading clean data into a well designed database) would be your queries to display the results of your tracking
    Does this suggest that a full 'data load' should occur everytime I query the web service for fresh data? Then use the date of the insert to compare any changes that have occured when tracking over time? Would this approach cause performance issues? For example in this case a URL would represent a unique item, with different meta data associated, selecting all items with the same URL, then comparing the meta data over each strikes me as intensive.

    Really appreciate your thoughts on this btw.

    DM

  8. #8
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    query the web for fresh data whenever you like, and dump (append) the raw data into your "incoming" tables, using date extracted as one column (same value for all rows appended)

    calculate changes and track position separately, asynchronously, unrelated to data capture, on perhaps a different schedule

    making sense of matching URLs is the middle step, the "transform" -- it should not be part of the data capture process
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •