SitePoint Sponsor

User Tag List

Results 1 to 6 of 6
  1. #1
    SitePoint Wizard
    Join Date
    Mar 2002
    Location
    Bristol, UK
    Posts
    2,240
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    URL as primary key?

    What are the performance implications of the primary key field of a MySQL table being a web page URL? Obviously each one will be unique, but may be fairly lengthy. Should I avoid this or will it not be an issue?

    I would estimate that at any one time, the table itself would contain roughly 500,000 records.
    Sam Hastings

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,220
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    what tables will be related to this table? describe the relationships and their cardinality (one-to-one? one-to-many? many-to-many?)
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Wizard
    Join Date
    Mar 2002
    Location
    Bristol, UK
    Posts
    2,240
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The table in question is called item.

    There will be a category table, and each item will be assigned one category (one category - many items).

    There is also a favourites table, and each favourite will be assigned one row from this table (one item - many favourites).
    Sam Hastings

  4. #4
    SitePoint Wizard
    Join Date
    Mar 2002
    Location
    Bristol, UK
    Posts
    2,240
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    On second thoughts, maybe I should paint the bigger picture here.

    Table item keeps a record of items harvested from various RSS feeds. There is a PHP script which checks these feeds every 5 minutes, and if the last-modified HTTP header is later than the one I have on my records, it grabs all the RSS items and adds them into the item table. One RSS item = one database row.

    The problem is duplicate items. The script grabs each item on the 5 minute mark if the feed has been updated, but it will grab all items from the feed and not just the new ones.

    Yesterday I discovered this handy little piece of SQL called "ON DUPLICATE KEY UPDATE". This would eliminate duplicate records, and would also allow each item to be the most recent version of itself.

    However the only thing that would be duplicated was the URL, meaning that would have to be the sole primary key for the table, no? I haven't had any luck getting it to work with a numeric id and a URL as the primary key.

    Edit: I've looked into using the REPLACE function but I understand that this function deletes the old row and inserts a new one, which would destroy any dependencies from other tables.
    Sam Hastings

  5. #5
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,220
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    the favourites table will have to refer to the items, so i would use an auto_increment primary key on the items, so that the favourites don't have to repeat the entire url

    then in the items table, simply declare a unique index on the url itself -- this will be used when you do the INSERT with ON DUPLICATE KEY
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  6. #6
    SitePoint Wizard
    Join Date
    Mar 2002
    Location
    Bristol, UK
    Posts
    2,240
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Ah, I didn't realise I could set up different indexes like that. Many thanks!
    Sam Hastings


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •