SitePoint Sponsor

User Tag List

Results 1 to 13 of 13
  1. #1
    SitePoint Zealot
    Join Date
    Dec 2008
    Posts
    112
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)

    Insert If Not Exists Equivalent and InnoDB autoincrement

    I want to insert a value into a table if it doesn't exist, and get the id (which is autoincremented) of the inserted row or the id of the existing row.

    At the moment I am doing this like so:
    Code MySQL:
    Then I can just get the last insert id to get the id of the existing or new row ('col' has a UNIQUE constraint on it).

    But the table in question is using the InnoDB storage engine, which means that it allocates the autoincrement value to the row before the INSERT operation, so if the INSERT fails, the autoincrement value has still increased. e.g. with an empty table
    Code MySQL:
    would return the id of 1.
    If I try and insert 'value1' again 500 times, each time I would get back the id 1.
    Now if I try and insert a new value
    Code MySQL:
    I will get back the id of 502, since InnoDB increased the auto increment value for those 500 insert operations, even though they didn't insert anything.

    So I am worried about my autoincrement value getting really high really quickly.

    There are a few solutions I can think of:

    • Switch to MyISAM and use triggers instead of Foreign Key Constraints (I don't need row level locking in this instance)
    • Check whether a record exists before trying to insert it (and only insert records that don't exist)
    • Periodically run a script that 'cleans up' the auto increment values (I can't do this as I may want to use the id values in URLs)
    • Don't worry about it and keep everything as it is


    Can anyone advise me on this issue?

    Thanks

    Dave

  2. #2
    SitePoint Wizard
    Join Date
    Mar 2002
    Location
    Bristol, UK
    Posts
    2,240
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I can't help but think maybe you're overcomplicating things a little bit here. Here's my take on how to go about doing what you want to do...

    Quote Originally Posted by djeyewater View Post
    I want to insert a value into a table if it doesn't exist...
    INSERT IGNORE INTO table_name (col) VALUES ('value');

    ...and get the id (which is autoincremented) of the inserted row or the id of the existing row.
    I'm assuming your scripting language of choice is PHP:

    Code php:
    $result = mysql_query("INSERT IGNORE INTO table_name (col) VALUES ('value')");
    if(mysql_affected_rows($result) === 1) {
        $id = mysql_insert_id();
    } else {
        $get_id = mysql_query("SELECT id FROM table_name WHERE col = 'value'");
        $row = mysql_fetch_array($get_id);
        $id = $row[0]['id'];
    }

    So I am worried about my autoincrement value getting really high really quickly.
    How high is unacceptably high in your book? Because if the datatype of the id column is INT, this will allow for well over two billion rows. And if your table gets that large, I think you'll have other things to worry about besides an integer overflow

    Just some food for thought really. Perhaps there is a more elegant solution though.

  3. #3
    SitePoint Zealot
    Join Date
    Dec 2008
    Posts
    112
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Thanks for the advice on not worrying about the id values getting too high.

    However, I don't see how your advice on how to go about inserting the record is better or less complicated than what I'm doing at the moment?

    You suggested:
    Code php:
    $result = mysql_query("INSERT IGNORE INTO table_name (col) VALUES ('value')");
    if(mysql_affected_rows($result) === 1) {
        $id = mysql_insert_id();
    } else {
        $get_id = mysql_query("SELECT id FROM table_name WHERE col = 'value'");
        $row = mysql_fetch_array($get_id);
        $id = $row[0]['id'];
    }
    How is that better than
    Code php:
    mysql_query("INSERT INTO table SET col = 'value' ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id)");
    $id = mysql_insert_id();

    Thanks

    Dave

  4. #4
    SitePoint Wizard
    Join Date
    Mar 2002
    Location
    Bristol, UK
    Posts
    2,240
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Because I *think* in that case, mysql_insert_id wouldn't return the right result because in some cases technically you haven't inserted any rows, you'll have updated one. I may be wrong here though.

  5. #5
    SitePoint Zealot
    Join Date
    Dec 2008
    Posts
    112
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Thanks for the reply, that statement always returns the correct id because when it does an update it sets LAST_INSERT_ID() to the id of the row it is updating (i.e. the existing row).

    Dave

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,275
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    whether you're wrong, SJH, can be verified quite quickly by testing

    i ain't gonna bother, but the OP should

    in any case, the two step approach you recommended (post #3) is the way i would do it

    actually, i might instead reconsider the design here, and ask myself if i need an auto_increment in the first place
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    SitePoint Zealot
    Join Date
    Dec 2008
    Posts
    112
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    whether you're wrong, SJH, can be verified quite quickly by testing

    i ain't gonna bother, but the OP should
    I have been using this method in my sites for quite a while, it definitely works. I got it from the MySQL Manual.

    Quote Originally Posted by r937 View Post
    in any case, the two step approach you recommended (post #3) is the way i would do it
    Any reason why you use the two step approach? It involves extra code, and up to two database calls as opposed to less code and always one database call. I haven't done any benchmarking, but assume the two step method would be slower?

    Quote Originally Posted by r937 View Post
    actually, i might instead reconsider the design here, and ask myself if i need an auto_increment in the first place
    Do you mean have a manual auto increment on the id column, or instead use the column I'm inserting values to as the primary key, and loose the id column altogether?

    Thanks

    Dave

  8. #8
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,275
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by djeyewater View Post
    I got it from the MySQL Manual.
    oh, then it ~must~ be okay

    just make sure you use it exactly as they suggested:
    If a table contains an AUTO_INCREMENT column and INSERT ... ON DUPLICATE KEY UPDATE inserts a row, the LAST_INSERT_ID() function returns the AUTO_INCREMENT value. If the statement updates a row instead, LAST_INSERT_ID() is not meaningful prior to MySQL 5.1.12. However, you can work around this by using LAST_INSERT_ID(expr). Suppose that id is the AUTO_INCREMENT column. To make LAST_INSERT_ID() meaningful for updates, insert rows as follows:

    INSERT INTO table (a,b,c) VALUES (1,2,3)
    ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id), c=3;

    Quote Originally Posted by djeyewater View Post
    Do you mean have a manual auto increment on the id column, or instead use the column I'm inserting values to as the primary key, and loose the id column altogether?
    yes, i might consider not having an auto_increment key at all
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  9. #9
    SitePoint Zealot
    Join Date
    Dec 2008
    Posts
    112
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    yes, i might consider not having an auto_increment key at all
    Thanks for the advice, do you know if this would effect memory usage much or query speed? (I'm mostly concerned with memory usage as I only have limited memory available on my hosting account)

    I guess the size of other tables with columns referencing the table in question would increase, since they'd need to use the actual value that they were referencing as the key instead of just an integer. And I guess the index size would increase for tables that have a multi column index that includes a column that references the table in question. But I don't know if this would effect memory usage or performance?

    Thanks

    Dave

  10. #10
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,275
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by djeyewater View Post
    But I don't know if this would effect memory usage or performance?
    it might

    i can't comment further because all you've mentioned is a table called "table" and a column called "col"

    if there are lotsa tables referencig this one, then yeah, i would probably use a surrogate key
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  11. #11
    SitePoint Zealot
    Join Date
    Dec 2008
    Posts
    112
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    it might

    i can't comment further because all you've mentioned is a table called "table" and a column called "col"

    if there are lotsa tables referencig this one, then yeah, i would probably use a surrogate key
    Heh, yeah, I realise I didn't give you much to go on.

    To give you a better idea of the database schema, I have one main table, which contains columns that reference values in various other tables. The majority of these 'other tables' just contain 2 columns, id (int, primary key), and value (varchar, unique).

    Then I also have more tables with just 2 columns, id (int, primary key), and value (varchar, unique). Then I have lookup tables that link these tables to the main table. So, for example, my main table is called 'imageData', then I have a lookup table called 'img_keywords' like
    Code MySQL:
    CREATE TABLE `img_keywords` (
     `img_id` int(10) unsigned NOT NULL,
     `keywords_id` int(10) unsigned NOT NULL,
     PRIMARY KEY (`img_id`,`keywords_id`),
     KEY `fk_img_keywords_imageData` (`img_id`),
     KEY `fk_img_keywords_keywords` (`keywords_id`),
     CONSTRAINT `fk_img_keywords_imageData` FOREIGN KEY (`img_id`) REFERENCES `imageData` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
     CONSTRAINT `fk_img_keywords_keywords` FOREIGN KEY (`keywords_id`) REFERENCES `keywords` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
    and the keywords table like:
    Code SQL:
    CREATE TABLE `keywords` (
     `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
     `Subject` VARCHAR(255) COLLATE utf8_unicode_ci NOT NULL,
     PRIMARY KEY (`id`),
     UNIQUE KEY `UNIQUE` (`Subject`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

    So I don't have lots of tables referencing the sort of table I was asking about, but I do have lots of the sort of table that I was asking about.

    I've attached an image of the schema, which probably shows the setup better than I can explain it.

    Dave
    Attached Images Attached Images

  12. #12
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,275
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    thanks for the details, the schema looks really good

    i would definitely use ids here, because of the many-to-many relationships

    i might not use an id for keywords, but that's a special case

    the other lookup tables would be necessary for relational integrity (i.e. to prevent the use of a lens type value, for example, that wasn't a legitimate lens type), so some of these might not require a numeric id

    but at this point i wouldn't go back and change anything

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  13. #13
    SitePoint Zealot
    Join Date
    Dec 2008
    Posts
    112
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Okay, thanks for taking the time to look at the schema, and the advice.

    Cheers

    Dave


Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •