Checking if username is already taken

How would doing a SELECT then an INSERT using transactions open it up to “race conditions”?

From the PHP manual (http://www.php.net/manual/en/pdo.transactions.php)

You’re not limited to making updates in a transaction; you can also issue complex queries to extract data, and possibly use that information to build up more updates and queries; while the transaction is active, you are guaranteed that no one else can make changes while you are in the middle of your work. For further reading on transactions, refer to the documentation provided by your database server.

So, you will have different users with the same email?

Can you explain, in case you also need (you will need!) the email to be unique?

There may be another idea :slight_smile:


-- DROP PROCEDURE AddUsername;

DELIMITER |
CREATE PROCEDURE AddUsername(
	IN the_username VARCHAR(150), the_email VARCHAR(250),
	OUT didIt BIGINT(20)
)
BEGIN
DECLARE greenLight INT(1);
SELECT COUNT(`iduser`) INTO greenLight FROM `users` WHERE `username` = the_username OR `email` = the_email;
SET didIt = 0;
IF greenLight = 0 THEN
INSERT INTO `users` ( `username`, `email` ) VALUES ( the_username, the_email );
SET didIt = LAST_INSERT_ID();
END IF;
END |
DELIMITER ;

CALL AddUsername( "george", "m@m.com", @wedidit );
SELECT @wedidit;

Tested on

CREATE TABLE IF NOT EXISTS `users` (
  `iduser` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `username` varchar(150) NOT NULL,
  `email` varchar(250) NOT NULL,
  `name` varchar(150) NOT NULL,
  PRIMARY KEY (`iduser`),
  UNIQUE KEY `uniq_username` (`username`)
--  , UNIQUE KEY `uniq_email` (`email`) -- may not be unique
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;

So now set that up so that several copies of that code run simultaneously so that at least two of them run the SELECT before the first one runs the INSERT.

Let’s not try to be so paranoid and watertight where not really required. What are the chances of two different people entering the same username and pressing the submit button within the same fraction of a second? The likelihood is so tiny than worrying about it makes little sense and in most cases this will never happen. And even if it happens (however I think it’s more likely to get hit by a meteor falling from the sky) then no big deal - the user will see an error message and will have to try again.

Doing the INSERT straight away is the best method to let the db handle the uniqueness perfectly 100% of the time but in this particular case it’s not really a requirement. And there’s also a problem mentioned by someone else here - if you add another unique column to the table for some reason then you can’t really tell which constraint has been violated - with SELECT there’s no problem with that. In this particular case I don’t see a problem with either of these methods - in practice they both will work the same (and the performance difference will be negligeable).

I’d say keeping the username as PK is a bad idea and changing usernames is the least problem (which is handled automatically by ON UPDATE CASCADE). In most cases the user’s PK is used across many other tables as a FK so you need to keep all those columns exactly the same type. Imagine one day you want to change the maximum length of the username? Or even character set collation? MySQL will not allow you to simply ALTER TABLE because there will be a mismatch between types with all the corresponding foreign keys. You also can’t ALTER TABLE the foreign keys for the same reason. In such a case you need to DROP all these foreign keys, change all the column types and add the foreign keys back - maintenance nightmare!

Why me? :slight_smile: You’re the one that thinks it’s not a good solution.
We’re talking about “george” and “george” registering in the exact same time, in the time counted by a SELECT on the PrimaryKey, that may take 0.001 seconds.

Exactly!

If it is going to hang the entire database if they do then even the slightest chance should be avoided where possible - which it is simply by getting rid of the unnecessary select and testing the code returned from the insert. This also has the added benefit of making the database access far more efficient and so speeding up the processing.

Getting rid of the select is worth it just from the viewpoint of the speed increase.

Anyone who has done even an introductory course on using databases knows NEVER to use a SELECT before an INSERT because it is never needed, slows the processing down unnecessarily, and can potentially cause problems. Even disregarding the potential problems why slow your code down unnecessarily.

There’s a few things the OP hasn’t mentioned:

@Godz06;

  • What database server software is being used?
  • What storage engine is being used for the user table (not all of them support transactions)?
  • Is there a separate table for signups (confirmed ones being moved to the users table and unconfirmed ones deleted after so many days (or hours))?

Generally, your points are valid but in this case you take wrong assumptions:

  1. Hanging the db - the db is not going to hang if the very unlikely scenario happens, the insert will just fail with an error and that’s it. There’s nothing going on here that would cause hanging of the db.

  2. Speed - even if you had 10000 new users in a day a simple select by a unique key would be minuscule performance-wise compared to all the other things the server would be doing to run the entire site that you wouldn’t even notice the impact. This is worrying about micro-optimizations in places that are farthest from being a bottleneck.

I’m not arguing that what you suggest is wrong. I’m just saying that using SELECT is not wrong either - in this particular case. There may be reasons to prefer the SELECT method because the requirements are not always so simple that you need to only check if a username exists. For example, you may want to check whether a username exists and whether it’s active, or you need to grab some other properties of the existing user - then the unique index on the username is not enough to do it.

While theory tells us to do something a specific way, it is not always that pan out as the best option available.

In regards to the actual select not being cost effective, that is only true when you run a smaller website with only one database server. When the website becomes popular enough to require a database cluster, it is cheaper hitting the read slaves than the masters.

With that in mind, in this specific cause I would argue for that a select before the insert is the best solution (of course with the uniques in place as a final backup solution).