How to check if a username is already taken

rpkamp · September 25, 2018, 7:02am

This thread was split from another thread. This post is reply to a post that is still in the other thread

What would you use instead of count? In my mind count is perfect as you would get an absolute 0 or 1, which is very explicit. Also, if the username column is indexed the count would be blazingly fast as it can just use the index instead of the actual table.

spaceshiptrooper · September 25, 2018, 11:28am

Well, I would just use the column name. Doesn’t matter if it’s id or username. It kind of makes no sense to use COUNT if the result returns a 1 or a 0 anyways. Setting a constraint already limits amount of same usernames to 1. If the result returns 1, there really is no need to use COUNT. I typically only use COUNT if the table can contain 2 or more of the same value. Example counting the amount of posts someone has in a blog. Since there is no constraint to this specific problem, we can use COUNT appropriately.

@Rybat
What I mean is to reduce redundancy.

benanamen · September 25, 2018, 4:07pm

That could actually be used as a trick question for a job interview. Since the whole approach of checking for an existing username or email is wrong, you wouldn’t use anything instead of count, let alone use count. As previously explained, you build in a Race Condition by doing so.

rpkamp · September 25, 2018, 8:31pm

I still do that though, so you’re able to show a nice error message to the user in the cases where you do catch it. Sure, you could also catch a database exception and show the error based on that, but that’s a layer too deep for my taste - it should be handled at the form validation stage so that if there any other errors those could be shown as well.

rpkamp · September 25, 2018, 8:35pm

So you would do something like this?

$query = $pdo->query('SELECT id FROM users WHERE username = ?');
$query->execute([$username]);

$row = $query->fetch();

if (false !== $row) {
    // user exists
}

benanamen · September 25, 2018, 9:05pm

Nothing stops you from showing a nice error message to the user for ALL the errors. What you should be doing is putting all the errors into an array. Catching the DB duplicate error is not any more deep than handling any other form error and is the correct way to do it. Do you really want to build in an application bug (Race Condition)?

spaceshiptrooper · September 25, 2018, 9:09pm

Not exactly. I’d use rowCount in the if statement. rowCount should be able to tell if you have anything returned or not. If there are no results, it’ll automatically trigger the else statement. If you want a nice pretty error message, you can still do it this way and still write an error message. I’d still use a constraint though. I believe PDO has error codes that you can check against. I know mysqli definitely does.

benanamen · September 25, 2018, 10:40pm

Yes, there is a specific error code, it is 23000. Really though, any approach other than setting a unique constraint and attempting the insert is incorrect and will build in a hard coded Race Condition into the application. This is not my opinion, it’s just the way it is.

rpkamp · September 26, 2018, 6:10am

I actually do both. I’m using Symfony and there forms are validated using the Symfony Forms component, and there is a check in there to see if there already is another user with that username. Then when it passes the form validation it could indeed cause a race condition, but that is why there is also a unique index on the username in the database, so even if that race condition did occur, it would be stopped anyway.

Also, showing this error alongside other errors by trying to trigger an exception on insert doesn’t make sense, because why would you try to insert data that you know contains errors?

benanamen · September 26, 2018, 6:58am

Just because so and so does something does not make it correct. If you are going to code because so and so does it that way instead coding by knowledge and experience you are just going to have problems.

So you now agree there is a point that there could be a race condition but to you it’s OK because you have a constraint in place. OK, that’s great for stopping a duplicate entry. What about running two query’s when only one is needed? That doesn’t make sense does it? If you have 100 simultaneous requests for the same user name your approach will lie to 99 of the requests and say it’s ok to run yet another query that will fail. By the time your redundant query runs only 1 out of the 100 will actually get the insert. So at this point you have now run 200 query’s instead of 100 of which 99 have failed. Double the amount necessary and a 50% failure rate. Does that really sound right to you? I will redirect your last statement back at you with a slight change, why would you run querys that you know will fail?

Your last comment shows you don’t really have a grasp of the process. The duplicate error is not going to be run alongside any other errors if you code it properly. If there are other errors, the insert query is not even going to run, therefore the duplicate error will always be the only error if there is a duplicate. You also dont “try to trigger an exception”. The duplicate constraint violation is already going to cause a fatal exception error. This is about the only place you should be using try/catch by the way.

Pseudo code:

if  (server request method == post){
//error checks
if ($error){
// Handle error display
}
else{
try{
//attempt insert
}
catch{
//duplicate error
}
  }
}

rpkamp · September 26, 2018, 7:40am

It is a well established school of thought to completely separate display logic / form validation from application handling.

I’m using the command pattern to execute queries separate from the controller so everything is nicely decoupled and testable separately from the controller. Also, using components like Symfony Forms means I can delegate a lot of functionality to code that I know works and is heavily tested. I’ll take that over custom form handling any day.

For a more in-depth idea of what I’m using please refer to http://verraes.net/2015/02/form-command-model-validation/

Also, FYI I don’t appreciate veiled insults in your post.

benanamen · September 26, 2018, 7:57am

I have no idea where you’re going now. Your response has nothing to do with what we’re talking about. I also didn’t make any insults to you. I’m responding to what you posted.

rpkamp · September 26, 2018, 8:47am

The whole point is that form validation is a completely different concern than validating that the input the user gave can actually be executed (which is what the article I linked to is also about). Form validation is for the benefit of the user, while validing the input for the application itself is a technical concern.

Suppose I have a form and I’ve filled in a birthday in an incorrect format, an invalid phone number, and a username that already exist, would you rather:

a) fix the birthday and phone number and resubmit only to find our the username is already taken (with any luck by yourself - so you could have just went to the login page if you’d known that)

or

b) see several errors, among which that the username is already taken, fix all of them, submit the entire form and be done?

I personally hate (a) because I feel that once I’ve fixed all errors the form should not come back to me saying something else is wrong too, because then I can no longer trust that when I fix that it won’t come back whining about something else, which feels very frustrating.

So yes, I’d rather do an extra database query if that means I can deliver a better (in my opinion) user experience.

benanamen · September 26, 2018, 4:48pm

Lets see if I have this straight, correct me if I am wrong. In your opinion…

It is acceptable to intentionally hard code an application bug (Race Condition)
Execute twice as many query’s as are necessary to do the job
It is acceptable to build in a 50% to 99% query failure rate in the case of simultaneous requests.
It is acceptable for your code to “lie” to 50% to 99% of the simultaneous requests and say they have permission to insert when they really don’t unless they are the race winner.

And all this to “deliver a better user experience”.

You are not seeing that in the case of simultaneous requests for a username that is not in the DB, it does not make for a “better user experience” at all and has the exact same result as what I am saying without all the overhead of 1-4 I listed above.

You said you do both methods, name check AND a duplicate constraint. You now get 100 requests to insert “bob” which is currently not in the DB. The users make a form validation error and will only get the form errors (because there is no “bob” in the DB yet, thus no duplicate error from your redundant query.)

They all fix the form errors and now comes 100 simultaneous requests to insert “bob”. Your code basically lies to 99 of those requests because only the first one to execute your second query will get the insert (Race Winner). Now 99 requests get a fatal error from the constraint which means there is STILL two different error displays. The form errors and then the DB error at separate times. EXACTLY the same user experience as one would have if it was coded properly except without the overhead of 50% to 99% more query’s and 50% to 99% more query failures.

The only time your “better user experience” could hold water is if user ‘bob’ was ALREADY in the DB.

rpkamp · September 26, 2018, 5:47pm

Yes, that is exactly what I’m going for.

I’m also not under the impression that 100 users will try to register the same username at exactly the same time. Two already seems unlikely, and the chance of 100 is astronomically small. In my book, when that happens that would not be a race condition but an attempt at DDOS.

Lemon_Juice · October 1, 2018, 3:57pm

@benanamen, while your reasoning is perfectly correct and doing only an insert and catching a potential unique constraint exception is the most efficient and error-free method I think you are too strict in your advice. Depending on the application design it may be more convenient to use a separate SELECT to check if the username exists. I also use a separate class to validate all input and it’s all in a different class that does the insert - it’s a nice separation of concerns. Of course, if I want to I may get rid of the SELECT in my validation class and catch the potential exception in my insert class while still returning the same useful error message to the user - no problem, but that’s a bit of an extra effort. I don’t know how Symfony form validation that @rpkamp uses works - but I agree that a well designed system should allow for doing a validation with a database constraint violation if there is really such a requirement.

Another thing is, some databases (MySQL?) do not return any information which constraint has been violated so if you have two fields that must be unique you have no way of knowing which unless you do an extra SELECT. Also, checking for unique constraint violation is very database type specific - checking for specific error codes, etc.

Now just think about how likely it is that in the case of a SELECT followed immediately by an INSERT two users will try to sign up with the same non-existent username at the same moment within the same millisecond window? Your example of 100 simultaneous requests for the same name is as likely as a meteorite falling on your right foot while sunbathing on the first day of your vacation. Even if the site is fairly popular - submitting a new account form is one of the rarest page requests as compared to all other activities that are going on.

With this I just wanted to show what I’m getting at - if an occurrence of an error is so extremely unlikely then there’s no need to be bothered with it and spend any extra time trying to program around it. Just do what is easiest, fastest, most convenient, elegant, etc. and be done with it. Even if there’s one race condition within the period of 100 years then what? One error page in a century for someone who would have to submit the form again - I don’t really care. It’s like you would argue against using UUIDs because a collision is possible.

Of course, in security critical cases like banking systems, etc. we have to be strict and use methods that are 100% free from race conditions - but this use case is far from that.

And saying that the extra SELECT will impact performance is really like fighting for micro-optimizations like you would argue for using a function instead of a class because it performs better. When your site gets to a point that you’ll have a 100 new account submissions per second then sure. But 99.999% of web sites will never ever come close to such traffic. I doubt that even Facebook is anywhere close to that.

benanamen · October 1, 2018, 4:29pm

You have summed it up right there. Nothing more I can say about it. KISS

As far as everything else you said I will have to soak it in a minute if I am to make any sensible reply’s if any at all. As to how you are doing things, I cannot comment sensibly without seeing your code. If you have it on a repo, I would like to take a look at it. If not, you might want to put it up so that not only I, but other programmers can look it over. Never hurts to get a free code review.

Lemon_Juice · October 1, 2018, 7:05pm

Your summing up as KISS is valid only if you talk about the case in isolation. In a specific context this may be the simplest way or may not be. If your application design allows for database validation in a simple way then all is good. But there are cases when this requires quite some effort and it may turn out it’s not worth it.

This sounds sensible. Since I don’t have any of my code in a public repo I’ve prepared a simplest example of my usual application structure to validate and insert an item - in this case a user. So here you go, a controller and a model class that inserts the user, and a validator class that is used by the user model:

class UserController {
    private $user;

    public function __construct(User $user) {
        $this->user = $user;
    }
    
    /* Form POST submit request */
    public function addUserSubmit(Request $request) {
        $formData = $request->request->get('user');
        
        try {
            $this->user->createFromForm($formData);
            
        } catch (ValidationException $ex) {
            $templateData = array(
                'errors' => $ex->getErrors(),
            );
            
            // display the form again with validation errors...
            // [...]
        }
        
        // user added successfully...
        // [...]
    }
}


class User {
    protected $db;

    public function __construct(Connection $db, UserValidator $userValidator) {
        $this->db = $db;
        $this->userValidator = $userValidator;
    }
    
    public function createFromForm(array $data) {
        // validate all data
        $errors = $this->userValidator->validate($data);
        
        if ($errors) {
            // validation not passed
            $ex = new ValidationException;
            $ex->setErrors($errors);
            throw $ex;
        }
        
        // insert data into database table "users"
        $this->db->insert('users', $data);
    }
}


class UserValidator {
    private $db;

    public function __construct(Connection $db) {
        $this->db = $db;
    }
    
    public function validate(array $user):array {
        $errors = [];
        
        if (strlen($user['username']) == 0) {
            $errors['username'] = "Please enter a username.";
        }
        elseif (!preg_match('/^[\w ]+$/', $user['username'])) {
            $errors['username'] = 'Username can only consist of letters, numbers, space and underscore.';
        }
        else {
            // check for uniqueness
            $exists = $this->db->fetchArray("SELECT 7 FROM users
                WHERE username=?",
                [$user['username']]);
            
            if ($exists) {
                $errors['username'] = "This username is already taken.";
            }
        }
        
        if (strlen($user['name']) == 0) {
            $errors['name'] = "Please enter your name.";
        }
        
        return $errors;
    }
}

As you can see all validation checks are done in the validator class that is separate from the user class that inserts data. That’s why in this scenario doing the unique check in the validator with a SELECT is more convenient because the validation logic is all in one place.

And here is the version of the same but using the database INSERT for the uniqueness check (the controller is unchanged):

class User {
    protected $db;

    public function __construct(Connection $db, UserValidator $userValidator) {
        $this->db = $db;
        $this->userValidator = $userValidator;
    }
    
    public function createFromForm(array $data) {
        // validate data except for uniqueness
        $errors = $this->userValidator->validate($data);
        
        if ($errors) {
            // validation not passed
            $ex = new ValidationException;
            $ex->setErrors($errors);
            throw $ex;
        }
        
        // insert data into database table "users"
        try {
            $this->db->insert('users', $data);
        }
        catch (DbUniqueConstraintViolationException $ex) {
            // here I assume this is an attempt to insert an existing username
            $ex = new ValidationException;
            $ex->setErrors([
                'username' => "This username is already taken."
            ]);
            throw $ex;
        }
    }
}


class UserValidator {
    public function validate(array $user):array {
        $errors = [];
        
        if (strlen($user['username']) == 0) {
            $errors['username'] = "Please enter a username.";
        }
        elseif (!preg_match('/^[\w ]+$/', $user['username'])) {
            $errors['username'] = 'Username can only consist of letters, numbers, space and underscore.';
        }
        
        if (strlen($user['name']) == 0) {
            $errors['name'] = "Please enter your name.";
        }
        
        return $errors;
    }
}

Here the validation code is split and exists in both classes. It can be done but to me, it’s better to have it in a non-fragmented state. I may decide it’s not worth the effort to keep the code fragmented and just allow the validation with an additional SELECT.

And additionally, in my first validator class I can add as many other unique checks as I want (email, licence number, etc.) really easy, whereas doing it with database INSERT alone may be hard, if not impossible, for some databases - in the sense of getting information which constraint has been violated.

What other options do I have? I may mix data validation and database insertion in one class but then my class grows large and probably assume too many responsibilities.

If you have some insights then go on. I’m always open to improving my code if I see a real benefit. I don’t know how this case is solved in the Symfony style - whether doing a unique check with an INSERT is easy or not.

rpkamp · October 1, 2018, 8:23pm

The high level overview is the same as what you’ve described in your setup

benanamen · October 1, 2018, 8:47pm

For now, lets keep the focus on the code you refer to as “INSERT for the uniqueness check” since as I have made abundantly clear that is the method that should be implemented. The problem is, according to your code, that’s not what your doing.

Setting aside the validation implementation for now, we arrive at the insert. Any user registration is at minimum going to need a unique email address & constraint. Lets go with that we have both a username and email requirement.

As your comment clearly and correctly states, you are ASSUMING the error for a failed insert is a duplicate username. As programmers we should NEVER assume anything. Your posts contain many assumptions.

here I assume this is an attempt to insert an existing username

The fact is, you don’t know because you don’t check. It could be a duplicate username, or it could be a duplicate email, OR the query could have failed for any other number of reasons.

You have made clear from your comments that you do not know if Mysql will return information about which constraint has been violated. I can tell you for a fact that it does. As to other databases, we are talking about Mysql right now so lets stay on point.

As to which constraint in particular is violated if that is indeed the cause of the error (remember, your posted code doesn’t know) the user should not be informed of exactly which one it is. That is a big security risk. The error message should be invalid username or email.

As far as you the admin, do you really need to know a constraint was violated and which one it was? Sure you can log it if you want, no problem. Mysql is more than happy to give you specific information about it. I happen to not only log it but also send myself an email about it.

For a more in depth conversation on your code I would highly suggest you upload all of it to a repo so it can be viewed as a whole and start a new thread. We will easily get off topic and are already on the verge of hijacking the OP’s thread.