Good Users and Bad Passwords

It’s getting more common for sign-up forms to validate the format of passwords, and then give visual feedback on the password’s content or strength.

You’ve probably seen Google’s signup form, or something very like it, which is actually quite a good example (and I’ll talk about why in a moment):

A popup balloon which contains a meter indicating that the input password has good password strength.

You might also have seen examples like this being suggested and used on the web:

A bullet list of criteria indicating that the input password is at least 8 characters and contains lowercase letters, but does not contain uppercase letters, numbers or punctuation.

The idea of having that checklist is to reduce friction for users, by providing specific feedback on the format that’s required, rather than simply rejecting the password on vague or unspecified terms.

But it’s still a bad idea, however well it’s implemented, because it propagates a misleading view of what constitutes a good password.

To understand this issue, we need to begin by understanding how a password’s strength is determined.

What makes a strong password?

The strength of a password is typically described using the term password entropy, which is a measure of its randomness. It’s not so much a measure of that specific password, as of all the possible passwords which contain the same range of characters (i.e. all the possibilities a computer would have to try in order to crack it by brute-force).

Entropy is usually expressed in bits: if we refer to a password as having n bits of entropy, it means there are 2 to the power n equally likely possibilities to try. A single lower-case English letter has approximately 4.7 bits of entropy, because 2^4.7 is approximately 26. So if a password only contains lower-case letters, then each letter will add another 4.7 bits of entropy (i.e. a two-letter password will have 9.4 and so on).

If we replace one or more letters with other characters, then the range (and therefore the entropy) will increase. There are 94 non-diacritic letters, numbers and special characters in US ASCII, so each will have approximately 6.55 bits of entropy (because 2^6.55 is approximately 94).

Therefore an eight-character password which might contain any of these characters will have approximately 52.4 bits of entropy, whereas a password of the same length with only lower-case letters will have 37.6 bits of entropy.

However a sixteen-letter password with only lower-case letters will have 75.2 bits of entropy.
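The arithmetic above can be reproduced with a short sketch. It assumes, as the figures do, that every character is chosen uniformly at random from the pool; the function name entropy_bits and the two constants are only illustrative.

```python
import math


def entropy_bits(length: int, pool_size: int) -> float:
    """Bits of entropy for `length` characters drawn at random from `pool_size` symbols."""
    return length * math.log2(pool_size)


LOWERCASE = 26        # a-z
PRINTABLE_ASCII = 94  # letters, digits and punctuation, excluding the space

print(round(entropy_bits(8, PRINTABLE_ASCII), 1))  # 52.4
print(round(entropy_bits(8, LOWERCASE), 1))        # 37.6
print(round(entropy_bits(16, LOWERCASE), 1))       # 75.2
```

Doubling the length doubles the number of bits, whereas widening the pool from 26 to 94 characters adds less than two bits per character.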

To put that into some kind of context: a password with 52.4 bits of entropy might be cracked by a desktop PC in less than half an hour, while a password with 75.2 bits of entropy could take several hundred years. The longer the password, the more time it takes to crack, exponentially.
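How long a crack really takes depends on the attacker's hardware and on how the passwords were stored, but the ratio between the two cases does not. The sketch below uses a purely hypothetical guess rate (RATE is an assumption, not a benchmark) to show that the rate cancels out.

```python
def worst_case_seconds(bits: float, guesses_per_second: float) -> float:
    """Time to try every one of the 2**bits candidates at a fixed guess rate."""
    return 2 ** bits / guesses_per_second


# Hypothetical attacker speed; real rates vary enormously with hardware and
# with how the passwords were hashed.
RATE = 1e12

short_mixed = worst_case_seconds(52.4, RATE)
long_lower = worst_case_seconds(75.2, RATE)

# The rate cancels out of the ratio, leaving 2 ** (75.2 - 52.4).
print(f"{long_lower / short_mixed:,.0f} times longer")
```

The ratio comes out at roughly 7.3 million, so if the shorter password takes half an hour, the longer one takes in the region of four hundred years at the same speed.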

So in general terms, a long password with nothing but lower-case letters is better than a short password with a mixture of characters.

Putting theory into practice

This throws a different light on what constitutes a good password. It means that “this is my password” is a much stronger password than “pA5%w*rD”, and yet it’s so much easier to remember.

Although we must concede that there is a problem with this way of analysing passwords, which is its assumption that every character was randomly chosen. In practice that’s seldom the case, since passwords are usually chosen by people, and people don’t make random choices.
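With that caveat in mind, the naive comparison of the two example passwords can be sketched as follows. The function assumes random characters from whichever character classes appear in the password, which is exactly the assumption being questioned here, so treat the output as an upper bound rather than a real strength score. The names CLASSES and naive_entropy_bits are illustrative.

```python
import math
import string

# Character classes and their sizes; the first four together make up the
# 94 printable non-space ASCII characters.
CLASSES = [
    string.ascii_lowercase,  # 26
    string.ascii_uppercase,  # 26
    string.digits,           # 10
    string.punctuation,      # 32
    " ",                     # 1
]


def naive_entropy_bits(password: str) -> float:
    """Entropy estimate that assumes random characters from the classes in use.

    Characters outside the classes above are ignored by this sketch.
    """
    pool = sum(len(chars) for chars in CLASSES if any(c in chars for c in password))
    return len(password) * math.log2(pool) if pool else 0.0


print(round(naive_entropy_bits("pA5%w*rD"), 1))             # 52.4
print(round(naive_entropy_bits("this is my password"), 1))  # 90.3
```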

One example of non-random choices is psychological traits — the tendency of people to use obvious words, celebrity names, or common associations. Password-cracking software might take account of such things to optimize its work, and produce results more quickly than is mathematically probable. But it’s not really possible to quantify this when calculating password entropy, because it requires knowledge that can’t be easily abstracted.

Though the Gmail validation tool clearly does take account of some of these things, since it indicates that “this is my password” is weaker than “this is my whatever”, simply because it contains the word “password”.

But it also indicates that “pA5%w*rD” is stronger, even though it’s actually very much weaker (as we’ve seen).

And this is the problem with all the password-validation tools I’ve seen — character substitution is given far more emphasis than it deserves, while creating a longer password is given little or no emphasis at all. They’re teaching users to create passwords which are hard for humans to remember but easy for computers to guess.

Putting practice into best-practice

I said at the start that the Gmail tool was quite a good example, and that’s because it assesses the overall password rather than just its individual characters, so it will at least indicate that a longer password is stronger than a short one. I also said that the second example was a bad idea, and that’s because it only highlights character replacement, which is nowhere near as important, and potentially counter-productive.

If I were to sum this up into a general suggested best-practice, it would be this:

  1. don’t validate the format of a password, only validate its length
  2. or if you are going to validate the format, don’t make it required

I can remember once or twice being forced by a site to choose a different password, simply because it didn’t have a mixed-character format. As a user, I found that incredibly frustrating; but it’s also doubly ironic, since it could end up making people use a shorter, and therefore weaker, password, or use the same password for many different sites.

I would advocate two separate fields — one for the password and one to confirm — along with some notes underneath that explain how to write a strong yet memorable password. Both fields are required and must have a minimum length (and obviously must be the same), but the password’s character format isn’t validated or required.
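As a rough sketch, the server-side check for such a form might look like this. The minimum length of 12 is an assumption for illustration (no number is given above), and the function name and error messages are hypothetical.

```python
MIN_LENGTH = 12  # hypothetical minimum; no specific number is given in the text


def validate_signup_password(password: str, confirmation: str,
                             min_length: int = MIN_LENGTH) -> list[str]:
    """Return a list of error messages; an empty list means the input is accepted."""
    errors = []
    if not password:
        errors.append("Password is required.")
    elif len(password) < min_length:
        errors.append(f"Password must be at least {min_length} characters long.")
    if not confirmation:
        errors.append("Please confirm your password.")
    elif password != confirmation:
        errors.append("The two passwords do not match.")
    # Deliberately no checks for upper-case letters, digits or punctuation.
    return errors
```

For example, validate_signup_password("this is my password", "this is my password") returns an empty list, while a short or mismatched entry returns the relevant messages to show next to the fields.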

Because ultimately, it’s up to users what kind of password they want to use. As service providers, it’s our responsibility to maintain the security of our users’ accounts. If we’re using techniques like salting and key stretching to store passwords more securely, then it shouldn’t really matter what they choose.
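To make “salting and key stretching” concrete, here is a minimal sketch using PBKDF2 from Python’s standard library. PBKDF2 is only one option (bcrypt, scrypt and Argon2 are common alternatives), and the iteration count and 16-byte salt are assumptions for illustration rather than settings taken from this article.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed work factor; tune it to your own hardware and guidance


def hash_password(password: str, iterations: int = ITERATIONS) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) using a fresh random salt and PBKDF2-HMAC-SHA256."""
    salt = secrets.token_bytes(16)  # salting: a unique random value per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(key, expected_key)
```

The salt means two users with the same password get different stored values, and the iteration count is the key stretching: it makes every guess deliberately slow for an attacker who has stolen the database.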

Personally, I like to use old phone numbers and places I’ve lived. For example, if I lived in New York and I can still remember my phone number there, I might use “New York 219 555 4209” as my password. That’s immensely strong, but also easy for me to remember.

A more general approach is to take several words that are not commonly associated, and then visualise an association between them (like that xkcd cartoon). Simply forming a visual association makes the password more memorable, especially if it has a personal meaning for you; and if it’s all in lower-case letters then you don’t need to remember any convoluted replacements.
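A variation on this idea is to let the computer pick the words, since the entropy arithmetic only strictly holds when the choice is random. The sketch below is an assumption-laden illustration: the word list is a tiny hypothetical sample (a real one would contain thousands of words), and the function names are made up for the example.

```python
import math
import secrets

# Tiny illustrative word list; a real one would contain thousands of words.
WORDS = ("correct", "horse", "battery", "staple",
         "teapot", "violin", "glacier", "lantern")


def random_passphrase(word_count: int = 4, words: tuple[str, ...] = WORDS) -> str:
    """Join `word_count` words picked independently with a cryptographic RNG."""
    return " ".join(secrets.choice(words) for _ in range(word_count))


def passphrase_bits(word_count: int, list_size: int) -> float:
    """Entropy when each word is drawn uniformly from a list of `list_size` words."""
    return word_count * math.log2(list_size)


print(random_passphrase())
print(passphrase_bits(4, 2048))  # 44.0 bits for four words from a 2048-word list
```

Here the unit of randomness is the word rather than the character, so the strength comes from the size of the word list and the number of words, not from any character substitutions.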

It’s easy to imagine some kind of TV drama situation, where Sherlock Holmes is sitting at our computer, trying to guess our password using psychological insights. Do we like Star Trek, drink Yorkshire Tea, or listen to Iron Maiden? In that case “ir0n//ma1d3n” would be harder to guess than “iron maiden” (though not by much, given Holmes’ inevitable knowledge of common substitutions!).

But that’s not going to happen anyway, and users may need to be taught to understand this. Passwords are almost never cracked by people with personal knowledge, they’re cracked by computers with brute-force — and for them, size is everything!

  • https://twitter.com/photodow James Dow

    Some sites like Bank of America will not allow a certain amount of characters in your password. It’s a little disheartening to see that type of validation.

    • James Edwards

      You mean they place an upper limit on the length of your password?

      • https://twitter.com/photodow James Dow

        Exactly. I’ve had to shorten my password for a couple websites like that.

        • James Edwards

          Wow, that’s gotta be a new low in bad security and bad usability all at the same time! Though I suppose it’s understandable in one sense — you want the data to be a specific length so you know what SQL column to use. Fair enough. But then you use a column that allows long passwords — a 255 character limit would not be unreasonable, but an 8 character limit certainly is.

          • Anthony

            The length of the password doesn’t matter if you’re hashing it – the hash will always be the same length. With banks it’s often because they need to ask you to confirm the first, fifth and seventh characters (for example), which can’t be done with a single hash. More likely they store each character encrypted, I guess?

    • OphelieLechat

      My ex-bank does that too. They also do everything possible to disable password managers, suggesting that customers instead find something memorable and use that. It’s scary.

  • Steve Robillard

    Doesn’t using old addresses and phone numbers fall into the category of easy for a computer to guess, especially since your username is so often your email address and hence gives an attacker a way to correlate the data (assuming your phone number was listed)?

  • James Edwards

    I’m not sure what you mean — the fact that I’m suggesting using phone numbers, means that if more people do it, then more hackers will account for that possibility in brute force attacks?

    But how will that help them? Dictionary attacking every city and phone number in the world would yield more solutions than there are people. Or is that not what you mean?

    • https://mtdb.se/ Christian Carlsson

      Exactly, if hackers weren’t aware of this trick, they probably wouldn’t include it in their brute force “design”.

      • James Edwards

        Okay. So can you quantify the difference that would make? For example — if you’re running through hashes looking for matches, how much help is it to you to know that one of the things a hash might be is a phone number, given that it could be any phone number in the world, and with any kind of formatting?

        • https://mtdb.se/ Christian Carlsson

          I can’t answer that, you have to watch the talk from Anthony to get the idea.

          • James Edwards

            I’ll do that when I get time.

            I mean I can see the general point, and maybe the only real answer to that is for each of us to come up with our own “trick”, and just don’t tell anyone else. I mean I haven’t been exactly forthcoming — I never actually lived in New York, and although I do use phone numbers as passwords, I don’t format it that way.

            It seems to me that there are so many variations, it wouldn’t be possible to construct a dictionary attack that’s any faster than brute-force. But I may be wrong about that.

          • Steve Robillard

            It depends on the attack. If you are brute forcing thousands of passwords from around the world it may not speed up the process much, but if you are targeting a specific individual or a small number of individuals (like a company-sized target), it could make a significant difference.

          • James Edwards

            Yeah that makes sense. So it would (for example) be very bad practice for a company to recommend all their employees use a specific method of coming up with passwords (such as using their phone number).

          • Steve Robillard

            Or for the employee to do it on their own. The more I can find out about the person or persons being attacked, the better my chances of cracking their password. This would include the password requirements (length, composition, method of deriving the password, etc.). If I can determine your email address and name, I can use social media (Facebook, LinkedIn, etc.) to help me determine likely passwords (kids’ names, graduation dates, etc.) and, even worse, answers to your password reset questions. I just had this come up today: most of the questions could have been answered by someone who could see my Facebook profile (what was my school mascot, childhood street name). As you point out in the article, humans don’t do random well. Having said all that, I am sure I am moving up on the NSA’s watch list.

  • jokeyrhyme

    One thing a site could do is try to log into the user’s email account using the provided password and email address. If it succeeds, then the user should be told that they have used the password somewhere else.

    Not sure if this is a good idea in practice though, which is probably why no service seems to do this.

    • Jakub Paś

      Good idea but it might be against the law

      • jokeyrhyme

        Ah, yeah. Probably counts as fraud or impersonation.

  • Tatsh

    I would strongly advise against knowing any of your passwords except critical ones. I use LastPass to manage storage of most of my passwords. It has a mobile app (with a built-in browser) and a way to input into iOS Safari.

    With the LastPass generator I set all my passwords to the maximum length allowed. I do not really know many of my passwords, which also decreases stress levels.

    Regarding using your old phone number, if someone really wanted to get into your account, I think that might be something they will definitely look for, and it may be in a place you do not remember. And this is definitely true if this becomes common wisdom (which I definitely disagree with).

  • M S

    Paypal must be one of the worst offenders.
    I was trying to create an account the other day, and discovered that they have blocked pasting in pw-fields!

    I had a good strong password already generated by the program I use to handle my passwords,
    but now I had to use a short, easy-to-type one instead.

    Fu.king morons!

    • Steve Robillard

      try pasting with the keyboard shortcut ctrl-c this usually works for sites that block the right click menu option. Also Paypal supports two factor authentication to improve their security.

      • M S

        Nope, nothing worked.

        • Steve Robillard

          Sorry I meant Ctrl-v.

  • Ian Simmons

    First I don’t claim to have a good understanding of hack methods. I understand what you say about the longer password having a higher entropy. But isn’t the reason for mixed characters to decrease the chance of your password being guessed from a dictionary? Wouldn’t “this is my password” be closer to the top of a dictionary file which would defeat the benefit of the length? In other words wouldn’t “pass my is thisword” be more secure because it is not only long but also random nonsense? I personally make my passwords long and also complex with a mix of capitals, numbers, and symbols. In my mind this is the order of it.

    “@3Cat$” //insecure because it’s short
    “I like cats and cats like me” //insecure because it has no randomness
    “#!$LoveRoF#$%catS13$” //ridiculous but very secure because it’s long and random

    • briand06

      > “I like cats and cats like me” //insecure because it has no randomness
      > “#!$LoveRoF#$%catS13$” //ridiculous but very secure because it’s long and random

      It is not a question of randomness but rather of length. With a dictionary-based approach, the attacker must find all the words in your password in the correct order. The more words you use the better.
      On the other hand, with a brute force approach the level of randomness of the password does not matter, since the attacker must necessarily check all the characters in a pool. What does matter is the width of the pool (search space): using letters, numbers and symbols is much better than using just numbers. It takes the same time to crack “#!$LoveRoF#$%catS13$” or “FLRSaceootv13$$$!##%”.

  • http://hanshelgebuerger.de/ Hans-Helge

    Nice article. It nicely highlights the problem with weak passwords and how “some” companies want to make us believe that the key is a hard to remember password.

    However, you missed one thing which is also important to mention: why the second password form is weaker and easier for hackers to crack. (I didn’t read all the comments, so excuse me if I repeat someone.) The problem with the second form is that you have to choose lower- and upper-case letters, numbers, etc. That means the company knows, you know, and the hacker knows too, that you have at least one (!) of each of these character types in your password. That said, the entropy sinks, because you can exclude a bunch of possible passwords. The range is now smaller than for the same password at Google. Google does not force you, or tell the hacker, that each password has an upper-case letter. You see?

    So I totally agree with you that it is not a wise move to force a user into a specific password format. The entropy is only at its highest possible level if you don’t know the pattern :)

  • LouisLazaris

    Hey, James Edwards, nice article. A little late to the discussion here, but I thought I’d mention this TED talk that is quite similar in topic:

    http://www.ted.com/talks/lorrie_faith_cranor_what_s_wrong_with_your_pa_w0rd

    I had just stumbled upon it last week, and I included it in my brief intro for last week’s SitePoint newsletter:

    http://go.sitepoint.com/t/ViewEmail/y/09371BE25E7921CD/154ED855A1648AE6A4A88C2FAEAC43DE

    Just thought I’d mention the video in case you hadn’t seen it.
