Risks and Challenges of Password Hashing

Miguel Ibarra Romero
Share

In a past article, password hashing was discussed as a way to securely store user credentials in an application. Security is always a very controversial topic, much alike politics and religion, where many points of view exist and a ‘perfect solution’ for someone is not the same to others. In my opinion, breaking an application’s security measures is just a matter of time. With computer power and complexity increasing every day, today’s secure applications will not be so secure tomorrow.

For our readers who are not familiar with what a hash algorithm is, it’s nothing more than a one way function that maps data of variable length to data of fixed length. So if we analyze the above definition we need to understand the following requirements and characteristics of such algorithms:

  • One way function: the output cannot be reversed using an efficient algorithm.
  • Maps data of variable length to data of fixed length: meaning that the input message space can be “infinite”, but the output space is not. This has the implication that 2 or more input messages can have the same hash. The smaller the output space, the greater the probability of a ‘collision’ between two input messages.

md5 has confirmed practical collisions and sha1’s probabilities for reaching a collision are growing every day (more info in collision probability can be found by analyzing the classic Birthday Problem), so if we need to apply a hashing algorithm, we should use the ones that have greater output space (and a negligible collision probability), such as sha256, sha512, whirlpool, etc…

They are also called Pseudo-random functions’, meaning that the output of a hashing function should be indistinguishable from a true random number generator (or TRNG).

Why simple hashing is insecure for storing passwords

The fact that the output of a hash function cannot be reverted back to the input using an efficient algorithm does not mean that it cannot be cracked. Databases containing hashes of common words and short strings are usually within our reach with a simple google search. Also, common strings can be easily and quickly brute-forced or cracked with a dictionary attack.

Demonstration

Here is a quick video on how a tool like sqlmap can crack passwords via sql injection by bruteforcing md5 hashes in a database.

Also, we could have just done the simplest of attacks… just grab the hash and google it… Chances are that the hash exists in an online database. Examples of hash databases are:

We also have to consider that since 2 or more identical passwords will indeed have the same hash value, cracking one hash will automatically give you the passwords of every single user that used the same. Just to be clear, say you have thousands of users, it is very likely that a fair amount of them will use (if passwords policies are not enforced) the infamous ‘123456’ password. The md5 hash value of that password is ‘e10adc3949ba59abbe56e057f20f883e’, so when you crack this hash (if you even have to) and search for all the users who have this value in their password field, you will know that every single one of them used the ‘123456’ password.

Why salted hashes are insecure for storing passwords

To mitigate this attack, salts became common but obviously are not enough for today’s computing power, especially if the salt string is short, which makes it brute-forceable.

The basic password/salt function is defined as:

f(password, salt) = hash(password + salt)

In order to mitigate a brute-force attack, a salt should be as long as 64 characters, however, in order to authenticate a user later on, the salt must be stored in plain text inside the database, so:

if (hash([provided password] + [stored salt]) == [stored hash]) then user is authenticated

Since every user will have a completely different salt, this also avoids the problem with simple hashes, where we could easily tell if 2 or more users are using the same password; now the hashes will be different. We can also no longer take the password hash directly and try to google it. Also, with a long salt, a brute-force attack is improbable. But, if an attacker gets access to this salt either by an sql injection attack or direct access to the database, a brute-force or dictionary attack becomes probable, especially if your users use common passwords (again, like ‘123456’):

Generate some string or get entry from dictionary
Concatenate with salt
Apply hash algorithm
If generated hash == hash in database then Bingo
else continue iterating

But even if one password gets cracked, that will not automatically give you the password for every user who might have used it, since no user should have the same stored hash.

The randomness issue

In order to generate a good salt, we should have a good random number generator. If php’s rand() function automatically popped up in your mind, forget it immediately.

There is an excellent article about randomness in random.org. Simply put, a computer can’t think of random data by itself. Computers are said to be deterministic machines, meaning that every single algorithm a computer is able to run, given the exact same input, will always produce the same output.

When a random number is requested to the computer, it typically gets inputs from several sources, like environment variables (date, time, # of bytes read/written, uptime…), then apply some calculations on them to produce random data. This is the reason why random data given by an algorithm is called pseudo random and thus it is important to differentiate from a true random data source. If we are somehow able to recreate the exact conditions present at the moment of the execution of a pseudo-random number generator (or PRNG), we will automatically have the original generated number.

Additionally, if a PRNG is not properly implemented, it is possible to discover patterns in the generated data. If patterns exist, we can predict the outcome… Take for instance the case of PHP’s rand() function on Windows as documented here. While it is not clear which PHP or Windows version is used, you can immediately tell there is something wrong by looking at the bitmap generated by using rand():

Compare to the output image from a TRNG:

Even though the issue has been addressed on PHP >= 5, rand() and even mt_rand() are still considered highly inadequate for security related purposes.

If you need to generate random data, please use openssl_random_pseudo_bytes() available as of PHP 5 >= 5.3.0, it even has the crypto_strong flag that will tell you if the bytes are secure enough.

Here is a quick code sample to generate random strings using openssl_random_pseudo_bytes()

<?php

function getRandomBytes ($byteLength)
{
    /*
     * Checks if openssl_random_pseudo_bytes is available 
     */
    if (function_exists('openssl_random_pseudo_bytes')) {
        $randomBytes = openssl_random_pseudo_bytes($byteLength, $cryptoStrong);
        if ($cryptoStrong)
            return $randomBytes;
    } 

    /*
     * if openssl_random_pseudo_bytes is not available or its result is not
     * strong, fallback to a less secure RNG
     */
    $hash = '';
    $randomBytes = '';

    /*
     * On linux/unix systems, /dev/urandom is an excellent entropy source, use
     * it to seed initial value of $hash
     */
    if (file_exists('/dev/urandom')) {
        $fp = fopen('/dev/urandom', 'rb');
        if ($fp) {
            if (function_exists('stream_set_read_buffer')) {
                stream_set_read_buffer($fp, 0);
            }
            $hash = fread($fp, $byteLength);
            fclose($fp);
        }
    }

    /*
     * Use the less secure mt_rand() function, but never rand()!
     */
    for ($i = 0; $i < $byteLength; $i ++) {
        $hash = hash('sha256', $hash . mt_rand());
        $char = mt_rand(0, 62);
        $randomBytes .= chr(hexdec($hash[$char] . $hash[$char + 1]));
    }
    return $randomBytes;
}

Password stretching can be effective if done right

To further mitigate brute-force attacks, we can implement the password stretching technique. This is just an iterative or recursive algorithm that calculates a hash value over and over in itself, usually tens of thousands of times (or more).

This algorithm should iterate enough in order to perform all calculations in at least 1 second (slower hashing also means the attacker will have to wait).

In order to crack a password secured by stretching, the attacker should:

  1. Know the exact iteration count, any deviation will produce entirely different hashes.
  2. Should wait at least 1 second between each attempt.

This makes an attack improbable… but not impossible. In order to overcome the 1 second delay, an attacker should have higher hardware specs than the computer for which the algorithm was tuned, something that might mean a high cost, so the attack becomes prohibitively expensive.

You can also use standard algorithms, like PBKDF2 which is a Password Based Key Derivation Function

<?php

/*
 * PHP PBKDF2 implementation The number of rounds can be increased to keep ahead
 * of improvements in CPU/GPU performance. You should use a different salt for
 * each password (it's safe to store it alongside your generated password This
 * function is slow; that's intentional! For more information see: -
 * http://en.wikipedia.org/wiki/PBKDF2 - http://www.ietf.org/rfc/rfc2898.txt
 */
function pbkdf2 ($password, $salt, $rounds = 15000, $keyLength = 32, 
        $hashAlgorithm = 'sha256', $start = 0)
{
    // Key blocks to compute
    $keyBlocks = $start + $keyLength;

    // Derived key
    $derivedKey = '';

    // Create key
    for ($block = 1; $block <= $keyBlocks; $block ++) {
        // Initial hash for this block
        $iteratedBlock = $hash = hash_hmac($hashAlgorithm, 
                $salt . pack('N', $block), $password, true);

        // Perform block iterations
        for ($i = 1; $i < $rounds; $i ++) {
            // XOR each iteration
            $iteratedBlock ^= ($hash = hash_hmac($hashAlgorithm, $hash, 
                    $password, true));
        }

        // Append iterated block
        $derivedKey .= $iteratedBlock;
    }

    // Return derived key of correct length
    return base64_encode(substr($derivedKey, $start, $keyLength));
}

There are also time and memory intensive algorithms like bcrypt (via the crypt() function) and scrypt

<?php
//bcrypt is implemented in php's crypt() function
$hash = crypt($pasword, '$2a$' . $cost . '$' . $salt);

Where $cost is the work factor, and $salt is a random string you can generate using the secure_rand() function above.

The workload factor is totally dependent on the target system. You can start with a factor of ‘09’ and increase it until the operation completes in approx. 1 second.

As of PHP 5 >= 5.5.0 you can use the new password_hash() function, which uses bcrypt as the default method for hashing.

There is no scrypt support in PHP yet, but you can check out Domblack’s scrypt implementation.

What about applying encryption techniques?

Hashing and ciphering (or encrypting) are terms which are often confused. As I mentioned before, hashing is a pseudo-random function, while cyphering is generally a ‘pseudo-random permutation’. This means the input message is sliced and changed in such a way that the output is indistinguishable from a TRNG, however the output CAN be transformed back again into the original input. This transformation is done using an encryption key, without which it should be impossible to transform the output into the original message again.

Ciphering has another big difference compared to hashing. While the output message space of hashing is finite, ciphering output message space is infinite, as the relationship between the input and output is 1:1, thus collisions should not exist.

One has to be very careful on how to correctly apply encryption techniques, thinking that just by applying an encryption algorithm to sensitive data is enough to keep it safe is considered wrong, as many problems exist that could lead to data leaks. As a general rule, you should never consider applying your own encryption implementation

Recently, Adobe had a massive data leak of their users database, since they incorrectly applied encryption techniques and I’ll take them as an example of what not to do. I’ll try to be as straight forward as possible, keeping things really simple.

Consider the following schema:

Let’s say the plain text contents of the table are as follows:

Now, someone at Adobe decided to cipher the passwords, but made two big mistakes:

  1. Used the same cipher key to encrypt the passwords
  2. Decided to leave the password hint field in plain text

Let’s say for example, that after applying an enciphering algorithm to the password field, now our data looks like the following:

While the passwords are not able to be simply decrypted, and we cannot know the encryption key used in a simple way, by examining the data we can notice that records 2 and 7 share the same password, as well 3 and 6… This is where the password hint field comes into play.

Record 6 hint is “I’m one!” which does not give us much information, however record 3’s hint does… we can safely guess that the password is “queen”. Records 2 and 7 hints do not give a lot of information alone, but if we look at them together, how many holidays have the same name as a scary movie? Now we have access to everyone’s account that used “halloween” as a password.

To mitigate the risk of data leaks, it is better to switch to hashing techniques, however if you must use encryption techniques to store passwords, we can use tweakable encryption. The term looks fancy, but is very simple.

Let’s say we have thousands of users, and we want to encrypt all passwords. As we saw, we cannot use the same encryption key for every password since the data will be at risks (and other sophisticated attacks become possible). However, we cannot use a unique key for every user, since storing those keys will become a security risk by itself. What we have to do is to generate a single key and use a ‘tweak’ that will be unique to every user, and both the key and the tweak together will be the encryption key for each record. The simplest of tweaks available is the primary key, which by definition is unique to every record in the table (although I do not recommend to use it, this is just for demonstrating the concept):

f(key, primaryKey) = key + primaryKey

Above I’m simply concatenating both the encryption key and the primary key’s value to generate the final encryption key, however you can (and should) apply a hashing algorithm or a key derivation function to them. Also, instead of using the primary key as the tweak, you might want to generate a nonce (similar to a salt) for each record to be used as the tweak.

After applying tweakable encryption to the users table, it now looks like the following:

Of course we still have the password hint problem, but now each record has a unique value, so it is not apparent which users are using the same password.

I want to emphasize that encryption is not the best solution, and should be avoided if possible to store passwords since a lot of weaknesses can be injected… You can and should stick to proven solutions (like bcrypt) to store passwords, but keep in mind that even proven solutions have their weaknesses.

Conclusion

There is no perfect solution and the risk of someone breaking our security measures grows every day. However, cryptographic and data security studies and research continue, with the relative recent definition of sponge functions, our toolkit keeps growing everyday.