Risks and Challenges of Password Hashing

In a past article, password hashing was discussed as a way to securely store user credentials in an application. Security is always a very controversial topic, much alike politics and religion, where many points of view exist and a ‘perfect solution’ for someone is not the same to others. In my opinion, breaking an application’s security measures is just a matter of time. With computer power and complexity increasing every day, today’s secure applications will not be so secure tomorrow.

For our readers who are not familiar with what a hash algorithm is, it’s nothing more than a one way function that maps data of variable length to data of fixed length. So if we analyze the above definition we need to understand the following requirements and characteristics of such algorithms:

One way function: the output cannot be reversed using an efficient algorithm.
Maps data of variable length to data of fixed length: meaning that the input message space can be “infinite”, but the output space is not. This has the implication that 2 or more input messages can have the same hash. The smaller the output space, the greater the probability of a ‘collision’ between two input messages.

md5 has confirmed practical collisions and sha1’s probabilities for reaching a collision are growing every day (more info in collision probability can be found by analyzing the classic Birthday Problem), so if we need to apply a hashing algorithm, we should use the ones that have greater output space (and a negligible collision probability), such as sha256, sha512, whirlpool, etc…

They are also called ‘Pseudo-random functions’, meaning that the output of a hashing function should be indistinguishable from a true random number generator (or TRNG).

Why simple hashing is insecure for storing passwords

The fact that the output of a hash function cannot be reverted back to the input using an efficient algorithm does not mean that it cannot be cracked. Databases containing hashes of common words and short strings are usually within our reach with a simple google search. Also, common strings can be easily and quickly brute-forced or cracked with a dictionary attack.

Demonstration

Here is a quick video on how a tool like sqlmap can crack passwords via sql injection by bruteforcing md5 hashes in a database.

Also, we could have just done the simplest of attacks… just grab the hash and google it… Chances are that the hash exists in an online database. Examples of hash databases are:

We also have to consider that since 2 or more identical passwords will indeed have the same hash value, cracking one hash will automatically give you the passwords of every single user that used the same. Just to be clear, say you have thousands of users, it is very likely that a fair amount of them will use (if passwords policies are not enforced) the infamous ‘123456’ password. The md5 hash value of that password is ‘e10adc3949ba59abbe56e057f20f883e’, so when you crack this hash (if you even have to) and search for all the users who have this value in their password field, you will know that every single one of them used the ‘123456’ password.

Why salted hashes are insecure for storing passwords

To mitigate this attack, salts became common but obviously are not enough for today’s computing power, especially if the salt string is short, which makes it brute-forceable.

The basic password/salt function is defined as:

f(password, salt) = hash(password + salt)

In order to mitigate a brute-force attack, a salt should be as long as 64 characters, however, in order to authenticate a user later on, the salt must be stored in plain text inside the database, so:

if (hash([provided password] + [stored salt]) == [stored hash]) then user is authenticated

Since every user will have a completely different salt, this also avoids the problem with simple hashes, where we could easily tell if 2 or more users are using the same password; now the hashes will be different. We can also no longer take the password hash directly and try to google it. Also, with a long salt, a brute-force attack is improbable. But, if an attacker gets access to this salt either by an sql injection attack or direct access to the database, a brute-force or dictionary attack becomes probable, especially if your users use common passwords (again, like ‘123456’):

Generate some string or get entry from dictionary
Concatenate with salt
Apply hash algorithm
If generated hash == hash in database then Bingo
else continue iterating

But even if one password gets cracked, that will not automatically give you the password for every user who might have used it, since no user should have the same stored hash.

The randomness issue

In order to generate a good salt, we should have a good random number generator. If php’s rand() function automatically popped up in your mind, forget it immediately.

There is an excellent article about randomness in random.org. Simply put, a computer can’t think of random data by itself. Computers are said to be deterministic machines, meaning that every single algorithm a computer is able to run, given the exact same input, will always produce the same output.

When a random number is requested to the computer, it typically gets inputs from several sources, like environment variables (date, time, # of bytes read/written, uptime…), then apply some calculations on them to produce random data. This is the reason why random data given by an algorithm is called pseudo random and thus it is important to differentiate from a true random data source. If we are somehow able to recreate the exact conditions present at the moment of the execution of a pseudo-random number generator (or PRNG), we will automatically have the original generated number.

Additionally, if a PRNG is not properly implemented, it is possible to discover patterns in the generated data. If patterns exist, we can predict the outcome… Take for instance the case of PHP’s rand() function on Windows as documented here. While it is not clear which PHP or Windows version is used, you can immediately tell there is something wrong by looking at the bitmap generated by using rand():

Compare to the output image from a TRNG:

Even though the issue has been addressed on PHP >= 5, rand() and even mt_rand() are still considered highly inadequate for security related purposes.

If you need to generate random data, please use openssl_random_pseudo_bytes() available as of PHP 5 >= 5.3.0, it even has the crypto_strong flag that will tell you if the bytes are secure enough.

Here is a quick code sample to generate random strings using openssl_random_pseudo_bytes()

<?php

function getRandomBytes ($byteLength)
{
    /*
     * Checks if openssl_random_pseudo_bytes is available 
     */
    if (function_exists('openssl_random_pseudo_bytes')) {
        $randomBytes = openssl_random_pseudo_bytes($byteLength, $cryptoStrong);
        if ($cryptoStrong)
            return $randomBytes;
    } 

    /*
     * if openssl_random_pseudo_bytes is not available or its result is not
     * strong, fallback to a less secure RNG
     */
    $hash = '';
    $randomBytes = '';

    /*
     * On linux/unix systems, /dev/urandom is an excellent entropy source, use
     * it to seed initial value of $hash
     */
    if (file_exists('/dev/urandom')) {
        $fp = fopen('/dev/urandom', 'rb');
        if ($fp) {
            if (function_exists('stream_set_read_buffer')) {
                stream_set_read_buffer($fp, 0);
            }
            $hash = fread($fp, $byteLength);
            fclose($fp);
        }
    }

    /*
     * Use the less secure mt_rand() function, but never rand()!
     */
    for ($i = 0; $i < $byteLength; $i ++) {
        $hash = hash('sha256', $hash . mt_rand());
        $char = mt_rand(0, 62);
        $randomBytes .= chr(hexdec($hash[$char] . $hash[$char + 1]));
    }
    return $randomBytes;
}

Password stretching can be effective if done right

To further mitigate brute-force attacks, we can implement the password stretching technique. This is just an iterative or recursive algorithm that calculates a hash value over and over in itself, usually tens of thousands of times (or more).

This algorithm should iterate enough in order to perform all calculations in at least 1 second (slower hashing also means the attacker will have to wait).

In order to crack a password secured by stretching, the attacker should:

Know the exact iteration count, any deviation will produce entirely different hashes.
Should wait at least 1 second between each attempt.

This makes an attack improbable… but not impossible. In order to overcome the 1 second delay, an attacker should have higher hardware specs than the computer for which the algorithm was tuned, something that might mean a high cost, so the attack becomes prohibitively expensive.

You can also use standard algorithms, like PBKDF2 which is a Password Based Key Derivation Function

<?php

/*
 * PHP PBKDF2 implementation The number of rounds can be increased to keep ahead
 * of improvements in CPU/GPU performance. You should use a different salt for
 * each password (it's safe to store it alongside your generated password This
 * function is slow; that's intentional! For more information see: -
 * http://en.wikipedia.org/wiki/PBKDF2 - http://www.ietf.org/rfc/rfc2898.txt
 */
function pbkdf2 ($password, $salt, $rounds = 15000, $keyLength = 32, 
        $hashAlgorithm = 'sha256', $start = 0)
{
    // Key blocks to compute
    $keyBlocks = $start + $keyLength;

    // Derived key
    $derivedKey = '';

    // Create key
    for ($block = 1; $block <= $keyBlocks; $block ++) {
        // Initial hash for this block
        $iteratedBlock = $hash = hash_hmac($hashAlgorithm, 
                $salt . pack('N', $block), $password, true);

        // Perform block iterations
        for ($i = 1; $i < $rounds; $i ++) {
            // XOR each iteration
            $iteratedBlock ^= ($hash = hash_hmac($hashAlgorithm, $hash, 
                    $password, true));
        }

        // Append iterated block
        $derivedKey .= $iteratedBlock;
    }

    // Return derived key of correct length
    return base64_encode(substr($derivedKey, $start, $keyLength));
}

There are also time and memory intensive algorithms like bcrypt (via the crypt() function) and scrypt

<?php
//bcrypt is implemented in php's crypt() function
$hash = crypt($pasword, '$2a$' . $cost . '$' . $salt);

Where $cost is the work factor, and $salt is a random string you can generate using the secure_rand() function above.

The workload factor is totally dependent on the target system. You can start with a factor of ‘09’ and increase it until the operation completes in approx. 1 second.

As of PHP 5 >= 5.5.0 you can use the new password_hash() function, which uses bcrypt as the default method for hashing.

There is no scrypt support in PHP yet, but you can check out Domblack’s scrypt implementation.

What about applying encryption techniques?

Hashing and ciphering (or encrypting) are terms which are often confused. As I mentioned before, hashing is a pseudo-random function, while cyphering is generally a ‘pseudo-random permutation’. This means the input message is sliced and changed in such a way that the output is indistinguishable from a TRNG, however the output CAN be transformed back again into the original input. This transformation is done using an encryption key, without which it should be impossible to transform the output into the original message again.

Ciphering has another big difference compared to hashing. While the output message space of hashing is finite, ciphering output message space is infinite, as the relationship between the input and output is 1:1, thus collisions should not exist.

One has to be very careful on how to correctly apply encryption techniques, thinking that just by applying an encryption algorithm to sensitive data is enough to keep it safe is considered wrong, as many problems exist that could lead to data leaks. As a general rule, you should never consider applying your own encryption implementation

Recently, Adobe had a massive data leak of their users database, since they incorrectly applied encryption techniques and I’ll take them as an example of what not to do. I’ll try to be as straight forward as possible, keeping things really simple.

Consider the following schema:

Let’s say the plain text contents of the table are as follows:

Now, someone at Adobe decided to cipher the passwords, but made two big mistakes:

Used the same cipher key to encrypt the passwords
Decided to leave the password hint field in plain text

Let’s say for example, that after applying an enciphering algorithm to the password field, now our data looks like the following:

While the passwords are not able to be simply decrypted, and we cannot know the encryption key used in a simple way, by examining the data we can notice that records 2 and 7 share the same password, as well 3 and 6… This is where the password hint field comes into play.

Record 6 hint is “I’m one!” which does not give us much information, however record 3’s hint does… we can safely guess that the password is “queen”. Records 2 and 7 hints do not give a lot of information alone, but if we look at them together, how many holidays have the same name as a scary movie? Now we have access to everyone’s account that used “halloween” as a password.

To mitigate the risk of data leaks, it is better to switch to hashing techniques, however if you must use encryption techniques to store passwords, we can use tweakable encryption. The term looks fancy, but is very simple.

Let’s say we have thousands of users, and we want to encrypt all passwords. As we saw, we cannot use the same encryption key for every password since the data will be at risks (and other sophisticated attacks become possible). However, we cannot use a unique key for every user, since storing those keys will become a security risk by itself. What we have to do is to generate a single key and use a ‘tweak’ that will be unique to every user, and both the key and the tweak together will be the encryption key for each record. The simplest of tweaks available is the primary key, which by definition is unique to every record in the table (although I do not recommend to use it, this is just for demonstrating the concept):

f(key, primaryKey) = key + primaryKey

Above I’m simply concatenating both the encryption key and the primary key’s value to generate the final encryption key, however you can (and should) apply a hashing algorithm or a key derivation function to them. Also, instead of using the primary key as the tweak, you might want to generate a nonce (similar to a salt) for each record to be used as the tweak.

After applying tweakable encryption to the users table, it now looks like the following:

Of course we still have the password hint problem, but now each record has a unique value, so it is not apparent which users are using the same password.

I want to emphasize that encryption is not the best solution, and should be avoided if possible to store passwords since a lot of weaknesses can be injected… You can and should stick to proven solutions (like bcrypt) to store passwords, but keep in mind that even proven solutions have their weaknesses.

Conclusion

There is no perfect solution and the risk of someone breaking our security measures grows every day. However, cryptographic and data security studies and research continue, with the relative recent definition of sponge functions, our toolkit keeps growing everyday.

Frequently Asked Questions (FAQs) on Password Hashing

What is the difference between password hashing and encryption?

Password hashing and encryption are two different methods used to secure data, but they serve different purposes. Encryption is a two-way function; what is encrypted can be decrypted with the proper key. This means that if someone gains access to the encryption key, they can decrypt the data. On the other hand, hashing is a one-way function that scrambles plain text to produce a unique message digest. Even a small change in the input will produce such a drastic change in output that the new hash value won’t resemble the old one. This makes it impossible to regenerate the original data from the hash value, making hashing more secure for storing passwords.

How does salt in password hashing enhance security?

A salt is a random data that is used as an additional input to a one-way function that hashes data, a password or passphrase. Salts are used to safeguard passwords in storage. The primary function of salts is to defend against dictionary attacks or against pre-computed rainbow table attacks. By adding a unique salt, the hash becomes unique to each user, even if two users have the same password. This means that even if a hacker has access to the hash value, they cannot crack the password without the unique salt.

What is the risk of using weak hashing algorithms?

Weak hashing algorithms, like MD5 and SHA-1, are susceptible to attacks. They have known vulnerabilities and can be cracked relatively easily with modern computing power. For instance, they are vulnerable to collision attacks, where two different inputs produce the same hash output. This compromises the integrity of the data. Therefore, it’s recommended to use stronger hashing algorithms like SHA-256 or bcrypt that provide a higher level of security.

How does password stretching increase security?

Password stretching is a technique used to increase the security of stored passwords. This is done by applying a cryptographic hash function to the user’s password, along with a salt, and then re-hashing the result numerous times. This process increases the time it takes to hash passwords, which can deter attackers who rely on making many rapid attempts to guess the password.

What is a rainbow table attack?

A rainbow table attack is a type of hacking wherein the attacker uses a rainbow table, a precomputed table for reversing cryptographic hash functions, to crack the hashed passwords. The rainbow table contains all the possible plaintext permutations of encrypted passwords. This method is effective against basic hashing, but adding a unique salt to each password hash can prevent rainbow table attacks.

Why is it important to keep the hashing algorithm secret?

While it’s crucial to keep the hashing algorithm secret, it’s not always possible, especially with widely used algorithms. The security of a good hashing system does not rely on the secrecy of the algorithm; instead, it relies on the secrecy and randomness of the salt added to the hash. Even if attackers know the algorithm, without the salt, they cannot reverse-engineer the password from the hash.

What is the role of a pepper in password hashing?

A pepper can be thought of as a second, secret salt, added to increase the security of stored passwords. While a salt is typically stored in the database next to the hashed password, a pepper is kept secret and stored separately, often in the application code. This means that even if an attacker gains access to the database, without the pepper, they cannot crack the passwords.

How often should I update or change the hashing algorithm?

The frequency of updating or changing the hashing algorithm depends on several factors, including the sensitivity of the data you’re protecting and the current state of technology. As a rule of thumb, if a more secure hashing algorithm becomes widely accepted, or if vulnerabilities are found in the current algorithm, it’s time to update. Always stay informed about the latest developments in cryptography to ensure maximum security.

What is the impact of hash functions on the performance of my application?

Hash functions, especially secure ones, can be computationally intensive and may impact the performance of your application. However, this impact is typically negligible compared to the security benefits. The delay caused by a secure hash function can actually serve as a deterrent to attackers, as it slows down attempts to guess the password.

How can I ensure the maximum security of user passwords?

To ensure maximum security of user passwords, follow these best practices: use a strong, secure, and up-to-date hashing algorithm; add a unique salt to each password hash; consider adding a pepper; and implement password stretching. Additionally, enforce strong password policies for your users, such as minimum length, and a mix of characters, numbers, and symbols.