Posts Tagged ‘password’
How passwords should be hashed before storage or usage is a very common question, always triggering passionate debate. There is a simple and comprehensive answer (use bcrypt, but PBKDF2 is not bad either) which is not the end of the question since theoretically better solutions have been proposed and will be worth considering once they have withstood the test of time (i.e. “5 to 10 years in the field, and not broken yet”).
The less commonly asked question is:
why should a password be hashed?
This is what this post is about.
Encryption and Hashing
A first thing to note is that there are many people who talk about encrypted passwords but really mean hashed passwords. An encrypted password is like anything else which has been encrypted: it has been rendered unreadable through a process which used an extra piece of secret data (the key) and which can be reversed with knowledge of the same key (or of a distinct, mathematically related key, in the case of asymmetric encryption). For password hashing, on the other hand, there is no key. The hashing process is like a meat grinder: there is no key, everybody can operate it, but there is no way to get your cow back in full moo-ing state. Whereas encryption would be akin to locking the cow in a stable. Cryptographic hash functions are functions which anybody can compute, efficiently, over arbitrary inputs. They are deterministic (same input yields same output, for everybody).
Once hashed, the password is still quite useful, because even though the hashing process is not reversible, the output still contains the “essence” of the hashed password and two distinct passwords will yield, with very high probability (i.e. always, in practice), two distinct hashed values (that’s because we are talking about cryptographic hash function, not the other kind). And the hash function is deterministic, so you can always rehash a putative password and see if the result is equal to a given hash value. Thus, a hashed password is sufficient to verify whether a given password is correct or not.
This still does not tell us why we would hash a password, only that hashing a password does not forfeit the intended usage of authenticating users.
To Hash or Not To Hash ?
Let’s see the following scenario: you have a Web site, with users who can “sign in” by showing their name and password. Once signed in, users gain “extra powers” such as reading and writing data. The server must then store “something” which can be used to verify user passwords. The most basic “something” consists in the password themselves. Presumably, the passwords would be stored in a SQL database, probably along with whatever data is used by the application.
The bad thing about such “cleartext” storage of passwords is that it induces a vulnerability in the case of an attack model where the attacker could get a read-only access to the server data. If that data includes the user passwords, then the villain could use these passwords to sign in as any user and get the corresponding powers, including any write access that valid users may have. This is an edge case (attacker can read the database but not write to it). However, this is a realistic edge case. Unwanted read access to parts of a Web server database is a common consequence of an SQL injection vulnerability. This really happens.
Also, the passwords themselves can be a prized spoil of war: users are human beings, they tend to reuse passwords across several systems. Or, on the more trivial aspect of things, many users choose as password the name of their girlfriend/boyfriend/dog. So knowing the password for a user on a given site has a tactical value which extends beyond that specific site, and even having an accidental look at the passwords can be embarrassing and awkward for the most honest system administrator.
Storing only hashed passwords solves these problems as best as one can hope for. It is unavoidable that a complete dump of the server data yields enough information to “try” passwords (that’s an “offline dictionary attack”) because the dump allows the attacker to “simulate” the complete server on his own machines, and try passwords at his leisure. We just want that the attacker may not have any faster strategy. A hash function is the right tool for that. In full details, the hashing process should include a per-password random salt (stored along the hashed value) and be appropriately slow (through thousands or millions of nested iterations), but that’s not the subject of this post. Just use bcrypt.
Summary: we hash passwords to prevent an attacker with read-only access from escalating to higher power levels. Password hashing will not make your Web site impervious to attacks; it will still be hacked. Password hashing is damage containment.
A drawback of password hashing is that since you do not store the passwords themselves (but only a piece of data which is sufficient to verify a password without being able to recover it), you cannot send back their passwords to users who have forgotten them. Instead, you must select a new random password for them, or let them choose a new password. Is that an issue ? One could say that since the user forgot his old password, then that password was not easy to remember, so changing it is not a bad thing after all. Users are accustomed to such a procedure. It may surprise them if you are able to send them back their password. Some of them might even frown upon your lack of security savviness if you so demonstrates that you do not hash the stored passwords.
A quick bit of background:
With all the furore about passwords, most companies and individuals know they should have strong passwords in place, and many use system enforced password complexity rules (eg 8 characters, with at least 1 number and 1 special character) but how could a company actually audit password strength.
John the Ripper was a pretty good tool for this – it would brute force or use a dictionary attack on password hashes, and if it broke them quickly they were weak. If they lasted longer they were stronger (broadly speaking)
So far so good, but what if you are a security professional emulating an attacker to assess controls? You could run the brute forcer for a while, but this isn’t what an attacker will do – maths has provided much faster ways to get passwords:
Hash Tables and Rainbow Tables
Hash Tables are exactly as the name sounds – tables of hashes generated from every possible password in the space you want, for example a table of all DES crypt hashes for unsalted alphanumeric passwords 8 characters or less, along with the password. If you manage to get hold of the password hashes from the target you simply match them with the hashes in this table, and if the passwords are in the table you win – the password is there (excluding the relatively small possibility of hash collisions – which for most security purposes is irrelevant as you can still use the wrong password if its hash matches the correct one). The main problem with Hash tables is that they get very big very quickly, which means you need a lot of storage space, and an efficient table lookup over this space.
Which is where Rainbow Tables come in. @Crunge‘s answer provides excellent detail in relatively simple language to describe the combination of hashing function, reduction function and the mechanism by which chains of these can lead to an efficient way to search for passwords that are longer or more complex than those that lend themselves well to a hash table.
In fact @Crunge’s conclusion is:
Hash tables are good for common passwords, Rainbow Tables are good for tough passwords. The best approach would be to recover as many passwords as possible using hash tables and/or conventional cracking with a dictionary of the top N passwords. For those that remain, use Rainbow Tables.
@Mark Davidson points us in the direction of resources. You can either generate the rainbow tables yourself using an application like RainbowCrack or you can download them from sources like The Shmoo Group, Free Rainbow Tables project website, Ophcrackproject and many other places depending on what type of hashes you need tables for.
Now from a defence perspective, what do you need to know?
Longer passwords are still stronger against attack, but be aware that if they are too long then users may not be able to remember them. (Correct Horse Battery Staple!)
A simple and effective defence that most password hashing functions provide now is salting – the addition of a per user “salt” value stored in the database along with the hashed password. It isn’t intended to be secret but is used to slow down the brute force process and to make rainbow tables impractical to use. Another add-on is what is called a “pepper” value. This was just another random string but was the same for all users and stored with the application code as opposed to in the database. the theory here is that in some circumstances the database may be compromised but the application code is not, and in those cases this could improve the security. It does, however, introduce problems if there are multiple applications using the same password database.
According to Rory Alsop’s post the most popular topic at our IT Security site is passwords. So in this post I’ll provide a tour of some relevant questions and answers, and a look at how it sparked further investigation of the sorry state of password protection in some current systems.
There is an amazing amount of confusion, disinformation and bad practice out there with respect to password management. This is all the more frustrating because the basics were worked out back in the dark ages (1978 to be precise) for Unix by Bob Morris and Ken Thompson (Password Security: A Case History, Morris & Thompson, 1978). They not only covered the importance of hashing passwords with salts and a slow algorithm, but they described what is now called a “pepper” (a second salt stored apart from the password database).
So if you peruse the dozens of ‘passwords’ questions you’ll find frequent reference to the basics. We’ve also tried to summarize some of the key points in the ‘passwords’ tag info. By the way, that’s one of the cool things about StackExchange sites in general – how they provide for a little “wiki” for each tag, to allow us to put key information just a mouse hover or one click away from readers.
Of course even among good practitioners, there are debates about the finer points, and that is another thing you’ll find here. See e.g. the spirited conversation about highly iterated password hashes for web apps vs the risk of DDoS attacks at Do any security experts recommend bcrypt for password storage?.
In the end, you’ll often find that the right approach depends on what kind of “security” you’re looking for. As our Frequently Asked Questions points out, this depends on context – what assets you’re trying to protect from what sorts of threats, how your different vulnerabilities compare, and how that all fits into your business plan.
I’ll end with a look at one question which demonstrates the kind of expertise we have already attracted to the site: MySQL OLD_PASSWORD cryptanalysis?. I asked this after stumbling across some other questions about how MySQL used to deal with passwords. To my amazement, within hours, noted cryptographer Thomas Pornin had not only cracked the old algorithm, but he included some code to demonstrate just how totally broken the OLD_PASSWORD scheme was. Subsequently we also found a paper from 2006 with more details. Evidently the folks at Oracle who work on MySQL hadn’t gotten the memo then, and still have a lot of work to do to improve their current scheme, as described at Looking for example of well-known app using unsalted hashes.
So stay ahead of the crowd: when you have a question about security, see if we have a good answer. If not, ask away. And if you see questions that need more work, please contribute to providing good answers!