How passwords should be hashed before storage or usage is a very common question, always triggering passionate debate. There is a simple and comprehensive answer (use bcrypt, but PBKDF2 is not bad either) which is not the end of the question since theoretically better solutions have been proposed and will be worth considering once they have withstood the test of time (i.e. “5 to 10 years in the field, and not broken yet”).
The less commonly asked question is:
why should a password be hashed?
This is what this post is about.
Encryption and Hashing
A first thing to note is that there are many people who talk about encrypted passwords but really mean hashed passwords. An encrypted password is like anything else which has been encrypted: it has been rendered unreadable through a process which used an extra piece of secret data (the key) and which can be reversed with knowledge of the same key (or of a distinct, mathematically related key, in the case of asymmetric encryption). For password hashing, on the other hand, there is no key. The hashing process is like a meat grinder: there is no key, everybody can operate it, but there is no way to get your cow back in full moo-ing state. Whereas encryption would be akin to locking the cow in a stable. Cryptographic hash functions are functions which anybody can compute, efficiently, over arbitrary inputs. They are deterministic (same input yields same output, for everybody).
Once hashed, the password is still quite useful, because even though the hashing process is not reversible, the output still contains the “essence” of the hashed password and two distinct passwords will yield, with very high probability (i.e. always, in practice), two distinct hashed values (that’s because we are talking about cryptographic hash function, not the other kind). And the hash function is deterministic, so you can always rehash a putative password and see if the result is equal to a given hash value. Thus, a hashed password is sufficient to verify whether a given password is correct or not.
This still does not tell us why we would hash a password, only that hashing a password does not forfeit the intended usage of authenticating users.
To Hash or Not To Hash ?
Let’s see the following scenario: you have a Web site, with users who can “sign in” by showing their name and password. Once signed in, users gain “extra powers” such as reading and writing data. The server must then store “something” which can be used to verify user passwords. The most basic “something” consists in the password themselves. Presumably, the passwords would be stored in a SQL database, probably along with whatever data is used by the application.
The bad thing about such “cleartext” storage of passwords is that it induces a vulnerability in the case of an attack model where the attacker could get a read-only access to the server data. If that data includes the user passwords, then the villain could use these passwords to sign in as any user and get the corresponding powers, including any write access that valid users may have. This is an edge case (attacker can read the database but not write to it). However, this is a realistic edge case. Unwanted read access to parts of a Web server database is a common consequence of an SQL injection vulnerability. This really happens.
Also, the passwords themselves can be a prized spoil of war: users are human beings, they tend to reuse passwords across several systems. Or, on the more trivial aspect of things, many users choose as password the name of their girlfriend/boyfriend/dog. So knowing the password for a user on a given site has a tactical value which extends beyond that specific site, and even having an accidental look at the passwords can be embarrassing and awkward for the most honest system administrator.
Storing only hashed passwords solves these problems as best as one can hope for. It is unavoidable that a complete dump of the server data yields enough information to “try” passwords (that’s an “offline dictionary attack”) because the dump allows the attacker to “simulate” the complete server on his own machines, and try passwords at his leisure. We just want that the attacker may not have any faster strategy. A hash function is the right tool for that. In full details, the hashing process should include a per-password random salt (stored along the hashed value) and be appropriately slow (through thousands or millions of nested iterations), but that’s not the subject of this post. Just use bcrypt.
Summary: we hash passwords to prevent an attacker with read-only access from escalating to higher power levels. Password hashing will not make your Web site impervious to attacks; it will still be hacked. Password hashing is damage containment.
A drawback of password hashing is that since you do not store the passwords themselves (but only a piece of data which is sufficient to verify a password without being able to recover it), you cannot send back their passwords to users who have forgotten them. Instead, you must select a new random password for them, or let them choose a new password. Is that an issue ? One could say that since the user forgot his old password, then that password was not easy to remember, so changing it is not a bad thing after all. Users are accustomed to such a procedure. It may surprise them if you are able to send them back their password. Some of them might even frown upon your lack of security savviness if you so demonstrates that you do not hash the stored passwords.