Author Archive

QoTW #37: How does SSL work?

2012-10-05 by Thomas Pornin. 0 comments

In this week’s question, we will talk about SSL. This question was asked by @Polynomial, who noticed that our site did not yet have a generic question on how SSL works. There were already some questions on the concept of SSL, but nothing really detailed.

Three answers were given, one by @Luc, and two by myself (because I got really verbose and there is a size limit on answers). The three answers concentrated on distinct aspects of SSL; together, they can explain why SSL works: SSL appears to be decently secure and we can see how this is achieved.

My first answer is a painfully long description of the detailed protocol as it appears on the wire. I wrote it as an introduction to the intricacies of the protocol; the information it contains must be known if you want to understand the details of the cryptographic attacks which have been tried on SSL. This is not much more than RFC-reading, but I still made an effort to merge four RFCs (for the four protocol versions: SSLv3, TLS 1.0, 1.1 and 1.2) into one text which can be read linearly. If you plan on implementing your own SSL client or server (a very instructive exercise, which I recommend for its pedagogical virtues), then I hope that this answer will be a useful reading guide for the actual standards.

What the protocol description shows is that at one point, during the initial steps of the SSL connection (the “handshake”), the server sends its “certificate” to the client (actually, a certificate chain), and then, a few steps later, the client appears to have gained some knowledge of the server’s public key, with which asymmetric cryptography is then performed. The SSL/TLS protocol handles these certificates as opaque blobs. What usually happens is that the client decodes the blobs as X.509 certificates and validates them against a set of known trust anchors. The validation yields the server’s public key, with some guarantee that it really is the key owned by the intended server.
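
As a rough illustration of what this looks like from an application’s point of view, here is a minimal sketch of my own (not part of the original answers) using Python’s standard ssl module, assuming the host example.com is reachable on port 443. The library performs the handshake and validates the server’s chain against the system’s trust anchors; the application then merely inspects the result.

```python
import socket
import ssl

context = ssl.create_default_context()   # loads the system's default trust anchors

with socket.create_connection(("example.com", 443)) as sock:
    with context.wrap_socket(sock, server_hostname="example.com") as tls:
        # At this point the handshake is done: the certificate chain sent by
        # the server has been validated, and the negotiated parameters are visible.
        print(tls.version())                  # e.g. 'TLSv1.3'
        print(tls.cipher())                   # negotiated cipher suite
        print(tls.getpeercert()["subject"])   # identity bound by the certificate
```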

This certificate validation is the first foundation of SSL, as it is used for the Web (i.e. HTTPS). @Luc’s answer contains clear explanations of why certificates are used and of what the guarantees rely on. Most enlightening is this excerpt:

You have to trust the CA not to make certificates as they please. When organizations like Microsoft, Apple and Mozilla trust a CA though, the CA must have audits; another organization checks on them periodically to make sure everything is still running according to the rules.

So the whole system relies on big companies checking on each other. Some of the trusted CAs are governmental (from various governments) but the most often used are private businesses (e.g. Thawte, Verisign…). An important point to make is that it suffices to subvert or corrupt one CA to get a fake certificate which will be trusted worldwide; so the system really is only as robust as the weakest of the hundred or so trusted CAs which browser vendors include by default. Nevertheless, it works quite well (attacks on CAs are rare).

Note that since the certificate parts are quite isolated in the protocol itself (the certificates are just opaque blobs), SSL/TLS can be used without certificate validation in setups where the client “just knows” the server’s public key. This happens a lot in closed environments, such as embedded systems which talk to a mother server. Also, there are a few certificate-less cipher suites, such as the ones which use SRP.

This brings us to the second foundation of SSL: its intricate usage of cryptographic algorithms. Asymmetric encryption (RSA) or key exchange (Diffie-Hellman, or an elliptic curve variant), symmetric encryption with stream or block ciphers, hashing, message authentication codes (HMAC)… the whole paraphernalia is there. Assembling all these primitives into a coherent and secure protocol is not easy at all, and the history of SSL is a rather lengthy sequence of attacks and fixes. My second answer gives details on some of them. What must be remembered is that SSL is state of the art: every attack which has ever been conceived has been tried on SSL, because it is a high-value target. SSL got a lot of exposure, and its survival is testimony to its strength. Sure, it was occasionally harmed, but it was always salvaged. It is rather telling that the recent crop of attacks from Duong and Rizzo (ASP.NET padding oracles, BEAST, CRIME) are actually applications of older attacks; their merit is not in inventing them (they didn’t) but in showing how practical those attacks can be in a Web context (and masterfully did they show it).

From all of this we can list the reasons which make SSL work:

  • The binding between the alleged public key and the intended server is addressed. Granted, it is done with X.509 certificates, which have been designed by the Adversary to drive implementors crazy; but at least the problem is dealt with upfront.
  • The encryption system includes checked integrity, with a decent primitive (HMAC). The encryption uses CBC mode for block ciphers and the MAC is applied in the MAC-then-encrypt order: both characteristics are suboptimal, and security with these choices requires special care in the specification (the need for random, unpredictable IVs, the basis of the BEAST attack, fixed in TLS 1.1) and in the implementation (information leaks through the padding, used in padding oracle attacks, fixed when Microsoft finally consented to notice the warning raised by Vaudenay in 2002).
  • All internal key expansion and checksum tasks are done with a specific function (called “the PRF”) which builds on standard primitives (HMAC with cryptographic hash functions); a sketch of this construction appears after this list.
  • The client and the server send random values, which are included in all PRF invocations, and protect against replay attacks.
  • The handshake ends with a couple of checksums, which are covered by the encryption+MAC layer, and the checksums are computed over all of the handshake messages (and this is important in defeating a lot of nasty things that an active attacker could otherwise do).
  • Algorithm agility. The cryptographic algorithms (cipher suites) and other features (protocol version, compression) are negotiated between the client and server, which allows for a smooth and gradual transition. This is how current browsers and servers can use AES encryption, which was defined in 2001, several years after SSLv3. It also facilitates recovery from attacks on some features, which can be deactivated on the client and/or the server (e.g. compression, which is used in CRIME).
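
To make the PRF item above more concrete, here is a compact sketch of my own (not from the original answers) of the P_hash construction used by the TLS 1.2 PRF, as specified in RFC 5246, instantiated with HMAC-SHA256; a real implementation would follow the RFC for the exact labels and output lengths.

```python
import hashlib
import hmac

def p_hash(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_hash from RFC 5246: expand (secret, seed) into `length` pseudorandom bytes."""
    out = b""
    a = seed                                              # A(0) = seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i) = HMAC(secret, A(i-1))
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]

def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    # TLS 1.2: PRF(secret, label, seed) = P_hash(secret, label + seed)
    return p_hash(secret, label + seed, length)

# For instance, the 48-byte master secret is derived as:
#   prf(pre_master_secret, b"master secret", client_random + server_random, 48)
```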

All these characteristics contribute to the strength of SSL.

Liked this question of the week? Interested in reading it or adding an answer? See the question in full. Have security questions of your own? Are you a security expert who wants to help others? Come and join us at security.stackexchange.com.

Why passwords should be hashed

2011-11-01 by Thomas Pornin. 3 comments

How passwords should be hashed before storage or usage is a very common question, and one which always triggers passionate debate. There is a simple and comprehensive answer (use bcrypt; PBKDF2 is not bad either), but that is not the end of the matter, since theoretically better solutions have been proposed and will be worth considering once they have withstood the test of time (i.e. “5 to 10 years in the field, and not broken yet”).

The less commonly asked question is:

why should a password be hashed?

This is what this post is about.

Encryption and Hashing

A first thing to note is that many people talk about encrypted passwords when they really mean hashed passwords. An encrypted password is like anything else which has been encrypted: it has been rendered unreadable through a process which used an extra piece of secret data (the key) and which can be reversed with knowledge of the same key (or of a distinct, mathematically related key, in the case of asymmetric encryption). For password hashing, on the other hand, there is no key. The hashing process is like a meat grinder: anybody can operate it, but there is no way to get your cow back in full moo-ing state; encryption, by contrast, would be akin to locking the cow in a stable. Cryptographic hash functions are functions which anybody can compute, efficiently, over arbitrary inputs. They are deterministic (the same input yields the same output, for everybody).

In short: if MD5 or SHA-1 is involved, it is password hashing, not password encryption. Let’s use the correct term.

Once hashed, the password is still quite useful, because even though the hashing process is not reversible, the output still contains the “essence” of the hashed password, and two distinct passwords will yield, with very high probability (i.e. always, in practice), two distinct hashed values (that’s because we are talking about cryptographic hash functions, not the other kind). And the hash function is deterministic, so you can always rehash a putative password and see whether the result is equal to a given hash value. Thus, a hashed password is sufficient to verify whether a given password is correct or not.
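
As a toy illustration of that last point (a sketch of my own, not from the original post, deliberately using a bare, unsalted hash which should never be used for real password storage; see the proper example further down):

```python
import hashlib

# The server stores only the hash, not the password itself.
stored_hash = hashlib.sha256(b"correct horse battery staple").hexdigest()

def is_correct(candidate: bytes) -> bool:
    # Hashing is deterministic: rehash the candidate and compare.
    return hashlib.sha256(candidate).hexdigest() == stored_hash

print(is_correct(b"correct horse battery staple"))  # True
print(is_correct(b"tr0ub4dor&3"))                   # False
```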

This still does not tell us why we would hash a password, only that hashing a password does not forfeit the intended usage of authenticating users.

To Hash or Not To Hash?

Consider the following scenario: you have a Web site, with users who can “sign in” by showing their name and password. Once signed in, users gain “extra powers” such as reading and writing data. The server must then store “something” which can be used to verify user passwords. The most basic “something” consists of the passwords themselves. Presumably, the passwords would be stored in a SQL database, probably along with whatever data is used by the application.

The bad thing about such “cleartext” storage of passwords is that it induces a vulnerability under an attack model where the attacker gets read-only access to the server data. If that data includes the user passwords, then the villain could use these passwords to sign in as any user and get the corresponding powers, including any write access that valid users may have. This is an edge case (the attacker can read the database but not write to it). However, it is a realistic edge case: unwanted read access to parts of a Web server database is a common consequence of an SQL injection vulnerability. This really happens.

Also, the passwords themselves can be a prized spoil of war: users are human beings, and they tend to reuse passwords across several systems. Or, on a more trivial note, many users choose as password the name of their girlfriend/boyfriend/dog. So knowing the password for a user on a given site has a tactical value which extends beyond that specific site, and even an accidental look at the passwords can be embarrassing and awkward for even the most honest system administrator.

Storing only hashed passwords solves these problems as well as one can hope for. It is unavoidable that a complete dump of the server data yields enough information to “try” passwords (that’s an “offline dictionary attack”), because the dump allows the attacker to “simulate” the complete server on his own machines and try passwords at his leisure. We just want the attacker to have no strategy faster than that. A hash function is the right tool for the job. In full detail, the hashing process should include a per-password random salt (stored along with the hashed value) and be appropriately slow (through thousands or millions of nested iterations), but that’s not the subject of this post. Just use bcrypt.
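
As a sketch of what that looks like in practice (my own addition; the post recommends bcrypt, and PBKDF2 is used here only because it ships with the Python standard library; the iteration count is purely illustrative):

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000          # illustrative; tune so that one hash takes a noticeable time

def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)     # per-password random salt, stored next to the hash
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)   # constant-time comparison
```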

Summary: we hash passwords to prevent an attacker with read-only access from escalating to higher power levels. Password hashing will not make your Web site impervious to attacks; it will still be hacked. Password hashing is damage containment.

A drawback of password hashing is that since you do not store the passwords themselves (but only a piece of data which is sufficient to verify a password without being able to recover it), you cannot send their passwords back to users who have forgotten them. Instead, you must select a new random password for them, or let them choose a new password. Is that an issue? One could say that since the user forgot his old password, that password was not easy to remember, so changing it is not a bad thing after all. Users are accustomed to such a procedure. It may surprise them if you are able to send them back their password; some of them might even frown upon your lack of security savviness if you thus demonstrate that you do not hash the stored passwords.

QotW #4: How can you reliably wipe data from a storage device?

2011-08-03 by Thomas Pornin. 1 comments

Today we investigate the problem of disposing of hardware storage devices (say, hard disks) which may contain sensitive data. The question which prompted this discussion, “Is it enough to only wipe a flash drive once?”, is about Flash disks (SSDs) and received some very good answers; here, we will try to look at the wider picture.

Confidentiality Issues and How to Avoid Them

Storage devices grow old; at some point we want to get rid of them, either because they broke down and need to be replaced, or because they became too small with regards to what we want to do with them. A storage device does not shrink over time, but our needs increase. Quite some time ago, I was sharing some disk space with about one hundred co-students, and the disk offered a hefty 120 megabytes. At that time, a colorful GIF picture was considered to be the top of technology. Today, we take for granted that we can store 2-hour long HD videos on a cellphone. The increase in disk space shows no sign of slowing down, so we have a steady stream of old disks (or USB sticks or SD cards or even ZIP/Jaz cartridges for the old timers — I will not delve into floppy disks) to dispose of. The problem is that all these pieces of hardware have been used quite liberally to store data, possibly confidential data. For the home user, think about the browser cookies, the saved passwords, the cryptographic private keys… In a business context, just about any data element could be of value for competitors.

This is a matter of confidentiality. There are two generic ways to deal with it: encryption, and destruction.

Disk Encryption

Disk encryption is about transforming data in a way which is reversible only with knowledge of a short secret element called a key. We are talking about symmetric encryption here; the key is a simple sequence of, say, 128 bits chosen at random. The confidentiality issue is not totally obliterated, only severely reduced: supposedly, it is easier to deal with the secrecy of a sequence of 128 bits than with that of 100 GB worth of data. Encryption can be done either by the disk itself, by the operating system, or by the application.
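
For the sake of illustration, here is a minimal sketch of the underlying idea (my own addition, not part of the original post), using the third-party `cryptography` package: the secrecy of an arbitrary amount of data reduces to the secrecy of a 128-bit key. Real disk encryption layers (XTS-based and similar) are more elaborate, but the principle is the same.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)   # the short secret that must be protected
nonce = os.urandom(12)                      # must never be reused with the same key
aesgcm = AESGCM(key)

ciphertext = aesgcm.encrypt(nonce, b"gigabytes of disk data, in principle", None)
plaintext = aesgcm.decrypt(nonce, ciphertext, None)   # only possible with the same key
```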

Application-based encryption is limited to what the application can control. For instance, it may have to fight a bit with the OS about virtual memory (that’s pure memory from the point of view of the application, but the OS is prone to write it to the disk anyway). Also, most applications do not have any encryption feature, and modifying all of them is out of the question (not even counting the closed-source ones, that’s just too much work).

OS disk encryption is more thorough, since it can be applied on the complete disk for all files from all applications. It has a few drawbacks, though:

  • The computer must still be able to boot up; in particular, to read from the disk the code which is used to decrypt data. So there must be at least an unencrypted area on the disk. This can cause some system administration headaches.
  • Performance may suffer for an internal hard disk. An x86 Core2 CPU at 2.4 GHz can encrypt about 160 MB/s with AES (that’s what my PC does with the well-known OpenSSL crypto library): not only do some SSDs go faster than that, but I also have other things to do with my CPU (my OS is multitasking).
  • For external media (USB sticks, Flash cards…), there can be interoperability issues. There is no well-established standard for disk encryption. You could transport some appropriate software on the media itself, but most places will not let you install applications that easily.

Encryption on the hardware itself is easier, but you do not really know if the drive does it properly. Also, the drive must keep the key in some way, and you want it to “forget” that key when the media is decommissioned. As Jesper’s answer describes, good encrypted disks keep the key in NVRAM (i.e. RAM with a battery) and can be instructed to forget the key, but this can prove difficult if the disk is broken: if it does not respond to commands anymore, you cannot really be sure that the NVRAM got blanked.

So while encryption is the theoretically appropriate way to go, it is not complete (you still have to manage the confidentiality of the key) and, in practice, it is not easy. Most of all, it works only if it is applied from day one: encryption can do nothing about data which was written before encryption was envisioned at all. So let’s see what can be done to destroy data.

Data Wiping

Wiping out data is a popular method, but popular does not necessarily mean effective. The traditional wiping patterns rely on the idea that each data bit will be written exactly over the bit which was at the same logical location on the disk: this was mostly true in 1996, but technology has evolved. Today’s hard disks do not have “track railings” to guide the read/write heads; instead, they use the data itself as a guide. The net result is that the new data may be physically offset from the previous data by a tiny margin; the old data is still readable “on the edge”.

Also, modern hard disks do not have visibly bad sectors. Bad sectors still exist, but the disk transparently substitutes good sectors for bad ones. This happens dynamically: when the disk writes some data to a sector and detects that the write operation did not go well, it allocates a new good sector from its spare area and performs the write again there. From the point of view of the operating system, this is invisible; the only consequence is that the write took a few milliseconds longer than expected. However, the data has been written to the bad sector (admittedly, one or two bits of it may be wrong, but this leaves more than 4000 genuine bits) and, since the sector is now marked as “bad”, it is forever inaccessible from the host computer. No amount of wiping can do anything about that.

On Flash, the same issue arises, multiplied a thousand times. Flash memory works in “blocks” (a few dozen kilobytes each) on which two operations can be performed:

  • changing a bit from value 1 to value 0;
  • erasing a whole block: all the bits in the block are set to 1.

A given block will endure only so many “erase” operations, so Flash devices use wear leveling techniques, in which write operations are scattered all over the place. The “Flash Translation Layer” is a standard wear leveling algorithm, designed to operate smoothly with a FAT filesystem. Wear leveling means that if you try to overwrite a file, the wiping pattern will, most of the time, be written elsewhere. Moreover, some blocks can be declared “bad” and remapped to spare blocks, in a way similar to what magnetic hard disks do (only more often). So some data blocks just linger, forever unreachable by any software wiping.
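
Here is a toy model of my own (not from the original post) of why overwriting at the logical level does not help: a wear-leveling translation layer simply remaps the write to a fresh physical block, and the old data stays behind.

```python
class ToyFlash:
    """Grossly simplified wear-leveling translation layer."""
    def __init__(self, physical_blocks: int):
        self.physical = [None] * physical_blocks   # raw cells a determined attacker could read
        self.mapping = {}                          # logical block number -> physical block number
        self.next_free = 0

    def write(self, logical: int, data: bytes):
        # Wear leveling: always pick a fresh physical block, then remap.
        self.physical[self.next_free] = data
        self.mapping[logical] = self.next_free
        self.next_free += 1

flash = ToyFlash(8)
flash.write(0, b"secret data")
flash.write(0, b"\x00" * 11)      # "wiping" logical block 0, from the OS's point of view
print(flash.physical[0])          # b'secret data' is still there physically
```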

The basic conclusion is that wiping does not work against a determined attacker. Simply overwriting the whole partition with zeros is enough to deter an attacker who will simply plug the disk into a computer: logically, a single pass is enough to “remove” the data, and by working on the partition instead of on individual files, you avoid any filesystem shenanigans. But if your enemies are so cheap that they limit themselves to the logical layer, then you are lucky indeed.
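
For completeness, a single zero pass over a whole device could look like the sketch below (my own addition; `/dev/sdX` is a hypothetical placeholder, the operation is destructive, and, as argued above, it only deters the casual, logical-level attacker).

```python
import os

DEVICE = "/dev/sdX"            # hypothetical placeholder: triple-check before running!
CHUNK = 1 << 20                # write 1 MiB of zeros at a time

with open(DEVICE, "r+b", buffering=0) as dev:
    size = dev.seek(0, os.SEEK_END)          # block devices report their size this way
    dev.seek(0)
    remaining = size
    while remaining > 0:
        remaining -= dev.write(b"\x00" * min(CHUNK, remaining))
    os.fsync(dev.fileno())                   # make sure the zeros actually hit the device
```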

@nealmcb’s question, “How can I reliably erase all information on a hard drive?”, sparked some excellent discussion, including NIST’s guidelines for Media Sanitisation.

Media Destruction

Without encryption (which does not work for data which is already there) and data wiping (which does not work reliably, or at all), the remaining solution is physical destruction. It is not that easy, though; a good sledgehammer swing, for instance, though satisfying, is not very effective at destroying data. After all, this is a hard disk. To destroy its contents, you have to remove the cover (there are often many screws, some of them hidden, plus glue, and rivets on some models) and then extract the platters, which are quite rigid disks. A simple office shredder will choke on those (although it would easily munch through older floppy disks). The magnetic layer is not thick, so mechanical abrasion may do the trick: use a sander or a grinder. Otherwise, dipping the platters in concentrated sulfuric acid should work.

For Flash devices, things are simpler: that’s mostly silicon with small bits of copper or aluminium, and a plastic cover. Just burn it.

Bottom line: media destruction requires resources. In a business environment, this could be a system administrator task, but it will involve extra manpower, safety issues (seriously, isn’t a geeky system administrator with access to an acid cauldron or a furnace a bit scary?) and possibly environmental considerations.

Conclusion

Data security must be considered throughout the complete life cycle of storage devices. Whether you go the cryptographic route or the physical one, you must put some thought and resources into it. Most people will simply store old disks and hope the problem goes away on its own.