Crypto
QoTW #45: Is my developer’s home-brew password security right or wrong, and why?
An incredibly popular question, viewed 17000 times in its first 3 weeks, this question has even led to a new Sec.SE meta meme.
In fact, our top meta meme explains why – the First Rule of Crypto is “Don’t Roll Your Own!”
So, with that in mind, Polynomial’s answer, delivered with a liberal dose of snark, explains in simple language:
This home-brew method offers no real resistance against brute force attacks, and gives a false impression of “complicated” security…Stick to tried and tested key derivation algorithms like PBKDF2 or bcrypt, which have undergone years of in-depth analysis and scrutiny from a wide range of professional and hobbyist cryptographers.
Konerak lists out some advantages of going with an existing public protocol:
- Probably written by smarter people than you
- Tested by a lot more people (probably some of them smarter than you)
- Reviewed by a lot more people (probably some of them smarter than you), often has mathematical proof
- Improved by a lot more people (probably some of them smarter than you)
- At the moment just one of those thousands of people finds a flaw, a lot of people start fixing it
KeithS also gives more detail:
- MD5 is completely broken
- SHA-1 is considered vulnerable
- More hashes don’t necessarily mean better hashing
- Passwords are inherently low-entropy
- This scheme is not adding any significant proof of work
Along with further answers, the discussion on this post covered a wide range of issues – well worth reading the whole thing!
Liked this question of the week? Interested in reading it or adding an answer? See the question in full. Have questions of a security nature of your own? Security expert and want to help others? Come and join us at security.stackexchange.com.
QoTW #38: What is SHA-3 – and why did we change it?
Lucas Kauffman selected this week’s Question of the Week: What is SHA3 and why did we change it?
No doubt if you are at least a little bit curious about security, you’ll have heard of AES, the advanced encryption standard. Way back in 1997, when winters were really hard, our modems froze as we used them and Windows 98 had yet to appear, NIST saw the need to replace the then mainstream Data Encryption Standard with something resistant to the advances in cryptography that had occurred since its inception. So, NIST announced a competition and invited interested parties to submit algorithms matching the desired specification – the AES Process was underway.
A number of algorithms with varying designs were submitted for the process and three rounds held, with comments, cryptanalysis and feedback submitted at each stage. Between rounds, designers could tweak their algorithms if needed to address minor concerns; clearly broken algorithms did not progress. This was somewhat of a first for the crypto community – after the export restrictions and the so called “crypto wars” of the 90s, open cryptanalysis of published algorithms was novel, and it worked. We ended up with AES (and some of you may also have used Serpent, or Twofish) as a result.
Now, onto hashing. Way back in 1996, discussions were underway in the cryptographic community on the possibility of finding a collision within MD5. Practically MD5 started to be commonly exploited in 2005 to create fake certificate authorities. More recently, the FLAME malware used MD5 collisions to bypass Windows signature restrictions. Indeed, we covered this attack right here on the security blog.
The need for new hash functions has been known for some time, therefore. To replace MD5, SHA-1 was released. However, like its predecessor, cryptanalysis began to reveal that its collision resistance required a less-than-bruteforce search. Given that this eventually yields practical exploits that undermine cryptographic systems, a hash standard is needed that is resistant to finding collisions.
As of 2001, we have also had available to us SHA-2, a family of functions that as yet has survived cryptanalysis. However, SHA-2 is similar in design to its predecessor, SHA-1, and one might deduce that similar weaknesses may hold.
So, in response and in a similar vein to the AES process, NIST launched the SHA3 competition in 2007, in their words, in response to recent improvements in cryptanalysis of hash functions. Over the past few years, various algorithms have been analyzed and the number of candidates reduced, much like a reality TV show (perhaps without the tears, though). The final round algorithms essentially became the candidates for SHA3.
The big event this year is that Keccak has been announced as the SHA-3 hash standard. Before we go too much further, we should clarify some parts of the NIST process. Depending on the round an algorithm has reached determines the amount of cryptanalysis it will have received – the longer a function stays in the competition, the more analysis it faces. The report of round two candidates does not reveal any suggestion of breakage; however, NIST has selected its final round candidates based on a combination of performance factors and safety margins. Respected cryptographer Bruce Schneier even suggested that perhaps NIST should consider adopting several of the finalist functions as suitable.
That’s the background, so I am sure you are wondering: how does this affect me? Well, here’s what you should take into consideration:
- MD5 is broken. You should not use it; it has been used in practical exploits in the wild, if reports are to be believed – and even if they are not, there are alternatives.
- SHA-1 is shown to be theoretically weaker than expected. It is possible it may become practical to exploit it. As such, it would be prudent to migrate to a better hash function.
- In spite of concerns, the family of SHA-2 functions has thus far survived cryptanalysis. These are fine for current usage.
- Keccak and selected other SHA-3 finalists will likely become available in mainstream cryptographic libraries soon. SHA-3 is approved by NIST, so it is fine for current usage.
Liked this question of the week? Interested in reading it or adding an answer? See the question in full. Have questions of a security nature of your own? Security expert and want to help others? Come and join us at security.stackexchange.com.
QoTW #37: How does SSL work ?
In this week’s question, we will talk about SSL. This question was asked by @Polynomial, who noticed that our site did not have yet a generic question on how SSL works. There were already some questions on the concept of SSL, but nothing really detailed.
Three answers were given, one by @Luc, and two by myself (because I got really verbose and there is a size limit on answers). The three answers concentrated on distinct aspects of SSL; together, they can explain why SSL works: SSL appears to be decently secure and we can see how this is achieved.
My first answer is a painfully long description of the detailed protocol as it appears on the wire. I wrote it as an introduction to the intricacies of the protocol; what information it contains must be known if you want to understand the details of the cryptographic attacks which have been tried on SSL. This is not much more than RFC-reading, but I still made an effort to merge four RFC (for the four protocol versions: SSLv3, TLS 1.0, 1.1 and 1.2) into one text which should be readable linearly. If you plan on implementing your own SSL client or server (a very instructive exercise, which I recommend for its pedagogical virtues), then I hope that this answer will be a useful reading guide for the actual standards.
What the protocol description shows is that at one point, during the initial steps of the SSL connection (the “handshake”), the server sends its “certificate” to the client (actually, a certificate chain), and then, a few steps later, the client appears to have gained some knowledge of the server’s public key, with which asymmetric cryptography is then performed. The SSL/TLS protocol handles these certificates as opaque blobs. What usually happens is that the client decodes the blobs as X.509 certificates and validates them with regards to a set of known trust anchors. The validation yields the server’s public key, with some guarantee that it really is the key owned by the intended server.
This certificate validation is the first foundation of SSL, as it is used for the Web (i.e. HTTPS). @Luc’s answer contains clear explanations on why certificates are used, and on what the guarantees rely on. Most enlightening is this excerpt:
You have to trust the CA not to make certificates as they please. When organizations like Microsoft, Apple and Mozilla trust a CA though, the CA must have audits; another organization checks on them periodically to make sure everything is still running according to the rules.
So the whole system relies on big companies checking on each other. Some of the trusted CA are governmental (from various governments) but the most often used are private business (e.g. Thawte, Verisign…). An important point to make is that it suffices to subvert or corrupt one CA to get a fake certificate which will be trusted worldwide; so this really is as robust as the weakest of the hundred or so trusted CA which browser vendors include by default. Nevertheless, it works quite well (attacks on CA are rare).
Note that since the certificate parts are quite isolated in the protocol itself (the certificates are just opaque blobs), SSL/TLS can be used without certificate validation in setups where the client “just knows” the server’s public key. This happens a lot in closed environments, such as embedded systems which talk to a mother server. Also, there are a few certificate-less cipher suites, such as the ones which use SRP.
This brings us to the second foundation of SSL: its intricate usage of cryptographic algorithms. Asymmetric encryption (RSA) or key exchange (Diffie-Hellman, or an elliptic curve variant), symmetric encryption with stream or block ciphers, hashing, message authentication codes (HMAC)… the whole paraphernalia is there. Assembling all these primitives into a coherent and secure protocol is not easy at all, and the history of SSL is a rather lengthy sequence of attacks and fixes. My second answer gives details on some of them. What must be remembered is that SSL is state of the art: every attack which has ever been conceived has been tried on SSL, because it is a high-value target. SSL got a lot of exposure, and its survival is testimony to its strength. Sure, it was occasionally harmed, but it was always salvaged. It is rather telling that the recent crop of attacks from Duong and Rizzo (ASP.NET padding oracles, BEAST, CRIME) are actually old attacks which Duong and Rizzo applied; their merit is not in inventing them (they didn’t) but in showing how practical they can be in a Web context (and masterfully did they show it).
From all of this we can list the reasons which make SSL work:
- The binding between the alleged public key and the intended server is addressed. Granted, it is done with X.509 certificates, which have been designed by the Adversary to drive implementors crazy; but at least the problem is dealt with upfront.
- The encryption system includes checked integrity, with a decent primitive (HMAC). The encryption uses CBC mode for block ciphers and the MAC is included in the MAC-then-encrypt way: both characteristics are suboptimal, and security with these choices requires special care in the specification (the need of random unpredictible IV, basis of the BEAST attack, fixed in TLS 1.1) and in the implementation (information leak through the padding, used in padding oracle attacks, fixed when Microsoft finally consented to notice the warning which was raised by Vaudenay in 2002).
- All internal key expansion and checksum tasks are done with a specific function (called “the PRF”) which builds on standard primitives (HMAC with cryptographic hash functions).
- The client and the server send random values, which are included in all PRF invocations, and protect against replay attacks.
- The handshake ends with a couple of checksums, which are covered by the encryption+MAC layer, and the checksums are computed over all of the handshake messages (and this is important in defeating a lot of nasty things that an active attacker could otherwise do).
- Algorithm agility. The cryptographic algorithms (cipher suites) and other features (protocol version, compression) are negotiated between the client and server, which allows for a smooth and gradual transition. This is how current browsers and servers can use AES encryption, which was defined in 2001, several years after SSLv3. It also facilitates recovery from attacks on some features, which can be deactivated on the client and/or the server (e.g. compression, which is used in CRIME).
All these characteristics contribute to the strength of SSL.
Liked this question of the week? Interested in reading it or adding an answer? See the question in full. Have questions of a security nature of your own? Security expert and want to help others? Come and join us at security.stackexchange.com.
How can you protect yourself from CRIME, BEAST’s successor?
For those who haven’t been following Juliano Rizzo and Thai Duong, two researchers who developed the BEAST attack against TLS 1.0/SSL 3.0 in September 2011, they have developed another attack they plan to publish at the Ekoparty conference in Argentina later this month – this time giving them the ability to hijack HTTPS sessions – and this has started people worrying again.
Security Stack Exchange member Kyle Rozendo asked this question:
With the advent of CRIME, BEASTs successor, what is possible protection is available for an individual and / or system owner in order to protect themselves and their users against this new attack on TLS?
And the community expectation was that we wouldn’t get an answer until Rizzo and Duong presented their attack.
However, our highest reputation member, Thomas Pornin delivered this awesome hypothesis, which I will quote here verbatim:
This attack is supposed to be presented in 10 days from now, but my guess is that they use compression.
SSL/TLS optionally supports data compression. In the ClientHello message, the client states the list of compression algorithms that it knows of, and the server responds, in the ServerHello, with the compression algorithm that will be used. Compression algorithms are specified by one-byte identifiers, and TLS 1.2 (RFC 5246) defines only the null compression method (i.e. no compression at all). Other documents specify compression methods, in particular RFC 3749 which defines compression method 1, based on DEFLATE, the LZ77-derivative which is at the core of the GZip format and also modern Zip archives. When compression is used, it is applied on all the transferred data, as a long stream. In particular, when used with HTTPS, compression is applied on all the successive HTTP requests in the stream, header included. DEFLATE works by locating repeated subsequences of bytes.
Suppose that the attacker is some Javascript code which can send arbitrary requests to a target site (e.g. a bank) and runs on the attacked machine; the browser will send these requests with the user’s cookie for that bank — the cookie value that the attacker is after. Also, let’s suppose that the attacker can observe the traffic between the user’s machine and the bank (plausibly, the attacker has access to the same LAN of WiFi hotspot than the victim; or he has hijacked a router somewhere on the path, possibly close to the bank server).
For this example, we suppose that the cookie in each HTTP request looks like this:
> Cookie: secret=7xc89f+94/wa
The attacker knows the “Cookie: secret=” part and wishes to obtain the secret value. So he instructs his Javascript code to issue a request containing in the body the sequence “Cookie: secret=0″. The HTTP request will look like this:
POST / HTTP/1.1 Host: thebankserver.com (…) Cookie: secret=7xc89f+94/wa (…)
Cookie: secret=0
When DEFLATE sees that, it will recognize the repeated “Cookie: secret=” sequence and represent the second instance with a very short token (one which states “previous sequence has length 15 and was located n bytes in the past); DEFLATE will have to emit an extra token for the ’0′.
The request goes to the server. From the outside, the eavesdropping part of the attacker sees an opaque blob (SSL encrypts the data) but he can see the blob length (with byte granularity when the connection uses RC4; with block ciphers there is a bit of padding, but the attacker can adjust the contents of his requests so that he may phase with block boundaries, so, in practice, the attacker can know the length of the compressed request).
Now, the attacker tries again, with “Cookie: secret=1″ in the request body. Then, “Cookie: secret=2″, and so on. All these requests will compress to the same size (almost — there are subtleties with Huffman codes as used in DEFLATE), except the one which contains “Cookie: secret=7″, which compresses better (16 bytes of repeated subsequence instead of 15), and thus will be shorter. The attacker sees that. Therefore, in a few dozen requests, the attacker has guessed the first byte of the secret value.
He then just has to repeat the process (“Cookie: secret=70″, “Cookie: secret=71″, and so on) and obtain, byte by byte, the complete secret.
What I describe above is what I thought of when I read the article, which talks about “information leak” from an “optional feature”. I cannot know for sure that what will be published as the CRIME attack is really based upon compression. However, I do not see how the attack on compression cannot work. Therefore, regardless of whether CRIME turns out to abuse compression or be something completely different, you should turn off compression support from your client (or your server).
Note that I am talking about compression at the SSL level. HTTP also includes optional compression, but this one applies only to the body of the requests and responses, not the header, and thus does not cover the Cookie: header line. HTTP-level compression is fine.
(It is a shame to have to remove SSL compression, because it is very useful to lower bandwidth requirements, especially when a site contains many small pictures or is Ajax-heavy with many small requests, all beginning with extremely similar versions of a mammoth HTTP header. It would be better if the security model of Javascript was fixed to prevent malicious code from sending arbitrary requests to a bank server; I am not sure it is easy, though.)
As bobince commented:
I hope CRIME is this and we don’t have two vulns of this size in play! However, I wouldn’t say that being limited to entity bodies makes HTTP-level compression safe in general… whilst a cookie header is an obvious first choice of attack, there is potentially sensitive material in the body too. eg Imagine sniffing an anti-XSRF token from response body by causing the browser to send fields that get reflected in that response.
It is reassuring that there is a fix, and my recommendation would be for everyone to assess the risk to them of having sessions hijacked and seriously consider disabling SSL compression support.
QOTW #34 – iMessage – what security features are present?
Two weeks ago, a phishing vulnerability in the text messaging function of Apple’s iPhone was discovered by pod2g. The statement released by Apple said that iMessage, Apple’s proprietary instant messaging service was secure and suggested using it instead.
Apple takes security very seriously. When using iMessage instead of SMS, addresses are verified which protects against these kinds of spoofing attacks. One of the limitations of SMS is that it allows messages to be sent with spoofed addresses to any phone, so we urge customers to be extremely careful if they’re directed to an unknown website or address over SMS.
This prompted a rather large interest in the security community over how secure iMessage really is, given that the technology behind it is not public information. There have been several blog post and articles about that topic, including one right here on security.stackexchange – The inner workings of iMessage security?
The answer given by dr jimbob provided a few clues, sourced from several links.
It appears that the connection between the phone and Apple’s servers are encrypted with SSL/TLS (unclear which version) using a certificate self signed by Apple with a 2048 bit RSA key.
The iMessage service has been partially reverse engineered. More information can be found here and here.
My thoughts – Is iMessage truly more secure?
Compared to traditional SMS, yes. I do consider iMessage more secure. For one thing, it uses SSL/TLS to encrypt the connection between the phone and Apple’s servers. Compared to the A5/1 cipher used to encrypt SMS communications, this is much more secure.
However, iMessage still should not be used to send sensitive information. All data so far indicates that the messages are stored in plaintext in Apple’s servers. This presents several vulnerabilities. Apple or anyone able to compromise Apple’s servers would be able to read your messages – for as long as their cached.
Treat iMessage as you would emails or SMS communications. It is safe enough for daily usage, but highly sensitive information should not be sent through it.
References:
http://en.wikipedia.org/wiki/IMessage
http://www.pod2g.org/2012/08/never-trust-sms-ios-text-spoofing.html
http://www.engadget.com/2012/08/18/apple-responds-to-iphone-text-message-spoofing/
http://blog.cryptographyengineering.com/2012/08/dear-apple-please-set-imessage-free.html
http://imfreedom.org/wiki/IMessage
https://github.com/meeee/pushproxy
Liked this question of the week? Interested in reading it or adding an answer? See the question in full. Have questions of a security nature of your own? Security expert and want to help others? Come and join us at security.stackexchange.com.
QotW #31: What cryptographic flaw was exploited by Flame, to get its code signed by Microsoft?
Community member D.W. nominated this week’s question: What cryptographic flaw was exploited by Flame to get its code signed by Microsoft?
Hendrik Brummerman provided an in depth answer which was subsequently confirmed by updates from Microsoft:
Certificate Purpose
There are multiple purposes a certificate may be used for. For example it may be used as a proof of identity of a person or webserver. It may be used for code sining or to sign other certificates.
In this case a certificate that was intended to sign license information was able to sign code.
It might be as simple as Microsoft not checking the purpose-flag of customer certificates they signed:
Specifically, when an enterprise customer requests a Terminal Services activation license, the certificate issued by Microsoft in response to the request allows code signing.MD5 collision attack
The reference to an old algorithm might indicate a collision attack on the signing process: There was a talk at CCC 2008 called MD5 considered harmful today – Creating a rogue CA Certificate In that talk the researches explained how to generate two certificates with the same hash. The generated a harmless looking certification request and submitted it to a CA. The CA signed it and generated the valid certificate for https-servers. But this certificate had the same hash as another generated certificate which had the purpose CA-certificate. So the CA signature of the harmless certificate was valid for the dangerous one as well. The researches exploited a weakness in MD5 to generate collisions. In order for the attack to work, they had to predict the information the CA would write into the certificate.
The combination of a collision attack and a misuse of the certificate purpose were both theoretical possibilities before this attack, but the researchers of the original md5 collision attack published that the attackers used a new variant of the known md5 chosen prefix attack.
Mark Hillick listed a few useful links, around the wider problems the antivirus industry has – being a very reactionary industry its effectiveness is reduced – and a related presentation by Moxie Marlinspike on authentication.
D.W. also provided some useful links for further reading, from Microsoft’s own Technet, and from arstechnica.
Makerofthings7‘s answer focused on reducing the surface area of public trust – in this instance, it wouldn’t have prevented the attack, as the cert was signed by Microsoft, but it would improve security in general.
Silvercore linked to an excellent blog post on the incident – well worth a read.
Liked this question of the week? Interested in reading it or adding an answer? See the question in full. Have questions of a security nature of your own? Security expert and want to help others? Come and join us at security.stackexchange.com.
Tor: Exploiting the weakest link
Since the birth of the internet, there has been censorship. People have always been looking for ways to anonymously access the internet, either by proxy or VPN, however these still (can) log traffic origin and destination.
Since a few years there have been a few projects to anonymize traffic. One of the more famous ones is Tor (The Onion Router).
How Tor works
Tor uses servers and clients. When you request a webpage from your client, Tor will make an encrypted request to a randomly selected relay server called an Onion router. This Onion router knows who you are. Next thing the router does is ask another Onion router to relay the message. This second Onion router only knows the first Onion router. The second asks a third, the third asks the fourth, etc. No single router knows the complete route, however the client does.
The client can access a database which holds all the relays and if he wants, he can select his own route or a random route is selected. He then gets all the public keys for the route and encrypts his message in reverse order, starting with the public key of the last node, than the one to last node, etc. So the encryption is layered (just like the layers of an onion). However there is also a message for every node that contains the next hop. Now at the exit router the message is decrypted completely and the request for the webpage is made. For the webserver that serves the question, the client’s IP is the IP of the exit node.
The weakest link
So traffic is encrypted multiple times and relayed through different servers. This ensures anonymity. However… everyone can set up a Tor exit node … and everyone that has an exit node, can monitor the traffic.
The weakness in this technology is one we find in other technologies as well, the so called “user”.
A lot of people are concerned about their anonymity and figure they are safe when using Tor. They forget that when using a physical line or an encrypted Wifi AP, The chances of getting a Man in the Middle Attack (MMA) is small.
Now because we can easily host an exit node, we can sniff traffic from people who think they are anonymous, a lot of people in fact. At 20 Mbit (the max speed we allowed Tor to use), we got about 200 different Facebook sessions a day.
Users forget about certain things, like facebook over https. I’ve heard people say “I’ve enabled https on my facebook account, so when I log in, I’m safe.” Well that’s good for them but they forget that often, if you do not explicitly state https for the facebook login page, your password and username is sent PLAIN TEXT over the internet. Facebook doesn’t know you want a secure line before you are logged in.Obviously this goes up for a lot of different sites other than Facebook.
The whole point of Tor is to be anonymous, but users get facebook accounts with often their full name and address on it, and then log in insecurelly.
One could write a script (and we made a proof of concept), that looks for usernames and passwords or hijacks sessions and automatically goes to a facebook like page “I am using Tor to be anonymous”.
I am not saying Tor is unsafe, all we wanted to proof is that people need to think twice before thinking they are anonymous and safe on the internet. There will always be people that want to do malicious stuff. We could have hijacked about 20 accounts in half an hour and revealed people who use Tor or get into their emailboxes. (like Dan Egerstad also prooved in 2007).
Youtube Video
The comments in the clip are in Dutch, but basically we set up a tor node and used tshark to capture traffic. We specified we were interested in http traffic coming/going from Facebook. We then took the session cookie and injected it into our browser which then automatically logs us into Facebook as that user.
Conclusion
Tor is a good anonymity provider, but like all tools, you need to use it in the correct way.
QotW #20: Are Powerline ethernet adapters inherently secure?
ZM15 asked this interesting question just before Christmas over on Superuser. It came over to Security Stack Exchange for some security specific input and I was delighted to see it, as I have done a fair bit of work in the practical elements of securing communications – so this blog post may be a tad biased towards my experiences.
For those not in the know, Powerline ethernet is a technology which allows you to transmit ethernet over your existing mains wiring – which is very useful for buildings which aren’t suitable for running cabling, as all you need to do is pop one of these where you want to connect a computer or other ethernet enabled device and they will be able to route TCP/IP packets. There are some caveats of course, the signal really only works on a single phase, so if you have multiple phases in your house the signal may not travel from one to another, although as DBasnett commented, to get around this, commercial properties may inject the signal deliberately onto all phases.
Early Powerline adapters had very poor signal quality – noise on the mains caused many problems – but since then the technology has improved considerably, partly through increasing the signal strength, but also through improving the filters which allow you to separate signal from mains.
This is where the security problem lies – that signal can travel quite far down wires, and despite fuse boxes offering some resistance to signals, you can often find the signal is retrievable in the neighbour’s house. Damien answered:
I have experienced the signal bleed from my next door neighbor. I … could identify two other powerline adapters using the same network name. I got anywhere between 10 to 20Mbps of throughput between their adapters and mine. I was able to access their router, watch streaming video and see the computers on the network. I also noticed they had gotten IPs on my router also.
This prompted him to enable security.
Tylerl gave an excellent viewpoint, which is as accurate here as it has ever been:
Many of the more expensive network security disasters in IT have come from the assumption that “behind the firewall” everything is safe.
Here the assumption was that the perimeter of the house is a barrier, but it really isn’t.
Along even weirder lines, as is the way with any electrical signal, it will be transmitted to some degree from every wire that carries it, so if you have the right equipment you may be able to pick up the traffic from a vehicle parked on the street. This has long been an issue for organisations dealing in highly sensitive information, so various techniques have been developed to shield against these transmissions, however you are unlikely to have a Faraday cage built into your house. (See the article on TEMPEST over on Wikipedia or this 1972 NSA document for more information)
For similar wireless eavesdropping, read about keyboards, securing physical locations, this answer from Tom Leek and this one from Rook - all pointing out that to a determined attacker, there is not a lot the average person can do to protect themselves.
Scared yet?
Well, unless you have attackers specifically targeting you, you shouldn’t be, as it is very straightforward to enable security that would be appropriate for most individuals, at least for the foreseeable future. TEMPEST shielding should not be necessary and if you do run Powerline ethernet:
Most Powerline adapters have a security option – simply encryption using a shared key. It adds a little overhead to each communication, but as you can now get 1Gb adapters, this shouldn’t affect most of us. If you need >1Gb, get your property wired.
Liked this question of the week? Have questions of a security nature of your own? Security expert and want to help others? Come and join us at security.stackexchange.com.
QotW #19: Why can’t a password hash be reverse engineered?
Are you a systems administrator of professional computer systems? Well, serverfault is where you want to be and that’s where this week’s question of the week came from.
New user mucker wanted to understand why, if hashing is just an algorithm, it cannot simply be reverse engineered. A fair question and security.SE as usual did not disappoint.
Since I’m a moderator on crypto.se this question is a perfect fit to write up, so much so I’m going to take a slight detour and define some terms for you and a little on how hashes work.
A background on the internals of hash functions
First, an analogy for hash functions. A hash function works one block of data at a time – so when you hash your large file, the hash function takes so many blocks (depending on the algorithm) at once. It has an initial state – i.e. configuration – which is why sha of nothing actually has a value. Then, each set of incoming blocks alter those values. (Side note: collision resistance is achieved like this).
The analogy in this case is like a bike lock with twisty bits on. Imagine the default state is “1234″ and every time you get a number, you alter each of the digits according to the input. When you’ve processed all of the incoming blocks, you then read the number you have in front of you. Hash functions work in a similar way – the state is an array and individual parts of it are shifted, xor’d etc depending on each incoming block. See the linked articles above for more.
Then, we can define input and output of two things: one instance of the hash function has inputs and outputs, as does the overall process of passing all your data through the hash function.
The answers
The top answer from Dietrich Epp is excellent – a simple example was provided of a function – in this case multiplication – which one can do easily forwards (O(N^2)) but that becomes difficult backwards. Factoring large numbers, especially ones with large prime factors, is a famous “hard problem”. Hash functions rely on exactly this property: it is not that they cannot be inverted, it is just that they are hard.
Before migration, Serverfault user Coredump also provided a similar explanation. Some interesting debate came up in the comments of this answer – user nealmcb observed that actually collisions are available in abundance. To go back to the mathsy stuff – the number of inputs is every possible piece of data there is, whereas for outputs we only have 256 bits of data. So, there are many really long passwords that map to each valid hash value, but that still doesn’t help you find them.
Neal then answered the question himself to raise some further important issues – from a security perspective, it is important to not think of hashes as “impossible” to reverse. At best they are “hard”, and that is true only if the hash is expertly designed. As Neal alludes to, breaking hashes often involves significant computing power and dictionary attacks, and might be considered, to steal his words, “messy” (as opposed to a pretty closed-form inverted function) but it can be done. And all-too-often, it is not even “hard”, as we see with both the famously bad LanMan hash that the original poster mentioned, and the original MYSQL hash.
Several other answers also provided excellent explanations – one to note from Mikeazo that in practise, hash functions are many to one as a result of the fact there are infinite possible inputs, but a fixed number of outputs (hash strings). Luckily for us, a well designed hash function has a large enough output space that collisions aren’t a problem.
So hashes can be inverted?
As a final point on hash functions I’m going to briefly link to this question about the general justification for the security of block ciphers and hash functions. The answer is that even for the best common hashes, no, there is no guarantee of the hardness of reversing them – just as there is no cast iron guarantee products of large primes cannot be factored.
Liked this question of the week? Have questions of a security nature of your own? Security expert and want to help others? Come and join us at security.stackexchange.com.
Why passwords should be hashed
How passwords should be hashed before storage or usage is a very common question, always triggering passionate debate. There is a simple and comprehensive answer (use bcrypt, but PBKDF2 is not bad either) which is not the end of the question since theoretically better solutions have been proposed and will be worth considering once they have withstood the test of time (i.e. “5 to 10 years in the field, and not broken yet”).
The less commonly asked question is:
why should a password be hashed?
This is what this post is about.
Encryption and Hashing
A first thing to note is that there are many people who talk about encrypted passwords but really mean hashed passwords. An encrypted password is like anything else which has been encrypted: it has been rendered unreadable through a process which used an extra piece of secret data (the key) and which can be reversed with knowledge of the same key (or of a distinct, mathematically related key, in the case of asymmetric encryption). For password hashing, on the other hand, there is no key. The hashing process is like a meat grinder: there is no key, everybody can operate it, but there is no way to get your cow back in full moo-ing state. Whereas encryption would be akin to locking the cow in a stable. Cryptographic hash functions are functions which anybody can compute, efficiently, over arbitrary inputs. They are deterministic (same input yields same output, for everybody).
In shorter words: if MD5 or SHA-1 is involved, this is password hashing, not password encryption. Let’s use the correct term.
Once hashed, the password is still quite useful, because even though the hashing process is not reversible, the output still contains the “essence” of the hashed password and two distinct passwords will yield, with very high probability (i.e. always, in practice), two distinct hashed values (that’s because we are talking about cryptographic hash function, not the other kind). And the hash function is deterministic, so you can always rehash a putative password and see if the result is equal to a given hash value. Thus, a hashed password is sufficient to verify whether a given password is correct or not.
This still does not tell us why we would hash a password, only that hashing a password does not forfeit the intended usage of authenticating users.
To Hash or Not To Hash ?
Let’s see the following scenario: you have a Web site, with users who can “sign in” by showing their name and password. Once signed in, users gain “extra powers” such as reading and writing data. The server must then store “something” which can be used to verify user passwords. The most basic “something” consists in the password themselves. Presumably, the passwords would be stored in a SQL database, probably along with whatever data is used by the application.
The bad thing about such “cleartext” storage of passwords is that it induces a vulnerability in the case of an attack model where the attacker could get a read-only access to the server data. If that data includes the user passwords, then the villain could use these passwords to sign in as any user and get the corresponding powers, including any write access that valid users may have. This is an edge case (attacker can read the database but not write to it). However, this is a realistic edge case. Unwanted read access to parts of a Web server database is a common consequence of an SQL injection vulnerability. This really happens.
Also, the passwords themselves can be a prized spoil of war: users are human beings, they tend to reuse passwords across several systems. Or, on the more trivial aspect of things, many users choose as password the name of their girlfriend/boyfriend/dog. So knowing the password for a user on a given site has a tactical value which extends beyond that specific site, and even having an accidental look at the passwords can be embarrassing and awkward for the most honest system administrator.
Storing only hashed passwords solves these problems as best as one can hope for. It is unavoidable that a complete dump of the server data yields enough information to “try” passwords (that’s an “offline dictionary attack”) because the dump allows the attacker to “simulate” the complete server on his own machines, and try passwords at his leisure. We just want that the attacker may not have any faster strategy. A hash function is the right tool for that. In full details, the hashing process should include a per-password random salt (stored along the hashed value) and be appropriately slow (through thousands or millions of nested iterations), but that’s not the subject of this post. Just use bcrypt.
Summary: we hash passwords to prevent an attacker with read-only access from escalating to higher power levels. Password hashing will not make your Web site impervious to attacks; it will still be hacked. Password hashing is damage containment.
A drawback of password hashing is that since you do not store the passwords themselves (but only a piece of data which is sufficient to verify a password without being able to recover it), you cannot send back their passwords to users who have forgotten them. Instead, you must select a new random password for them, or let them choose a new password. Is that an issue ? One could say that since the user forgot his old password, then that password was not easy to remember, so changing it is not a bad thing after all. Users are accustomed to such a procedure. It may surprise them if you are able to send them back their password. Some of them might even frown upon your lack of security savviness if you so demonstrates that you do not hash the stored passwords.

Subscribe via RSS