Are you a systems administrator of professional computer systems? Well, serverfault is where you want to be and that’s where this week’s question of the week came from.
New user mucker wanted to understand why, if hashing is just an algorithm, it cannot simply be reverse engineered. A fair question and security.SE as usual did not disappoint.
Since I’m a moderator on crypto.se this question is a perfect fit to write up, so much so I’m going to take a slight detour and define some terms for you and a little on how hashes work.
A background on the internals of hash functions
First, an analogy for hash functions. A hash function works one block of data at a time – so when you hash your large file, the hash function takes so many blocks (depending on the algorithm) at once. It has an initial state – i.e. configuration – which is why sha of nothing actually has a value. Then, each set of incoming blocks alter those values. (Side note: collision resistance is achieved like this).
The analogy in this case is like a bike lock with twisty bits on. Imagine the default state is “1234″ and every time you get a number, you alter each of the digits according to the input. When you’ve processed all of the incoming blocks, you then read the number you have in front of you. Hash functions work in a similar way – the state is an array and individual parts of it are shifted, xor’d etc depending on each incoming block. See the linked articles above for more.
Then, we can define input and output of two things: one instance of the hash function has inputs and outputs, as does the overall process of passing all your data through the hash function.
The top answer from Dietrich Epp is excellent – a simple example was provided of a function – in this case multiplication – which one can do easily forwards (O(N^2)) but that becomes difficult backwards. Factoring large numbers, especially ones with large prime factors, is a famous “hard problem”. Hash functions rely on exactly this property: it is not that they cannot be inverted, it is just that they are hard.
Before migration, Serverfault user Coredump also provided a similar explanation. Some interesting debate came up in the comments of this answer – user nealmcb observed that actually collisions are available in abundance. To go back to the mathsy stuff – the number of inputs is every possible piece of data there is, whereas for outputs we only have 256 bits of data. So, there are many really long passwords that map to each valid hash value, but that still doesn’t help you find them.
Neal then answered the question himself to raise some further important issues – from a security perspective, it is important to not think of hashes as “impossible” to reverse. At best they are “hard”, and that is true only if the hash is expertly designed. As Neal alludes to, breaking hashes often involves significant computing power and dictionary attacks, and might be considered, to steal his words, “messy” (as opposed to a pretty closed-form inverted function) but it can be done. And all-too-often, it is not even “hard”, as we see with both the famously bad LanMan hash that the original poster mentioned, and the original MYSQL hash.
Several other answers also provided excellent explanations – one to note from Mikeazo that in practise, hash functions are many to one as a result of the fact there are infinite possible inputs, but a fixed number of outputs (hash strings). Luckily for us, a well designed hash function has a large enough output space that collisions aren’t a problem.
So hashes can be inverted?
As a final point on hash functions I’m going to briefly link to this question about the general justification for the security of block ciphers and hash functions. The answer is that even for the best common hashes, no, there is no guarantee of the hardness of reversing them – just as there is no cast iron guarantee products of large primes cannot be factored.
Liked this question of the week? Have questions of a security nature of your own? Security expert and want to help others? Come and join us at security.stackexchange.com.