Unveiling MD5: How It Works, Its Fixed Length, and Why It’s Vulnerable
This article explores the MD5 hashing algorithm in depth, detailing its fixed 128‑bit output, the padding and block processing steps defined in RFC 1321, the internal round functions, and the reasons it is considered irreversible yet vulnerable to collisions and various cracking techniques such as brute‑force, rainbow tables, and differential attacks.
Introduction
MD5 is a cryptographic hash function that produces a 128‑bit (32‑character hexadecimal) digest. It is often used to generate tokens, e.g.:
$token = md5(salt . code)Hypotheses
MD5 output length is fixed.
When expressed in hexadecimal, each of the 32 characters represents 4 bits, giving a total of 128 bits.
Principles
Processing the Original Message (Padding)
The message is padded so that its length (in bits) is congruent to 448 modulo 512. Padding starts with a single ‘1’ bit followed by ‘0’ bits as needed, then a 64‑bit representation of the original length is appended.
if (mod(len, 512) < 448) {
appendLen = 448 - mod(len, 512);
} else {
appendLen = 448 + 512 - mod(len, 512);
}Setting Initial Values
Four 32‑bit words are initialized (hexadecimal):
A = 0x01234567
B = 0x89abcdef
C = 0xfedcba98
D = 0x76543210Main Loop (64 Operations)
The padded message is divided into 512‑bit blocks, each split into sixteen 32‑bit words M[0…15]. For each block four rounds of 16 operations are performed using the non‑linear functions F, G, H, I and 64 constants K[i] = floor(2^64 × abs(sin(i+1))).
F(X,Y,Z) = (X & Y) | ((~X) & Z)
G(X,Y,Z) = (X & Z) | (Y & (~Z))
H(X,Y,Z) = X ^ Y ^ Z
I(X,Y,Z) = Y ^ (X | (~Z))Each operation updates the registers A, B, C, D with a left‑rotate by a round‑specific amount. Rotation amounts are:
[7,12,17,22] [5,9,14,20] [4,11,16,23] [6,10,15,21]Example of the first round (FF function):
FF(a,b,c,d,M0,7,0xd76aa478)
FF(d,a,b,c,M1,12,0xe8c7b756)
... (remaining 14 operations)Result Concatenation
After processing all blocks, the final values of A, B, C, D are concatenated (little‑endian) to produce the 128‑bit MD5 digest rendered as a 32‑character hexadecimal string.
Characteristics
Fixed 128‑bit (32‑hex‑char) output.
Irreversible: the compression functions lose information, making it infeasible to recover the original message.
Many‑to‑one mapping: different inputs can produce the same MD5 value (collisions).
Cracking Methods
Brute‑Force
Exhaustively enumerate all possible inputs until a matching hash is found. For a 6‑character alphanumeric string there are 62⁶ ≈ 5.68 × 10¹⁰ candidates, making the approach impractical for longer inputs.
Dictionary Attack
Pre‑compute a large table of possible inputs and their MD5 hashes, then look up the target hash. This method requires very high storage.
Hash List & Rainbow Tables
A hash list stores only the first and last elements of a chain of alternating hash (H) and reduction (R) functions, reducing storage compared to a full dictionary. Collisions in the reduction function can create gaps.
Rainbow tables improve on hash lists by using multiple reduction functions R₁…Rₖ, limiting collisions to within a single chain segment.
Differential Attack
Advanced cryptanalysis that searches for differential paths through the MD5 compression function. This technique has been used to construct collisions and break MD5 (and SHA‑1).
Visual Aids
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
