Unveiling MD5: How It Works, Its Fixed Length, and Why It’s Vulnerable

This article explores the MD5 hashing algorithm in depth, detailing its fixed 128‑bit output, the padding and block processing steps defined in RFC 1321, the internal round functions, and the reasons it is considered irreversible yet vulnerable to collisions and various cracking techniques such as brute‑force, rainbow tables, and differential attacks.

Tencent Cloud Developer
Tencent Cloud Developer
Tencent Cloud Developer
Unveiling MD5: How It Works, Its Fixed Length, and Why It’s Vulnerable

Introduction

MD5 is a cryptographic hash function that produces a 128‑bit (32‑character hexadecimal) digest. It is often used to generate tokens, e.g.:

$token = md5(salt . code)

Hypotheses

MD5 output length is fixed.

When expressed in hexadecimal, each of the 32 characters represents 4 bits, giving a total of 128 bits.

Principles

Processing the Original Message (Padding)

The message is padded so that its length (in bits) is congruent to 448 modulo 512. Padding starts with a single ‘1’ bit followed by ‘0’ bits as needed, then a 64‑bit representation of the original length is appended.

if (mod(len, 512) < 448) {
    appendLen = 448 - mod(len, 512);
} else {
    appendLen = 448 + 512 - mod(len, 512);
}

Setting Initial Values

Four 32‑bit words are initialized (hexadecimal):

A = 0x01234567
B = 0x89abcdef
C = 0xfedcba98
D = 0x76543210

Main Loop (64 Operations)

The padded message is divided into 512‑bit blocks, each split into sixteen 32‑bit words M[0…15]. For each block four rounds of 16 operations are performed using the non‑linear functions F, G, H, I and 64 constants K[i] = floor(2^64 × abs(sin(i+1))).

F(X,Y,Z) = (X & Y) | ((~X) & Z)
G(X,Y,Z) = (X & Z) | (Y & (~Z))
H(X,Y,Z) = X ^ Y ^ Z
I(X,Y,Z) = Y ^ (X | (~Z))

Each operation updates the registers A, B, C, D with a left‑rotate by a round‑specific amount. Rotation amounts are:

[7,12,17,22]  [5,9,14,20]  [4,11,16,23]  [6,10,15,21]

Example of the first round (FF function):

FF(a,b,c,d,M0,7,0xd76aa478)
FF(d,a,b,c,M1,12,0xe8c7b756)
... (remaining 14 operations)

Result Concatenation

After processing all blocks, the final values of A, B, C, D are concatenated (little‑endian) to produce the 128‑bit MD5 digest rendered as a 32‑character hexadecimal string.

Characteristics

Fixed 128‑bit (32‑hex‑char) output.

Irreversible: the compression functions lose information, making it infeasible to recover the original message.

Many‑to‑one mapping: different inputs can produce the same MD5 value (collisions).

Cracking Methods

Brute‑Force

Exhaustively enumerate all possible inputs until a matching hash is found. For a 6‑character alphanumeric string there are 62⁶ ≈ 5.68 × 10¹⁰ candidates, making the approach impractical for longer inputs.

Dictionary Attack

Pre‑compute a large table of possible inputs and their MD5 hashes, then look up the target hash. This method requires very high storage.

Hash List & Rainbow Tables

A hash list stores only the first and last elements of a chain of alternating hash (H) and reduction (R) functions, reducing storage compared to a full dictionary. Collisions in the reduction function can create gaps.

Rainbow tables improve on hash lists by using multiple reduction functions R₁…Rₖ, limiting collisions to within a single chain segment.

Differential Attack

Advanced cryptanalysis that searches for differential paths through the MD5 compression function. This technique has been used to construct collisions and break MD5 (and SHA‑1).

Visual Aids

MD5 processing diagram
MD5 processing diagram
MD5 padding illustration
MD5 padding illustration
MD5 main loop diagram
MD5 main loop diagram
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

information securityMD5cryptographycollision attackhash algorithm
Tencent Cloud Developer
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.