How Short URLs Are Built: From Hashes to Snowflake IDs
Short URLs are essential for many scenarios, and this article explores their components, including domain and path design, and dives into various path generation methods such as hash functions, auto‑increment IDs, distributed Snowflake algorithms, base‑62 encoding, and the trade‑offs between 301 and 302 redirects.
Background
In many scenarios short links are needed, especially when URLs are distributed via SMS or other channels. Long URLs with many path and query parameters can be unwieldy and may be truncated on platforms with length limits, making short links more user‑friendly.
Short Link Composition
A short link consists of protocol (often omitted), a domain (must be short), and a path. The path is typically a 6‑character string using upper‑case letters, lower‑case letters, and digits.
Path Generation
Get a Sequence Number
Hash Algorithm
One way to generate the path is to compute a hash. Common hash functions include MD5, SHA1, as well as non‑cryptographic hashes like HighwayHash and MurmurHash. MurmurHash3 can produce 32‑bit or 128‑bit hashes with good distribution for patterned keys.
Hash collisions are unavoidable; handling them adds system complexity.
Auto‑Increment ID
Another approach is to maintain an auto‑incrementing ID generator, mapping each long URL to a sequential number stored in a database, yielding short links such as https://fake.short/1, https://fake.short/2. To avoid a single point of failure, distributed ID generators are used.
Mysql Auto‑Increment
With 10 MySQL servers, each can start with a different offset (0‑9) and increment by 10, ensuring unique IDs across servers. Drawbacks include predictable IDs, difficulty scaling when adding servers, and high database load.
Snowflake Algorithm
Twitter's SnowFlake algorithm generates globally unique, roughly time‑ordered IDs in distributed systems.
The 64‑bit ID layout: 1 bit unused, 41 bits timestamp (millisecond precision, ~69 years), 10 bits machine ID (5 bits data‑center, 5 bits worker), and 12 bits sequence number (0‑4095 per millisecond). This allows up to 4,194,304 IDs per millisecond per node.
Open‑source Snowflake‑style generators include Baidu's UidGenerator and Meituan's Leaf.
Further Shortening
Converting a decimal sequence like 1536389934 to base‑62 yields a shorter string.
Base‑62 uses digits, lowercase and uppercase letters; base‑64 would introduce characters like / and + that need URL‑encoding.
Six base‑62 characters provide about 560 billion combinations.
Redirect: 301 or 302?
301 is a permanent redirect cached by browsers, reducing server load. However, caching prevents accurate tracking of short‑link clicks, which is valuable for analytics. Therefore, 302 (temporary redirect) is often preferred to allow logging, at the cost of slightly higher server pressure.
References
https://juejin.cn/post/6990275533057556494
https://tech.meituan.com/2017/04/21/mt-leaf.html
https://zhuanlan.zhihu.com/p/85837641
https://www.zhihu.com/question/29270034
https://www.zhihu.com/question/20103344
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
