How Short URLs Are Built: From Hashes to Snowflake IDs

Short URLs are essential for many scenarios, and this article explores their components, including domain and path design, and dives into various path generation methods such as hash functions, auto‑increment IDs, distributed Snowflake algorithms, base‑62 encoding, and the trade‑offs between 301 and 302 redirects.

ELab Team
ELab Team
ELab Team
How Short URLs Are Built: From Hashes to Snowflake IDs

Background

In many scenarios short links are needed, especially when URLs are distributed via SMS or other channels. Long URLs with many path and query parameters can be unwieldy and may be truncated on platforms with length limits, making short links more user‑friendly.

Short Link Composition

A short link consists of protocol (often omitted), a domain (must be short), and a path. The path is typically a 6‑character string using upper‑case letters, lower‑case letters, and digits.

Path Generation

Get a Sequence Number

Hash Algorithm

One way to generate the path is to compute a hash. Common hash functions include MD5, SHA1, as well as non‑cryptographic hashes like HighwayHash and MurmurHash. MurmurHash3 can produce 32‑bit or 128‑bit hashes with good distribution for patterned keys.

Hash collisions are unavoidable; handling them adds system complexity.

Auto‑Increment ID

Another approach is to maintain an auto‑incrementing ID generator, mapping each long URL to a sequential number stored in a database, yielding short links such as https://fake.short/1, https://fake.short/2. To avoid a single point of failure, distributed ID generators are used.

Mysql Auto‑Increment

With 10 MySQL servers, each can start with a different offset (0‑9) and increment by 10, ensuring unique IDs across servers. Drawbacks include predictable IDs, difficulty scaling when adding servers, and high database load.

Snowflake Algorithm

Twitter's SnowFlake algorithm generates globally unique, roughly time‑ordered IDs in distributed systems.

The 64‑bit ID layout: 1 bit unused, 41 bits timestamp (millisecond precision, ~69 years), 10 bits machine ID (5 bits data‑center, 5 bits worker), and 12 bits sequence number (0‑4095 per millisecond). This allows up to 4,194,304 IDs per millisecond per node.

Open‑source Snowflake‑style generators include Baidu's UidGenerator and Meituan's Leaf.

Further Shortening

Converting a decimal sequence like 1536389934 to base‑62 yields a shorter string.

Base‑62 uses digits, lowercase and uppercase letters; base‑64 would introduce characters like / and + that need URL‑encoding.

Six base‑62 characters provide about 560 billion combinations.

Redirect: 301 or 302?

301 is a permanent redirect cached by browsers, reducing server load. However, caching prevents accurate tracking of short‑link clicks, which is valuable for analytics. Therefore, 302 (temporary redirect) is often preferred to allow logging, at the cost of slightly higher server pressure.

References

https://juejin.cn/post/6990275533057556494

https://tech.meituan.com/2017/04/21/mt-leaf.html

https://zhuanlan.zhihu.com/p/85837641

https://www.zhihu.com/question/29270034

https://www.zhihu.com/question/20103344

backendSnowflakeID generationURL shortenerRedirectbase62
ELab Team
Written by

ELab Team

Sharing fresh technical insights

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.