Designing Scalable Short‑URL Services: From Hashes to Base‑62 ID Encoding
This article examines short-URL system design, compares MySQL and Redis storage for a hash-based strategy, and proposes an optimized solution that uses auto-increment IDs with base-62 encoding and lightweight encryption to reduce storage and improve lookup performance.
Part One: Short‑URL System Analysis
The core capability of a short‑URL service is to generate a short link from a long URL for external access, returning an existing short link if it already exists or creating and storing a new mapping otherwise. The reverse mapping (short‑to‑long) is also required to retrieve the original URL.
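The get-or-create behavior described above can be sketched as a small in-memory model. This is an illustration only: the class name `ShortUrlRegistry`, the counter-based code generator, and the two `HashMap`s are assumptions standing in for whatever generator and storage a real service would use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the two lookups; a real service would back these with MySQL or Redis.
final class ShortUrlRegistry {
    private final Map<String, String> longToShort = new HashMap<>();
    private final Map<String, String> shortToLong = new HashMap<>();
    private long nextId = 1; // stand-in for any unique code generator

    // Returns the existing short code for longUrl, or creates and stores a new one.
    String shorten(String longUrl) {
        String existing = longToShort.get(longUrl);
        if (existing != null) {
            return existing;
        }
        String code = "s" + nextId++;
        longToShort.put(longUrl, code);
        shortToLong.put(code, longUrl);
        return code;
    }

    // Reverse mapping: returns the original URL, or null if the code is unknown.
    String resolve(String code) {
        return shortToLong.get(code);
    }
}
```

Calling `shorten` twice with the same long URL returns the same code, and `resolve` maps that code back to the original URL.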
Part Two: Implementation Options
Hash‑Based Strategy
Using a hash function to create short links is straightforward, but a hash cannot be reversed, so the mapping between short and long URLs must be persisted. Because long URLs are often lengthy, a fixed-size identifier (e.g., a 128-bit MD5 digest, written as 32 hex characters) is generated to aid indexing. Hash collisions can be handled by appending a fixed suffix to the original URL and hashing again; the suffix must then be stripped when the long URL is returned on lookup.
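A minimal sketch of the hash strategy with suffix-based collision handling follows. Several details are assumptions, since the source does not specify them: the short code is taken as a prefix of the MD5 hex digest, the suffix is the hypothetical marker `#DUP`, and a `HashMap` stands in for persistent storage.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

final class HashShortener {
    static final String SUFFIX = "#DUP"; // hypothetical collision marker, not from the source
    private final int codeLength;        // assumed: short code = first codeLength hex chars of the digest
    private final Map<String, String> shortToLong = new HashMap<>();

    HashShortener(int codeLength) {
        this.codeLength = codeLength;
    }

    // MD5 digest of s as 32 lowercase hex characters.
    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // On a collision with a different URL, append SUFFIX and hash again.
    String shorten(String longUrl) {
        String candidate = longUrl;
        while (true) {
            String code = md5Hex(candidate).substring(0, codeLength);
            String stored = shortToLong.get(code);
            if (stored == null) {
                shortToLong.put(code, candidate);
                return code;
            }
            if (stored.equals(candidate)) {
                return code; // this URL was shortened before
            }
            candidate = candidate + SUFFIX;
        }
    }

    // Strips any collision suffixes before returning the original URL.
    // Limitation: a URL that genuinely ends with SUFFIX would be truncated.
    String resolve(String code) {
        String stored = shortToLong.get(code);
        if (stored == null) {
            return null;
        }
        while (stored.endsWith(SUFFIX)) {
            stored = stored.substring(0, stored.length() - SUFFIX.length());
        }
        return stored;
    }
}
```

The shorter the code, the more often the suffix loop runs, which is one reason the hash strategy needs a persisted mapping and a collision check on every insert.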
Storage Design for Hash Strategy
MySQL (structured storage) can store records as:
id | short_url | long_url_md5 | long_url | timestamp
Indexes on short_url and long_url_md5 improve query speed. Advantages: clear schema and indexed queries. Drawbacks: performance concerns under high concurrency and the need for data expiration handling.
Redis (key‑value storage) requires multiple mappings:
long_url_md5 -> short_url
short_url -> long_url_md5
long_url_md5 -> long_url
Advantages: high read performance, built-in expiration, and good scalability. Disadvantages: maintaining several KV pairs adds complexity.
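The three key-value mappings can be illustrated with a plain `Map` standing in for Redis. The key prefixes (`l2s:`, `s2m:`, `m2l:`) and the class name are hypothetical, and expiration is left out; the point is only to show how many writes and reads each operation needs.

```java
import java.util.HashMap;
import java.util.Map;

// A plain Map stands in for Redis here; the key prefixes are hypothetical.
final class KvLayout {
    private final Map<String, String> kv = new HashMap<>();

    // Creating one short link requires three writes.
    void save(String shortUrl, String md5, String longUrl) {
        kv.put("l2s:" + md5, shortUrl); // long_url_md5 -> short_url
        kv.put("s2m:" + shortUrl, md5); // short_url -> long_url_md5
        kv.put("m2l:" + md5, longUrl);  // long_url_md5 -> long_url
    }

    // Resolving a short link takes two reads: short_url -> md5 -> long_url.
    String resolve(String shortUrl) {
        String md5 = kv.get("s2m:" + shortUrl);
        return md5 == null ? null : kv.get("m2l:" + md5);
    }

    // Deduplication on create takes one read: md5 -> short_url.
    String existingShortUrl(String md5) {
        return kv.get("l2s:" + md5);
    }
}
```

Keeping the three entries consistent (and expiring them together) is the extra complexity mentioned above.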
Improved Approach – Auto‑Increment ID + High‑Base Encoding
Instead of storing a separate short-to-long mapping, the article proposes encoding a unique numeric identifier directly into the short link, so that decoding the link yields the ID under which the long URL is stored. A distributed ID (e.g., Snowflake) replaces the MD5 hash as the key, and a 62-character alphabet (0-9, a-z, A-Z) provides a compact base-62 representation.
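Before any obfuscation is applied, the base-62 step on its own looks roughly like the sketch below. The class and method names are illustrative; only the alphabet order (0-9, a-z, A-Z) comes from the text.

```java
final class Base62 {
    static final String ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Plain base-62 conversion of a non-negative ID, most significant digit first.
    static String toBase62(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        if (id == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id % 62)));
            id /= 62;
        }
        return sb.reverse().toString();
    }

    // Inverse conversion: each character contributes its index in ALPHABET as one digit.
    static long fromBase62(String code) {
        long value = 0;
        for (int i = 0; i < code.length(); i++) {
            int digit = ALPHABET.indexOf(code.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("invalid character: " + code.charAt(i));
            }
            value = value * 62 + digit;
        }
        return value;
    }
}
```

Because 62^11 is larger than 2^63, any non-negative 64-bit ID fits in at most 11 characters, compared with 32 hex characters for an MD5 digest.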
To prevent easy reverse‑engineering, an offset that varies with the character position is added before taking the modulo, and the same offset is subtracted during decoding.
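The offset step can be looked at in isolation for a single digit. In this sketch the helper names `shift` and `unshift` are illustrative, and the offset value passed in is whatever constant the service chooses; the shift-by-position form follows the expression used in the article's listing.

```java
final class OffsetDemo {
    static final int BASE = 62;

    // Encoding side: move a base-62 digit by an offset that grows with the character position.
    static int shift(int digit, int offset, int position) {
        return (digit + (offset << position)) % BASE;
    }

    // Decoding side: subtract the same offset and bring the result back into 0..61.
    static int unshift(int shifted, int offset, int position) {
        int digit = (shifted - (offset << position)) % BASE;
        return digit < 0 ? digit + BASE : digit;
    }
}
```

Since both sides compute the same `offset << position`, `unshift(shift(d, o, p), o, p)` returns `d` for every digit in 0..61, which is what makes the scheme reversible.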
Encoding Implementation (Base‑62 with Simple Encryption)
// The position of a character in this string is its digit value (0-61).
private static final String DIGITAL_STRING = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
// OFFSET is not defined in the source listing; 7 is an assumed placeholder value.
private static final int OFFSET = 7;

public static String encode(long id) {
    if (id < 0) throw new IllegalArgumentException("id must be non-negative");
    long value = id;
    StringBuilder sb = new StringBuilder(12);
    for (int i = 0; i < 12; i++) {
        int mod = (int) (value % 62);
        int pos = (mod + (OFFSET << i)) % 62; // add position-dependent offset
        sb.append(DIGITAL_STRING.charAt(pos));
        value = value / 62;
        if (value == 0 && i >= 6) break; // always emit at least 7 characters
    }
    // Digits were produced least significant first, so reverse them.
    return sb.reverse().toString();
}

public static long decode(String code) {
    long value = 0;
    int length = code.length();
    for (int i = 0; i < length; i++) {
        // Linear lookup: the alphabet is not in ASCII order, so binary search does not apply.
        int index = DIGITAL_STRING.indexOf(code.charAt(i));
        if (index < 0) throw new IllegalArgumentException("invalid character: " + code.charAt(i));
        index = index - (OFFSET << (length - i - 1)); // remove the offset for this position
        index = index % 62;
        if (index < 0) index += 62;
        value = value * 62 + index;
    }
    return value;
}

The critical expression (mod + (OFFSET << i)) % 62 adds a position-dependent offset to obscure the direct base-62 conversion, making casual reverse-engineering of the ID harder.
Part Three: Conclusion
Each technical option—hash, relational database, key‑value store, or ID‑based encoding—must be evaluated through experimentation and performance testing. Understanding the evolution of these designs and the trade‑offs at each bottleneck helps engineers choose the most suitable architecture for their short‑URL service.
JavaEdge
First‑line development experience at multiple leading tech firms; now a software architect at a Shanghai state‑owned enterprise and founder of Programming Yanxuan. Nearly 300k followers online; expertise in distributed system design, AIGC application development, and quantitative finance investing.
