Designing Scalable Short‑URL Services: From Hashes to Base‑62 ID Encoding

This article examines short‑URL system design, comparing hash‑based, MySQL, and Redis storage strategies, and proposes an optimized solution using auto‑increment IDs with high‑base (62) encoding and lightweight encryption to reduce storage and improve lookup performance.

JavaEdge

Part One: Short‑URL System Analysis

The core capability of a short‑URL service is to generate a short link from a long URL for external access, returning an existing short link if it already exists or creating and storing a new mapping otherwise. The reverse mapping (short‑to‑long) is also required to retrieve the original URL.
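The get-or-create behaviour described above can be sketched with two in-memory maps. This is only an illustration: the class name `ShortUrlRegistry`, its methods, and the base-36 counter used to produce codes are assumptions, not part of the original design, and a real service would back the maps with a database or cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory sketch of the two mappings a short-URL service needs.
final class ShortUrlRegistry {
    private final Map<String, String> longToShort = new HashMap<>();
    private final Map<String, String> shortToLong = new HashMap<>();
    private long nextId = 1; // placeholder code generator, not the article's scheme

    /** Returns the existing short code for longUrl, or creates and stores a new one. */
    synchronized String shorten(String longUrl) {
        String existing = longToShort.get(longUrl);
        if (existing != null) {
            return existing; // mapping already exists, reuse it
        }
        String code = Long.toString(nextId++, 36); // stand-in for any code generator
        longToShort.put(longUrl, code);
        shortToLong.put(code, longUrl);
        return code;
    }

    /** Resolves a short code back to the original URL, if it is known. */
    synchronized Optional<String> resolve(String code) {
        return Optional.ofNullable(shortToLong.get(code));
    }
}
```

Calling `shorten` twice with the same long URL returns the same code, and `resolve` on that code returns the original URL.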

Part Two: Implementation Options

Hash‑Based Strategy

Using a hash function to create short links is straightforward, but a hash cannot be reversed, so the mapping between short and long URLs must be persisted in both directions. Because long URLs are often lengthy, a fixed‑size fingerprint (e.g., an MD5 digest, which is 128 bits or 32 hexadecimal characters) is generated to aid indexing. Hash collisions also have to be handled; one approach is to append a fixed suffix to the original URL and hash again, in which case the suffix must be stripped during lookup.
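A minimal sketch of the hash strategy with suffix-based collision handling follows. The source does not say how the short code is derived from the hash, so several details here are assumptions: the code is taken as the first few hex characters of the MD5 digest, the suffix is the hypothetical marker `#DUP` (which must never appear at the end of a real URL), and a `HashMap` stands in for persistent storage.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

final class HashShortener {
    // Hypothetical marker appended on collision; assumed never to end a real URL.
    static final String COLLISION_SUFFIX = "#DUP";

    private final int codeLength; // number of hex characters kept as the short code
    // short code -> hashed input (the original URL plus any collision suffixes)
    private final Map<String, String> store = new HashMap<>();

    HashShortener(int codeLength) {
        this.codeLength = codeLength;
    }

    /** Hex-encoded MD5 digest: 128 bits, 32 hexadecimal characters. */
    static String md5Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(32);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    String shorten(String longUrl) {
        String candidate = longUrl;
        while (true) {
            String code = md5Hex(candidate).substring(0, codeLength);
            String existing = store.get(code);
            if (existing == null) {
                store.put(code, candidate); // free slot: claim it
                return code;
            }
            if (existing.equals(candidate)) {
                return code; // same input seen before: reuse the code
            }
            candidate = candidate + COLLISION_SUFFIX; // collision: perturb and retry
        }
    }

    /** Returns the original URL with any collision suffixes stripped, or null if unknown. */
    String resolve(String code) {
        String stored = store.get(code);
        if (stored == null) {
            return null;
        }
        while (stored.endsWith(COLLISION_SUFFIX)) {
            stored = stored.substring(0, stored.length() - COLLISION_SUFFIX.length());
        }
        return stored;
    }
}
```

Shorter codes collide more often, which is why the retry loop matters; the loop assumes the code space is much larger than the number of stored URLs.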

Storage Design for Hash Strategy

MySQL (structured storage) can store records as:

id | short_url | long_url_md5 | long_url | timestamp

Indexes on short_url and long_url_md5 improve query speed. Advantages: clear schema and indexed queries. Drawbacks: performance concerns under high concurrency and the need for data expiration handling.

Redis (key‑value storage) requires multiple mappings:

long_url_md5 -> short_url
short_url -> long_url_md5
long_url_md5 -> long_url

Advantages: high read performance, built‑in expiration, and good scalability. Disadvantages: maintaining several KV pairs adds complexity.
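The three key-value mappings can be illustrated as follows. This is a sketch under stated assumptions: a plain `HashMap` stands in for Redis (each `put`/`get` would be a `SET`/`GET`, typically with an expiry), and the key prefixes `m2s:`, `s2m:`, and `m2l:` are hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

// A HashMap stands in for Redis; expiry is omitted in this sketch.
final class KvShortUrlStore {
    // Hypothetical key prefixes, not taken from the article.
    private static final String MD5_TO_SHORT = "m2s:";
    private static final String SHORT_TO_MD5 = "s2m:";
    private static final String MD5_TO_LONG = "m2l:";

    private final Map<String, String> kv = new HashMap<>();

    /** Writes all three mappings for one URL; they must be kept consistent. */
    void save(String longUrlMd5, String shortUrl, String longUrl) {
        kv.put(MD5_TO_SHORT + longUrlMd5, shortUrl);
        kv.put(SHORT_TO_MD5 + shortUrl, longUrlMd5);
        kv.put(MD5_TO_LONG + longUrlMd5, longUrl);
    }

    /** Long-to-short direction: one lookup by the URL's MD5. */
    String findShortUrl(String longUrlMd5) {
        return kv.get(MD5_TO_SHORT + longUrlMd5);
    }

    /** Short-to-long direction: two hops, short_url -> md5 -> long_url. */
    String findLongUrl(String shortUrl) {
        String md5 = kv.get(SHORT_TO_MD5 + shortUrl);
        return md5 == null ? null : kv.get(MD5_TO_LONG + md5);
    }
}
```

The two-hop read in `findLongUrl` and the three writes in `save` are the maintenance cost the paragraph above refers to.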

Improved Approach – Auto‑Increment ID + High‑Base Encoding

Instead of storing a separate short‑to‑long mapping keyed by a hash, the improved approach encodes a unique numeric identifier directly into the short link, so decoding the link yields the ID under which the long URL is stored. A distributed ID (e.g., Snowflake) replaces the MD5 hash, and a 62‑character alphabet (0‑9, a‑z, A‑Z) provides a compact base‑62 representation.
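As a worked example of plain base-62 conversion (without any offset), the ID 125 becomes "21" because 125 = 2 × 62 + 1. The sketch below uses the alphabet order 0‑9, a‑z, A‑Z; the class name `Base62` is an assumption made for illustration.

```java
final class Base62 {
    static final String ALPHABET =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    /** Converts a non-negative ID to its base-62 string, most significant digit first. */
    static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(ALPHABET.charAt((int) (id % 62))); // least significant digit first
            id /= 62;
        } while (id > 0);
        return sb.reverse().toString();
    }

    /** Converts a base-62 string back to the numeric ID. */
    static long decode(String code) {
        long value = 0;
        for (int i = 0; i < code.length(); i++) {
            int digit = ALPHABET.indexOf(code.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("invalid character: " + code.charAt(i));
            }
            value = value * 62 + digit;
        }
        return value;
    }
}
```

Any non-negative 64-bit ID fits in at most 11 base-62 characters, since 62^11 exceeds 2^63.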

To prevent easy reverse‑engineering, an offset that varies with the character position is added before taking the modulo, and the same offset is subtracted during decoding.

[Figure: Base‑62 conversion example]

Encoding Implementation (Base‑62 with Simple Encryption)

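import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.apache.commons.lang3.ArrayUtils; // Apache Commons Lang

private static final String DIGITAL_STRING = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
private static final byte[] DIGITAL = DIGITAL_STRING.getBytes(StandardCharsets.US_ASCII);
// Secret constant used for obfuscation. Its value is not shown in the original; 7 is a placeholder.
private static final int OFFSET = 7;
private static final int MAX_LENGTH = 12;

public static String encode(long id) {
    if (id < 0) throw new IllegalArgumentException("id must be non-negative");
    long value = id;
    ByteBuffer buf = ByteBuffer.allocate(MAX_LENGTH);
    for (int i = 0; i < MAX_LENGTH; i++) {
        int mod = (int) (value % 62);
        int pos = (mod + (OFFSET << i)) % 62; // add position-dependent offset
        buf.put(DIGITAL[pos]);
        value = value / 62;
        if (value == 0 && i >= 6) break; // always emit at least 7 characters
    }
    byte[] result = new byte[buf.position()];
    buf.rewind();
    buf.get(result);
    ArrayUtils.reverse(result); // digits were produced least significant first
    return new String(result, StandardCharsets.US_ASCII);
}

public static long decode(String code) {
    int length = code.length();
    if (length > MAX_LENGTH) throw new IllegalArgumentException("code too long: " + length);
    long value = 0;
    for (int i = 0; i < length; i++) {
        // DIGITAL is not in ascending byte order ('A' < 'a' in ASCII),
        // so Arrays.binarySearch cannot be used; look the character up directly.
        int index = DIGITAL_STRING.indexOf(code.charAt(i));
        if (index < 0) throw new IllegalArgumentException("invalid character: " + code.charAt(i));
        index = index - (OFFSET << (length - i - 1)); // undo the offset for this position
        index = index % 62;
        if (index < 0) index += 62;
        value = value * 62 + index;
    }
    return value;
}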

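The critical expression (mod + (OFFSET << i)) % 62 shifts each base‑62 digit by an amount that depends on its position, so the short code is no longer a plain base‑62 rendering of the ID. Decoding subtracts the same per‑position offset to recover each digit. This is obfuscation rather than real encryption: it deters casual guessing, but anyone who learns OFFSET and the alphabet can reverse it.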
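To see why the offset is reversible, consider a single digit. The sketch below assumes OFFSET = 7 (the source does not give its value) and the hypothetical class name `PositionalOffset`. For digit 10 at position 3, the shift is 7 << 3 = 56, so the stored digit is (10 + 56) % 62 = 4, and subtracting 56 modulo 62 gives 10 back.

```java
final class PositionalOffset {
    static final int OFFSET = 7; // assumed value, for illustration only

    /** Shifts one base-62 digit by an offset that depends on its position. */
    static int shift(int digit, int position) {
        return (digit + (OFFSET << position)) % 62;
    }

    /** Inverse of shift: floorMod keeps the result in the range 0..61. */
    static int unshift(int shifted, int position) {
        return Math.floorMod(shifted - (OFFSET << position), 62);
    }
}
```

Because the shift for a given position is a fixed constant, each position amounts to a rotation of the alphabet, which is why the scheme should be treated as light obfuscation only.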

Part Three: Conclusion

Each technical option—hash, relational database, key‑value store, or ID‑based encoding—must be evaluated through experimentation and performance testing. Understanding the evolution of these designs and the trade‑offs at each bottleneck helps engineers choose the most suitable architecture for their short‑URL service.
