Backend Development 16 min read

Design and Implementation of a Short URL Service: Algorithms, Storage, and Caching

This article explains the principles of short URL generation, compares incremental ID, hash, and random algorithms, analyzes their security and collision characteristics, and presents a complete backend design including database schema, caching strategy, sharding, and Java implementation details.

Top Architect
Top Architect
Top Architect
Design and Implementation of a Short URL Service: Algorithms, Storage, and Caching

Short URLs convert long web addresses into compact strings, a technique popularized by platforms with character limits such as Weibo.

The core functions of a short‑URL system are to generate a short code from a long URL and to redirect users from the short code to the original address.

Short‑code Generation Methods

Typical short codes consist of characters from [a‑z, A‑Z, 0‑9] , usually 6 characters long, providing 56.8 billion possible combinations.

Common generation approaches are:

Incremental ID – converts an auto‑incremented integer to base‑62.

Hash (MD5) – creates a fixed‑length signature and derives a 6‑character code.

Random – picks a random 6‑character string and checks for collisions.

Incremental ID

This collision‑free method uses the next auto‑increment value from a database table, converts it to base‑62, and optionally mixes it with other data (e.g., MD5) to obscure the sequence.

Hash (MD5) Algorithm

The hash method generates a 32‑character MD5 digest of key+url , splits it into four 8‑character segments, processes each segment to produce four 6‑character candidates, and selects one as the short code; collisions are rare but possible.

Random Number

A simple approach randomly selects six characters from the base‑62 set and repeats the process if the code already exists, but it suffers from higher collision probability, especially at large scale.

Algorithm Analysis

Incremental IDs are predictable and vulnerable to enumeration attacks; hash‑based codes are less predictable and generally preferred despite a minimal collision risk; random codes are the least reliable for high‑traffic services.

Implementation Details

Storage Scheme

The database stores the domain ( base_url ) and the path ( suffix_url ) separately, along with the full URL, generated short code ( shot_code ), expiration date, and click count. An example table diagram is provided in the original article.

Caching Scheme

For high‑traffic scenarios, only URLs accessed or created within the last three months are cached using an LRU policy; cache keys are the short codes, and a fallback to the database occurs on cache misses.

Sharding and Scaling

When data reaches hundreds of gigabytes (e.g., 1 billion records ≈ 953 GB), the table should be split. A common strategy is to route based on the numeric encoding of the short code, creating 20–100 shards to keep each shard under 50 GB.

Redirection Process

When a user accesses http://bit.ly/a3300 , DNS resolves the domain, the server looks up the short code in cache or DB, retrieves the long URL, and issues an HTTP 301 permanent redirect to the original address.

Java Code Example

The following Java class demonstrates the MD5‑based short‑code generation and a utility for converting numbers between decimal and base‑62:

import org.apache.commons.lang3.StringUtils;
import javax.xml.bind.DatatypeConverter;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.atomic.AtomicLong;
import static com.alibaba.fastjson.util.IOUtils.DIGITS;
/**
 * @author rickiyang
 * @date 2020-01-07
 */
public class ShortUrlGenerator {
    public static void main(String[] args) {
        String sLongUrl = "http://www.baidu.com/121244/ddd";
        for (String shortUrl : shortUrl(sLongUrl)) {
            System.out.println(shortUrl);
        }
    }
    public static String[] shortUrl(String url) {
        String key = "dwz";
        String[] chars = new String[]{"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","0","1","2","3","4","5","6","7","8","9","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"};
        String sMD5EncryptResult = "";
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update((key + url).getBytes());
            byte[] digest = md.digest();
            sMD5EncryptResult = DatatypeConverter.printHexBinary(digest).toUpperCase();
        } catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
        }
        String[] resUrl = new String[4];
        for (int i = 0; i < 4; i++) {
            String sTempSubString = sMD5EncryptResult.substring(i * 8, i * 8 + 8);
            long lHexLong = 0x3FFFFFFF & Long.parseLong(sTempSubString, 16);
            String outChars = "";
            for (int j = 0; j < 6; j++) {
                long index = 0x0000003D & lHexLong;
                outChars += chars[(int) index];
                lHexLong = lHexLong >> 5;
            }
            resUrl[i] = outChars;
        }
        return resUrl;
    }
}

Additionally, the utility class NumericConvertUtils provides methods toOtherNumberSystem (decimal to any base up to 62) and toDecimalNumber (any base to decimal), which are useful for encoding and decoding short codes.

public class NumericConvertUtils {
    private static final char[] digits = {'0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'};
    public static String toOtherNumberSystem(long number, int seed) {
        if (number < 0) {
            number = ((long)2 * 0x7fffffff) + number + 2;
        }
        char[] buf = new char[32];
        int charPos = 32;
        while ((number / seed) > 0) {
            buf[--charPos] = digits[(int) (number % seed)];
            number /= seed;
        }
        buf[--charPos] = digits[(int) (number % seed)];
        return new String(buf, charPos, (32 - charPos));
    }
    public static long toDecimalNumber(String number, int seed) {
        char[] charBuf = number.toCharArray();
        if (seed == 10) {
            return Long.parseLong(number);
        }
        long result = 0, base = 1;
        for (int i = charBuf.length - 1; i >= 0; i--) {
            int index = 0;
            for (int j = 0; j < digits.length; j++) {
                if (digits[j] == charBuf[i]) {
                    index = j;
                }
            }
            result += index * base;
            base *= seed;
        }
        return result;
    }
}
BackendJavaAlgorithmDatabasecachingShort URL
Top Architect
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.