Design and Implementation of a High‑Performance Short URL Service
This article explains the motivation, architecture, resource estimation, code generation algorithms, data modeling, database partitioning, caching strategies, and implementation details for building a scalable short URL system that handles billions of links and high QPS workloads.
Before diving into the implementation, the article outlines the business need for shortening long URLs generated by marketing platforms, which often exceed input limits and appear unsafe, prompting the adoption of short URLs.
It compares a long URL example with a short URL and presents a system use‑case diagram illustrating how users create short URLs, how the service stores mappings, and how redirects are performed.
The article defines key terms such as short URL, 301 and 302 redirects, and introduces the short‑link and reach models that record when, who, and where URLs are generated and accessed.
Resource estimation is discussed, including projected daily generation of 3 million short URLs, a two‑year retention leading to over 2.1 billion records, storage requirements (~1.2 TB), CODE length choices (6‑character base62 yielding 56.8 billion possibilities), and QPS calculations (average ~9,988 qps, peak ~14,982 qps).
Bandwidth usage is estimated at roughly 1 KB per request, resulting in average bandwidth of 9.5 MB/s and peak of 14.2 MB/s.
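The arithmetic behind these estimates can be replayed directly. In the sketch below, the 3 million links/day, two-year retention, 1 KB request size, and average QPS are the article's figures; the 1.5× peak factor is an assumption (it happens to reproduce the stated peak exactly):

```go
package main

import "fmt"

func main() {
	// Daily new links and two-year retention (figures from the article).
	perDay := 3_000_000
	records := perDay * 365 * 2
	fmt.Println("records:", records) // 2190000000 (~2.1 billion)

	// 6-character base62 code space: 62^6.
	space := 1
	for i := 0; i < 6; i++ {
		space *= 62
	}
	fmt.Println("code space:", space) // 56800235584 (~56.8 billion)

	// QPS: article's average, with peak assumed to be 1.5x average.
	avgQPS := 9_988
	peakQPS := avgQPS * 3 / 2
	fmt.Println("peak qps:", peakQPS) // 14982

	// Bandwidth at ~1 KB (1000 B) per request, reported in MiB/s; this
	// lands close to the article's 9.5 MB/s and 14.2 MB/s figures.
	fmt.Printf("avg %.1f MB/s, peak %.1f MB/s\n",
		float64(avgQPS)*1000/(1024*1024),
		float64(peakQPS)*1000/(1024*1024))
}
```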
A Redis caching strategy is proposed to handle 90% of traffic, with a cache of about 26 million entries (~19.3 GB) covering a 7‑day hot window.
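The flow this caching strategy implies is classic cache-aside: try the cache, fall back to the database on a miss, then refill with a 7-day TTL. The sketch below replaces Redis with a minimal in-memory stand-in (`memCache`) so it is self-contained, and `loadFromDB` is a hypothetical query against the short_link table; in production the `cache` interface would be backed by a Redis client:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errMiss signals a cache miss, playing the role Redis's nil reply would.
var errMiss = errors.New("cache miss")

// cache is the minimal surface cache-aside needs; production code would
// back this with Redis GET/SET calls carrying the TTL.
type cache interface {
	Get(key string) (string, error)
	Set(key, val string, ttl time.Duration)
}

// memCache is an in-memory stand-in used here for illustration only.
type memCache struct {
	mu   sync.Mutex
	data map[string]string
}

func (c *memCache) Get(key string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.data[key]; ok {
		return v, nil
	}
	return "", errMiss
}

func (c *memCache) Set(key, val string, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = val // a real cache would also honor ttl
}

// resolve implements cache-aside: cache first, database on a miss, then a
// best-effort refill with a 7-day TTL so roughly one hot week stays resident.
func resolve(c cache, code string, loadFromDB func(string) (string, error)) (string, error) {
	if long, err := c.Get("surl:" + code); err == nil {
		return long, nil // hit: the article sizes this path for ~90% of traffic
	}
	long, err := loadFromDB(code)
	if err != nil {
		return "", err
	}
	c.Set("surl:"+code, long, 7*24*time.Hour)
	return long, nil
}

func main() {
	c := &memCache{data: map[string]string{}}
	db := func(code string) (string, error) { return "https://example.com/long", nil }
	long, _ := resolve(c, "aB3xK9", db) // first call misses and queries the DB
	fmt.Println(long)
	long, _ = resolve(c, "aB3xK9", db) // second call is served from the cache
	fmt.Println(long)
}
```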
CODE generation algorithms are detailed: a URL‑safe character table (64 symbols — 0–9, a–z, A–Z, `-`, `_` — so effectively base64, despite the base62 sizing above), a snowflake‑ID modulo approach, and an MD5‑then‑base64 method, each producing a 6‑character CODE. The article provides the full Go implementation:

```go
import (
	"crypto/md5"
	"encoding/base64"
	"io"
	"strings"

	"github.com/bwmarrin/snowflake"
)

// Shortener turns a long URL into a short CODE.
type Shortener interface {
	Generate(url string) (string, error)
}

// base64Factory is the URL-safe alphabet: 0-9, a-z, A-Z, '-' and '_'.
var base64Factory = []string{
	"0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
	"a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
	"k", "l", "m", "n", "o", "p", "q", "r", "s", "t",
	"u", "v", "w", "x", "y", "z",
	"A", "B", "C", "D", "E", "F", "G", "H", "I", "J",
	"K", "L", "M", "N", "O", "P", "Q", "R", "S", "T",
	"U", "V", "W", "X", "Y", "Z",
	"-", "_",
}

const size = 6 // length of the generated CODE

// snowflakeGenerate derives the CODE from a snowflake ID. The node is
// created once and reused: building a node per call is wasteful and can
// hand out duplicate IDs.
type snowflakeGenerate struct {
	node *snowflake.Node
}

func newSnowflakeGenerate() (*snowflakeGenerate, error) {
	node, err := snowflake.NewNode(1)
	if err != nil {
		return nil, err
	}
	return &snowflakeGenerate{node: node}, nil
}

func (s *snowflakeGenerate) Generate(url string) (string, error) {
	id := s.node.Generate().Int64()
	l := int64(len(base64Factory))
	sb := strings.Builder{}
	for ; id > 0; id /= l { // id > 0, not id/l > 0, or the top digit is dropped
		sb.WriteString(base64Factory[id%l])
	}
	return sb.String()[:size], nil
}

type md5Generate struct{}

func (s *md5Generate) Generate(url string) (string, error) {
	hash := md5.New()
	if _, err := io.WriteString(hash, url); err != nil {
		return "", err
	}
	// RawURLEncoding keeps the CODE free of '+', '/' and '=' padding.
	return base64.RawURLEncoding.EncodeToString(hash.Sum(nil))[:size], nil
}
```

along with test cases wrapped in `func TestSnowflakeGenerate(t *testing.T) { ... }`.
Collision handling is addressed by recommending unique indexes on CODE, pre‑generation of codes, and fallback strategies when a generated CODE already exists.
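The fallback strategy can be sketched as a retry loop around the insert: let the unique index on CODE reject the collision, then regenerate and try again. Everything below is illustrative — `errDuplicate` stands in for the driver's unique-key violation (e.g. MySQL error 1062), and `gen`/`insert` are hypothetical hooks for the article's generators and the database write:

```go
package main

import (
	"errors"
	"fmt"
)

// errDuplicate stands in for a unique-index violation on the CODE column.
var errDuplicate = errors.New("duplicate CODE")

// saveWithRetry persists a CODE → URL mapping, regenerating the CODE whenever
// the unique index rejects it. The attempt counter is passed to gen so that
// deterministic (hash-based) generators can salt in a new candidate.
func saveWithRetry(url string, gen func(url string, attempt int) string,
	insert func(code, url string) error, maxAttempts int) (string, error) {

	for attempt := 0; attempt < maxAttempts; attempt++ {
		code := gen(url, attempt)
		err := insert(code, url)
		if err == nil {
			return code, nil
		}
		if !errors.Is(err, errDuplicate) {
			return "", err // a real failure, not a collision
		}
		// collision: loop and try a fresh CODE
	}
	return "", errors.New("could not find a free CODE")
}

func main() {
	taken := map[string]bool{"abc123": true} // "abc123" is already in use
	insert := func(code, url string) error {
		if taken[code] {
			return errDuplicate
		}
		taken[code] = true
		return nil
	}
	gen := func(url string, attempt int) string {
		// hypothetical: a real generator would re-hash url with attempt as salt
		return []string{"abc123", "xYz789"}[attempt%2]
	}
	code, err := saveWithRetry("https://example.com/long", gen, insert, 3)
	fmt.Println(code, err) // the first candidate collides, the retry succeeds
}
```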
The data model is presented with a SQL DDL statement for the short_link table, including fields for user ID, device, IP, original URL, MD5, CODE, timestamps, and a hash partition key, partitioned into 105 shards.
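The original DDL is not reproduced here, so the following is only a reconstruction of what such a table could look like in MySQL/TiDB syntax; every column name is illustrative. Note that hash partitioning requires the partition key to appear in every unique key, so `code` is only globally unique if `hash_key` is derived deterministically from it:

```sql
CREATE TABLE short_link (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id    BIGINT UNSIGNED NOT NULL COMMENT 'creator',
    device     VARCHAR(64)   NOT NULL DEFAULT '',
    ip         VARCHAR(46)   NOT NULL DEFAULT '' COMMENT 'IPv4/IPv6 of creator',
    url        VARCHAR(2048) NOT NULL COMMENT 'original long URL',
    url_md5    CHAR(32)      NOT NULL COMMENT 'MD5 of url, for dedup lookups',
    code       CHAR(6)       NOT NULL COMMENT 'short CODE',
    hash_key   INT           NOT NULL COMMENT 'partition key derived from code',
    created_at DATETIME      NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME      NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (id, hash_key),
    UNIQUE KEY uk_code (code, hash_key),
    KEY idx_md5 (url_md5)
) PARTITION BY HASH (hash_key) PARTITIONS 105;
```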
Database selection favors TiDB with hash partitioning to evenly distribute the massive dataset, and the article discusses TTL‑based automatic expiration versus scheduled cleanup jobs.
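For the TTL option, TiDB (v6.5 and later) supports declarative row expiry; a sketch under the two-year retention above, assuming the table layout is TTL-compatible (TiDB documents restrictions, so the scheduled batched-DELETE job remains the fallback):

```sql
-- Let TiDB's background TTL job delete rows older than the retention window.
ALTER TABLE short_link
    TTL = `created_at` + INTERVAL 2 YEAR,
    TTL_ENABLE = 'ON';
```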
Finally, the article concludes that despite the large data volume and high QPS, the system architecture remains straightforward, emphasizing correct CODE‑URL mapping, optional Redis caching, and multi‑pod high availability.
Rare Earth Juejin Tech Community