Design and Implementation of a High‑Performance Short URL Service
This article explains the motivation, architecture, resource estimation, code generation algorithms, data modeling, database partitioning, caching strategies, and implementation details for building a scalable short URL system that handles billions of links and high QPS workloads.
Before diving into the implementation, the article outlines the business need for shortening long URLs generated by marketing platforms, which often exceed input limits and appear unsafe, prompting the adoption of short URLs.
It compares a long URL example with a short URL and presents a system use‑case diagram illustrating how users create short URLs, how the service stores mappings, and how redirects are performed.
The article defines key terms such as short URL, 301 and 302 redirects, and introduces the short‑link and reach models that record when, who, and where URLs are generated and accessed.
Resource estimation is discussed, including projected daily generation of 3 million short URLs, a two‑year retention leading to over 2.1 billion records, storage requirements (~1.2 TB), CODE length choices (6‑character base62 yielding 56.8 billion possibilities), and QPS calculations (average ~9,988 qps, peak ~14,982 qps).
Bandwidth usage is estimated at roughly 1 KB per request, resulting in average bandwidth of 9.5 MB/s and peak of 14.2 MB/s.
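The arithmetic behind these estimates can be replayed directly. In the sketch below, the 3 million links/day, two-year retention, 1 KB request size, and average QPS are the article's figures; the 1.5× peak factor is an assumption (it happens to reproduce the stated peak exactly):

```go
package main

import "fmt"

func main() {
	// Daily new links and two-year retention (figures from the article).
	perDay := 3_000_000
	records := perDay * 365 * 2
	fmt.Println("records:", records) // 2190000000 (~2.1 billion)

	// 6-character base62 code space: 62^6.
	space := 1
	for i := 0; i < 6; i++ {
		space *= 62
	}
	fmt.Println("code space:", space) // 56800235584 (~56.8 billion)

	// QPS: article's average, with peak assumed to be 1.5x average.
	avgQPS := 9_988
	peakQPS := avgQPS * 3 / 2
	fmt.Println("peak qps:", peakQPS) // 14982

	// Bandwidth at ~1 KB (1000 B) per request, reported in MiB/s; this
	// lands close to the article's 9.5 MB/s and 14.2 MB/s figures.
	fmt.Printf("avg %.1f MB/s, peak %.1f MB/s\n",
		float64(avgQPS)*1000/(1024*1024),
		float64(peakQPS)*1000/(1024*1024))
}
```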
A Redis caching strategy is proposed to handle 90% of traffic, with a cache of about 26 million entries (~19.3 GB) covering a 7‑day hot window.
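The flow this caching strategy implies is classic cache-aside: try the cache, fall back to the database on a miss, then refill with a 7-day TTL. The sketch below replaces Redis with a minimal in-memory stand-in (`memCache`) so it is self-contained, and `loadFromDB` is a hypothetical query against the short_link table; in production the `cache` interface would be backed by a Redis client:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errMiss signals a cache miss, playing the role Redis's nil reply would.
var errMiss = errors.New("cache miss")

// cache is the minimal surface cache-aside needs; production code would
// back this with Redis GET/SET calls carrying the TTL.
type cache interface {
	Get(key string) (string, error)
	Set(key, val string, ttl time.Duration)
}

// memCache is an in-memory stand-in used here for illustration only.
type memCache struct {
	mu   sync.Mutex
	data map[string]string
}

func (c *memCache) Get(key string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.data[key]; ok {
		return v, nil
	}
	return "", errMiss
}

func (c *memCache) Set(key, val string, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = val // a real cache would also honor ttl
}

// resolve implements cache-aside: cache first, database on a miss, then a
// best-effort refill with a 7-day TTL so roughly one hot week stays resident.
func resolve(c cache, code string, loadFromDB func(string) (string, error)) (string, error) {
	if long, err := c.Get("surl:" + code); err == nil {
		return long, nil // hit: the article sizes this path for ~90% of traffic
	}
	long, err := loadFromDB(code)
	if err != nil {
		return "", err
	}
	c.Set("surl:"+code, long, 7*24*time.Hour)
	return long, nil
}

func main() {
	c := &memCache{data: map[string]string{}}
	db := func(code string) (string, error) { return "https://example.com/long", nil }
	long, _ := resolve(c, "aB3xK9", db) // first call misses and queries the DB
	fmt.Println(long)
	long, _ = resolve(c, "aB3xK9", db) // second call is served from the cache
	fmt.Println(long)
}
```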
CODE generation algorithms are detailed: a URL‑safe character table (64 symbols — 0–9, a–z, A–Z, `-`, `_` — so effectively base64, despite the base62 sizing above), a snowflake‑ID modulo approach, and an MD5‑then‑base64 method, each producing a 6‑character CODE. The article provides the full Go implementation:

```go
import (
	"crypto/md5"
	"encoding/base64"
	"io"
	"strings"

	"github.com/bwmarrin/snowflake"
)

// Shortener turns a long URL into a short CODE.
type Shortener interface {
	Generate(url string) (string, error)
}

// base64Factory is the URL-safe alphabet: 0-9, a-z, A-Z, '-' and '_'.
var base64Factory = []string{
	"0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
	"a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
	"k", "l", "m", "n", "o", "p", "q", "r", "s", "t",
	"u", "v", "w", "x", "y", "z",
	"A", "B", "C", "D", "E", "F", "G", "H", "I", "J",
	"K", "L", "M", "N", "O", "P", "Q", "R", "S", "T",
	"U", "V", "W", "X", "Y", "Z",
	"-", "_",
}

const size = 6 // length of the generated CODE

// snowflakeGenerate derives the CODE from a snowflake ID. The node is
// created once and reused: building a node per call is wasteful and can
// hand out duplicate IDs.
type snowflakeGenerate struct {
	node *snowflake.Node
}

func newSnowflakeGenerate() (*snowflakeGenerate, error) {
	node, err := snowflake.NewNode(1)
	if err != nil {
		return nil, err
	}
	return &snowflakeGenerate{node: node}, nil
}

func (s *snowflakeGenerate) Generate(url string) (string, error) {
	id := s.node.Generate().Int64()
	l := int64(len(base64Factory))
	sb := strings.Builder{}
	for ; id > 0; id /= l { // id > 0, not id/l > 0, or the top digit is dropped
		sb.WriteString(base64Factory[id%l])
	}
	return sb.String()[:size], nil
}

type md5Generate struct{}

func (s *md5Generate) Generate(url string) (string, error) {
	hash := md5.New()
	if _, err := io.WriteString(hash, url); err != nil {
		return "", err
	}
	// RawURLEncoding keeps the CODE free of '+', '/' and '=' padding.
	return base64.RawURLEncoding.EncodeToString(hash.Sum(nil))[:size], nil
}
```

along with test cases wrapped in `func TestSnowflakeGenerate(t *testing.T) { ... }`.
Collision handling is addressed by recommending unique indexes on CODE, pre‑generation of codes, and fallback strategies when a generated CODE already exists.
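The fallback strategy can be sketched as a retry loop around the insert: let the unique index on CODE reject the collision, then regenerate and try again. Everything below is illustrative — `errDuplicate` stands in for the driver's unique-key violation (e.g. MySQL error 1062), and `gen`/`insert` are hypothetical hooks for the article's generators and the database write:

```go
package main

import (
	"errors"
	"fmt"
)

// errDuplicate stands in for a unique-index violation on the CODE column.
var errDuplicate = errors.New("duplicate CODE")

// saveWithRetry persists a CODE → URL mapping, regenerating the CODE whenever
// the unique index rejects it. The attempt counter is passed to gen so that
// deterministic (hash-based) generators can salt in a new candidate.
func saveWithRetry(url string, gen func(url string, attempt int) string,
	insert func(code, url string) error, maxAttempts int) (string, error) {

	for attempt := 0; attempt < maxAttempts; attempt++ {
		code := gen(url, attempt)
		err := insert(code, url)
		if err == nil {
			return code, nil
		}
		if !errors.Is(err, errDuplicate) {
			return "", err // a real failure, not a collision
		}
		// collision: loop and try a fresh CODE
	}
	return "", errors.New("could not find a free CODE")
}

func main() {
	taken := map[string]bool{"abc123": true} // "abc123" is already in use
	insert := func(code, url string) error {
		if taken[code] {
			return errDuplicate
		}
		taken[code] = true
		return nil
	}
	gen := func(url string, attempt int) string {
		// hypothetical: a real generator would re-hash url with attempt as salt
		return []string{"abc123", "xYz789"}[attempt%2]
	}
	code, err := saveWithRetry("https://example.com/long", gen, insert, 3)
	fmt.Println(code, err) // the first candidate collides, the retry succeeds
}
```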
The data model is presented with a SQL DDL statement for the short_link table, including fields for user ID, device, IP, original URL, MD5, CODE, timestamps, and a hash partition key, partitioned into 105 shards.
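The original DDL is not reproduced here, so the following is only a reconstruction of what such a table could look like in MySQL/TiDB syntax; every column name is illustrative. Note that hash partitioning requires the partition key to appear in every unique key, so `code` is only globally unique if `hash_key` is derived deterministically from it:

```sql
CREATE TABLE short_link (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id    BIGINT UNSIGNED NOT NULL COMMENT 'creator',
    device     VARCHAR(64)   NOT NULL DEFAULT '',
    ip         VARCHAR(46)   NOT NULL DEFAULT '' COMMENT 'IPv4/IPv6 of creator',
    url        VARCHAR(2048) NOT NULL COMMENT 'original long URL',
    url_md5    CHAR(32)      NOT NULL COMMENT 'MD5 of url, for dedup lookups',
    code       CHAR(6)       NOT NULL COMMENT 'short CODE',
    hash_key   INT           NOT NULL COMMENT 'partition key derived from code',
    created_at DATETIME      NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME      NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (id, hash_key),
    UNIQUE KEY uk_code (code, hash_key),
    KEY idx_md5 (url_md5)
) PARTITION BY HASH (hash_key) PARTITIONS 105;
```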
Database selection favors TiDB with hash partitioning to evenly distribute the massive dataset, and the article discusses TTL‑based automatic expiration versus scheduled cleanup jobs.
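For the TTL option, TiDB (v6.5 and later) supports declarative row expiry; a sketch under the two-year retention above, assuming the table layout is TTL-compatible (TiDB documents restrictions, so the scheduled batched-DELETE job remains the fallback):

```sql
-- Let TiDB's background TTL job delete rows older than the retention window.
ALTER TABLE short_link
    TTL = `created_at` + INTERVAL 2 YEAR,
    TTL_ENABLE = 'ON';
```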
Finally, the article concludes that despite the large data volume and high QPS, the system architecture remains straightforward, emphasizing correct CODE‑URL mapping, optional Redis caching, and multi‑pod high availability.
Rare Earth Juejin Tech Community