Designing a Scalable Short URL Service for Hundreds of Millions of Links
This article details the architecture, compression‑code generation strategies, database schema, high‑concurrency handling, and Java implementation of a short‑URL service capable of supporting hundreds of millions of short links, including code snippets and deployment considerations.
1. Background
Short URLs are common in SMS services and template message pushes, where every character of a long URL adds cost. The service issues concise short URLs such as http://1.cn/23sM5J that redirect to the original long address.
GitHub: https://github.com/plasticene/plasticene-boot-starter-parent
Gitee: https://gitee.com/plasticene3/plasticene-boot-starter-parent
2. Overview Design
The core of the short‑link service is a one‑to‑one mapping between a short code and a long URL; when a browser accesses the short URL, the service redirects to the original long URL.
2.1 Application Example
Example from a Sephora membership SMS: short link http://ew7.cn/?M2Fj redirects to the long URL https://m.sephora.cn/v2/html/rewardsBoutique/, then to a login page and finally to the rewards page.
2.2 Compression Code Generation Design
A Base64 alphabet yields 64⁷ ≈ 4.4 trillion possibilities for a 7‑character code and 64⁶ ≈ 68.7 billion for a 6‑character code. A 6‑character code is sufficient for the expected business volume, giving short URLs such as http://1.cn/S3ke6J.
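The code-space figures can be checked with a few lines of Java. This is only an arithmetic sketch; the class and method names are illustrative and not part of the project.

```java
// Sketch: number of distinct codes of a given length over a 64-symbol alphabet.
class CodeSpace {
    /** Returns 64^length. Lengths above 10 would overflow a signed long. */
    static long capacity(int length) {
        if (length < 0 || length > 10) {
            throw new IllegalArgumentException("length must be in 0..10 to fit in a long");
        }
        // 64 = 2^6, so 64^n = 2^(6n)
        return 1L << (6 * length);
    }

    public static void main(String[] args) {
        System.out.println(capacity(6)); // 68719476736
        System.out.println(capacity(7)); // 4398046511104
    }
}
```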
Hash‑based generation: Compute the MD5 or SHA‑256 hash of the long URL, Base64‑encode the hash, and take the first 6 characters. Because the hash is truncated, different URLs can collide, so every new code needs a lookup and possibly a re‑hash, which hurts performance.
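A minimal sketch of the hash-based approach, assuming MD5 and the URL-safe Base64 alphabet (`-` and `_` instead of `+` and `/`). The `attempt` parameter is a hypothetical way to re-hash after a collision; the collision lookup itself is left to the caller.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class HashCodeGenerator {
    /**
     * Derives a 6-character code from the long URL.
     * attempt = 0 hashes the URL itself; a higher attempt appends a suffix
     * so that a collision can be resolved by hashing again (an assumption,
     * not necessarily how the project handles it).
     */
    static String shortCode(String longUrl, int attempt) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            String input = attempt == 0 ? longUrl : longUrl + "#" + attempt;
            byte[] digest = md5.digest(input.getBytes(StandardCharsets.UTF_8));
            // 16 digest bytes encode to 22 URL-safe characters; keep the first 6
            String encoded = Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
            return encoded.substring(0, 6);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }
}
```

The caller would still have to check whether the returned code is already mapped to a different URL and, if so, call again with the next `attempt` value.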
Auto‑increment ID generation: Use a database auto‑increment key or distributed ID, then Base64‑encode it. This guarantees uniqueness and can produce very short codes (e.g., ID 0 → A), but the codes become predictable and vulnerable to enumeration.
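The ID-based approach amounts to writing the number in base 64. The sketch below assumes the URL-safe Base64 alphabet, in which digit 0 is `A`, matching the "ID 0 → A" example; `IdEncoder` is an illustrative name.

```java
class IdEncoder {
    // URL-safe Base64 alphabet: index 0 is 'A', index 63 is '_'
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    /** Encodes a non-negative ID as a base-64 string, most significant digit first. */
    static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        if (id == 0) {
            return "A";
        }
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id & 63))); // lowest 6 bits
            id >>>= 6;
        }
        return sb.reverse().toString();
    }

    /** Reverses encode(); shows that the mapping is one-to-one. */
    static long decode(String code) {
        long id = 0;
        for (int i = 0; i < code.length(); i++) {
            int digit = ALPHABET.indexOf(code.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("invalid character: " + code.charAt(i));
            }
            id = (id << 6) | digit;
        }
        return id;
    }
}
```

Consecutive IDs give consecutive codes (`encode(64)` is `BA`, `encode(65)` is `BB`), which is exactly the predictability problem described above.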
Pre‑generated codes: Generate a pool of random 6‑character Base64 strings in advance, checking for duplicates with a Bloom filter. Store the pool in Redis; when the pool size falls below a configured minimum, generate new codes asynchronously so that code generation stays off the request path.
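The idea can be sketched without Redis. Below, a small in-memory Bloom filter (a hypothetical stand-in for the project's `ShortUrlBloomFilter`) rejects any random code that might have been issued before; all names and sizes are assumptions for illustration.

```java
import java.security.SecureRandom;
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.Set;

class CodePoolSketch {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Minimal Bloom filter: probe positions are derived from two base hashes. */
    static final class SimpleBloomFilter {
        private final BitSet bits;
        private final int size;
        private final int probes;

        SimpleBloomFilter(int size, int probes) {
            this.bits = new BitSet(size);
            this.size = size;
            this.probes = probes;
        }

        private int index(String value, int i) {
            int h1 = value.hashCode();
            int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1; // second, odd hash
            return Math.floorMod(h1 + i * h2, size);
        }

        /** False means "definitely never added"; true means "possibly added". */
        boolean mightContain(String value) {
            for (int i = 0; i < probes; i++) {
                if (!bits.get(index(value, i))) {
                    return false;
                }
            }
            return true;
        }

        void add(String value) {
            for (int i = 0; i < probes; i++) {
                bits.set(index(value, i));
            }
        }
    }

    static String randomCode(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    /** Generates count fresh codes; assumes the filter is far from saturated. */
    static Set<String> generateBatch(int count, SimpleBloomFilter filter) {
        Set<String> batch = new LinkedHashSet<>();
        while (batch.size() < count) {
            String code = randomCode(6);
            if (filter.mightContain(code)) {
                continue; // possibly issued before: discard and draw again
            }
            filter.add(code);
            batch.add(code);
        }
        return batch;
    }
}
```

A false positive only wastes one random draw, while a false negative cannot occur, so a code that passes the filter has not been recorded in it before.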
2.3 Short‑Link Generation Flow
High QPS read requests and storage of pre‑generated codes are the main challenges. Load balancing and distributed caching (Redis) are used to handle extreme read traffic.
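For the read path, a common way to use the cache is the cache-aside pattern: look in the cache first and fall back to the database only on a miss. The sketch below uses a `ConcurrentHashMap` as a stand-in for Redis and a function as a stand-in for the database query; both, along with the class name, are assumptions for illustration rather than the project's code.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class CacheAsideLookup {
    // Stand-in for the Redis hash that maps short code -> long URL
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Stand-in for the database lookup by short code
    private final Function<String, Optional<String>> database;

    CacheAsideLookup(Function<String, Optional<String>> database) {
        this.database = database;
    }

    /** Resolves a short code, touching the database only on a cache miss. */
    Optional<String> resolve(String code) {
        String cached = cache.get(code);
        if (cached != null) {
            return Optional.of(cached);
        }
        Optional<String> loaded = database.apply(code);
        loaded.ifPresent(url -> cache.put(code, url)); // populate the cache for later reads
        return loaded;
    }
}
```

This version does not cache misses, so repeated requests for unknown codes still reach the database; a real deployment would likely need some protection against that.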
3. Implementation
3.1 Database Design
-- Table structure for unique_code
DROP TABLE IF EXISTS `unique_code`;
CREATE TABLE `unique_code` (
`id` bigint(20) NOT NULL,
`code` varchar(16) NOT NULL COMMENT 'Compression code',
`status` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'Status: 0 = unused, 1 = used, -1 = invalid',
`type` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'Generation method: 0 = random, 1 = distributed ID, 2 = hash',
`deleted` tinyint(4) NOT NULL DEFAULT '0',
`creator` bigint(20) DEFAULT NULL,
`updater` bigint(20) DEFAULT NULL,
`create_time` datetime DEFAULT NULL,
`update_time` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Compression code table';
-- Table structure for url_link
DROP TABLE IF EXISTS `url_link`;
CREATE TABLE `url_link` (
`id` bigint(20) NOT NULL,
`unique_code` varchar(255) NOT NULL COMMENT 'Unique compression code',
`short_url` varchar(255) NOT NULL COMMENT 'Short URL',
`long_url` varchar(1000) NOT NULL COMMENT 'Long URL',
`long_url_md5` varchar(255) NOT NULL COMMENT 'MD5 of the long URL',
`deleted` tinyint(4) NOT NULL DEFAULT '0',
`create_time` datetime DEFAULT NULL,
`update_time` datetime DEFAULT NULL,
`creator` bigint(20) DEFAULT NULL,
`updater` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='URL mapping table';
-- Table structure for visit_record
DROP TABLE IF EXISTS `visit_record`;
CREATE TABLE `visit_record` (
`id` bigint(20) NOT NULL COMMENT 'Primary key',
`url_link_id` bigint(20) NOT NULL COMMENT 'ID of the url_link row',
`unique_code` varchar(16) NOT NULL COMMENT 'Compression code',
`client_id` varchar(128) NOT NULL COMMENT 'Unique client identifier, SHA-1(client IP-UA)',
`client_ip` varchar(64) NOT NULL COMMENT 'Client IP',
`visit_time` datetime NOT NULL COMMENT 'Visit time',
`user_agent` varchar(2048) DEFAULT NULL COMMENT 'UA',
`country` varchar(32) DEFAULT NULL COMMENT 'Country',
`province` varchar(32) DEFAULT NULL COMMENT 'Province',
`city` varchar(32) DEFAULT NULL COMMENT 'City',
`isp` varchar(32) DEFAULT NULL COMMENT 'Internet service provider',
`browser_type` varchar(64) DEFAULT NULL COMMENT 'Browser type',
`browser_version` varchar(128) DEFAULT NULL COMMENT 'Browser version',
`os_type` varchar(32) DEFAULT NULL COMMENT 'Operating system type',
`device_type` varchar(32) DEFAULT NULL COMMENT 'Device type',
`os_version` varchar(32) DEFAULT NULL COMMENT 'Operating system version',
`deleted` tinyint(4) DEFAULT '0' COMMENT 'Soft-delete flag',
`creator` bigint(20) DEFAULT '0' COMMENT 'Creator',
`updater` bigint(20) DEFAULT '0' COMMENT 'Updater',
`create_time` datetime DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
`update_time` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Visit records';
3.2 Pre‑Generate Compression Codes
@Service
@Slf4j
public class UniqueCodeServiceImpl extends ServiceImpl<UniqueCodeDAO, UniqueCode> implements UniqueCodeService {
@Resource
private UniqueCodeDAO uniqueCodeDAO;
@Resource
private StringRedisTemplate stringRedisTemplate;
@Resource
private IdGenerator idGenerator;
@Resource
private ExecutorService executorService;
@Resource
private ShortUrlBloomFilter shortUrlBloomFilter;
@Value("${unique-code.max-size}")
private Integer maxSize;
@Value("${unique-code.min-size}")
private Integer minSize;
private static final String UNIQUE_CODE_KEY = "short_url_unique_code";
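@Override
public String getUniqueCode() {
// Pop one random pre-generated code from the Redis set (null if the pool is empty)
String code = stringRedisTemplate.opsForSet().pop(UNIQUE_CODE_KEY);
// Top up the pool in the background if it has dropped below minSize
asyncGenerateUniqueCode();
return code;
}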
@Override
public List<String> getUniqueCode(Integer size) {
List<String> codes = stringRedisTemplate.opsForSet().pop(UNIQUE_CODE_KEY, size);
asyncGenerateUniqueCode();
return codes;
}
@Override
public PageResult<UniqueCode> getUnusedList(PageParam pageParam) {
LambdaQueryWrapper<UniqueCode> queryWrapper = new LambdaQueryWrapper<>();
queryWrapper.eq(UniqueCode::getStatus, CommonConstant.CODE_NOT_USE);
queryWrapper.select(UniqueCode::getCode);
return uniqueCodeDAO.selectPage(pageParam, queryWrapper);
}
@Override
@Transactional(rollbackFor = Exception.class)
public void generateUniqueCode() {
log.info("========== Start generating compression codes ==========");
long startTime = System.currentTimeMillis();
Set<String> codes = new HashSet<>();
Set<String> existCodes = new HashSet<>();
for (int i = 0; i < maxSize; i++) {
String code = RandomUtils.generateCode(6);
Boolean exist = shortUrlBloomFilter.isExist(code);
if (exist) {
existCodes.add(code);
} else {
codes.add(code);
}
}
if (!CollectionUtils.isEmpty(existCodes)) {
log.info("========= The following compression codes already exist: {}", existCodes);
}
stringRedisTemplate.opsForSet().add(UNIQUE_CODE_KEY, codes.toArray(new String[0]));
List<UniqueCode> uniqueCodeList = new ArrayList<>();
codes.forEach(code -> {
UniqueCode uniqueCode = new UniqueCode();
uniqueCode.setId(idGenerator.nextId());
uniqueCode.setCode(code);
uniqueCodeList.add(uniqueCode);
});
saveBatch(uniqueCodeList);
long costTime = System.currentTimeMillis() - startTime;
log.info("=============== Finished generating compression codes, costTime:[{}], Count:[{}] ==============", costTime, codes.size());
}
public void asyncGenerateUniqueCode() {
Long size = stringRedisTemplate.opsForSet().size(UNIQUE_CODE_KEY);
if (size < minSize) {
executorService.execute(this::generateUniqueCode);
}
}
}
The service pre‑generates a configurable maximum number of codes; when the pool drops below the minimum, asynchronous generation replenishes the pool without affecting request latency.
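One thing the listing leaves open is that several concurrent requests can each see a pool below the minimum and each submit a generation task. A possible guard, sketched here with a hypothetical `RefillGuard` class, lets only one refill run at a time within a single JVM. Multiple service instances would need a distributed lock instead, which is not shown.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongSupplier;

class RefillGuard {
    private final AtomicBoolean refilling = new AtomicBoolean(false);
    private final Executor executor;   // e.g. the service's thread pool
    private final LongSupplier poolSize; // e.g. the size of the Redis set
    private final Runnable refill;     // e.g. the code generation routine
    private final long minSize;

    RefillGuard(Executor executor, LongSupplier poolSize, Runnable refill, long minSize) {
        this.executor = executor;
        this.poolSize = poolSize;
        this.refill = refill;
        this.minSize = minSize;
    }

    /** Submits a refill task if the pool is low and none is in flight. */
    boolean maybeRefill() {
        if (poolSize.getAsLong() >= minSize) {
            return false; // pool is healthy
        }
        if (!refilling.compareAndSet(false, true)) {
            return false; // another refill is already queued or running
        }
        try {
            executor.execute(() -> {
                try {
                    refill.run();
                } finally {
                    refilling.set(false); // allow the next refill
                }
            });
        } catch (RuntimeException e) {
            refilling.set(false); // submission failed, e.g. the executor rejected the task
            throw e;
        }
        return true;
    }
}
```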
3.3 Generate Short URL
@Override
public String generateShortUrl(String longUrl) {
// 1. Validate URL format
if (!isValidUrl(longUrl)) {
throw new BizException("Invalid URL");
}
// 2. Return the existing short URL if this long URL was shortened before.
// The lookup key must be the MD5 digest, because that is the key written in step 3.
// (StandardCharsets is java.nio.charset.StandardCharsets.)
String longUrlMd5 = DigestUtils.md5DigestAsHex(longUrl.getBytes(StandardCharsets.UTF_8));
Object value = redisTemplate.opsForHash().get(LONG_MD5_CODE_MAP, longUrlMd5);
if (Objects.nonNull(value)) {
return domain + value.toString();
}
// 3. Create new short URL
long id = idGenerator.nextId();
String uniqueCode = uniqueCodeService.getUniqueCode();
String shortUrl = domain + uniqueCode;
UrlLink urlLink = new UrlLink();
urlLink.setId(id);
urlLink.setUniqueCode(uniqueCode);
urlLink.setShortUrl(shortUrl);
urlLink.setLongUrl(longUrl);
urlLink.setLongUrlMd5(longUrlMd5);
urlLinkDAO.insert(urlLink);
// Store mappings in Redis for fast lookup
redisTemplate.opsForHash().put(SHORT_LONG_MAP, uniqueCode, longUrl);
redisTemplate.opsForHash().put(LONG_MD5_CODE_MAP, longUrlMd5, uniqueCode);
return shortUrl;
}
Example persistence row:
id: 3362951037714432
unique_code: j51YO8
short_url: http://127.0.0.1:18800/j51YO8
long_url: https://gitee.com/plasticene3/plasticene-boot-starter-parent
long_url_md5: c411c28da0302d3f0c9e34872c3b66d1
deleted: 0
create_time: 2022-08-18 17:00:21
update_time: 2022-08-18 17:00:21
creator: 1
updater: 1
3.4 Short‑URL Redirection
@Override
public void redirect(HttpServletRequest request, HttpServletResponse response, String uniqueCode) throws IOException {
String longUrl = shortUrlService.getOriginUrl(uniqueCode);
if (StringUtils.isBlank(longUrl)) {
throw new BizException("Short URL does not exist");
}
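// Asynchronously record the visit.
// Caution: the servlet container may recycle the request object once the response
// completes, so reading it from another thread is not guaranteed to be safe.
executorService.execute(() -> visitRecordService.addVisitRecord(request, uniqueCode));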
response.sendRedirect(longUrl);
}
3.5 Asynchronous Visit Recording
@Override
@Transactional(rollbackFor = Exception.class)
public void addVisitRecord(HttpServletRequest request, String uniqueCode) {
VisitRecord visitRecord = new VisitRecord();
long id = idGenerator.nextId();
visitRecord.setId(id);
UrlLink urlLink = shortUrlService.getUrlLink(uniqueCode);
visitRecord.setUrlLinkId(urlLink.getId());
visitRecord.setUniqueCode(urlLink.getUniqueCode());
visitRecord.setVisitTime(new Date());
String agent = request.getHeader(USER_AGENT);
String clientIp = IpUtils.getRemoteHost(request);
IpRegion ipRegion = IpUtils.getIpRegion(clientIp);
visitRecord.setUserAgent(agent);
visitRecord.setClientIp(clientIp);
// Unique client identifier: SHA‑1(clientIp + "&" + agent)
String clientId = DigestUtil.sha1Hex(clientIp + "&" + agent);
visitRecord.setClientId(clientId);
visitRecord.setCountry(ipRegion.getCountry());
visitRecord.setProvince(ipRegion.getProvince());
visitRecord.setCity(ipRegion.getCity());
visitRecord.setIsp(ipRegion.getIsp());
if (StringUtils.isNotBlank(agent)) {
try {
UserAgent userAgent = UserAgent.parseUserAgentString(agent);
OperatingSystem os = userAgent.getOperatingSystem();
Optional.ofNullable(os).ifPresent(o -> {
visitRecord.setOsType(o.getName());
// Note: the OS name is stored in os_version as well (see the sample row below)
visitRecord.setOsVersion(o.getName());
Optional.ofNullable(o.getDeviceType()).ifPresent(dt -> visitRecord.setDeviceType(dt.getName()));
});
Browser browser = userAgent.getBrowser();
Optional.ofNullable(browser).ifPresent(b -> visitRecord.setBrowserType(b.getGroup().getName()));
Version browserVersion = userAgent.getBrowserVersion();
Optional.ofNullable(browserVersion).ifPresent(v -> visitRecord.setBrowserVersion(v.getVersion()));
} catch (Exception e) {
log.error("Failed to parse User-Agent:", e);
}
}
visitRecordDAO.insert(visitRecord);
}
Visit records capture client IP, geographic region, operating system, browser type and version, and device type, providing rich analytics for the short‑URL service.
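The client identifier in the listing is the SHA‑1 hex digest of the client IP and User‑Agent joined by `&`. A standalone sketch using only the JDK follows; the class name is illustrative, and UTF‑8 is an assumed encoding, since the listing's `DigestUtil.sha1Hex` does not show which charset it uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ClientIdSketch {
    /** Returns the 40-character lowercase hex SHA-1 of clientIp + "&" + userAgent. */
    static String clientId(String clientIp, String userAgent) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(
                    (clientIp + "&" + userAgent).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }
}
```

Because the identifier depends only on IP and User‑Agent, visitors behind the same NAT with the same browser build would share one identifier, so it is best read as an approximate unique-visitor key.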
Sample visit record row:
id: 2632861324673024
url_link_id: 2620608022052864
unique_code: ava5R7
client_id: 7fb0070a4cb212b7dc0efce2cd08b017477787ae
client_ip: 10.8.4.7
visit_time: 2022-08-16 16:39:14
user_agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36
country: 中国
province: 浙江
city: 杭州
isp: 电信
browser_type: Chrome
browser_version: 104.0.0.0
os_type: Mac OS X
device_type: Computer
os_version: Mac OS X
deleted: 0
creator: 1
updater: 1
create_time: 2022-08-16 16:39:14
update_time: 2022-08-24 18:15:18
Source code: https://github.com/plasticene/plasticene-infra/tree/main/short-url-service
Shepherd Advanced Notes
Dedicated to sharing advanced Java technical insights, daily work snippets, and the power of persistent effort.