
Shopee Off-Platform Ads Delay Service: Architecture and Implementation

Shopee's off-platform ads delay service combines Redis Zsets for expiration tracking, HBase for payload storage, and Kafka for queuing. It reliably processes up to 6 million tasks per minute with minute-level delays ranging from one minute to thirty days, and achieves horizontal scalability, fault tolerance, and a 75% reduction in Kubernetes resource usage.

Shopee Tech Team

This article introduces the technical architecture and evolution of Shopee's delay service for marketing automation scenarios. The service handles delayed task execution: actions that must be triggered a specified time after an event occurs.

Business Requirements and Technical Challenges:

The system must support high-throughput submission and expiration processing of delay tasks, handling 6 million tasks per minute (about 100,000 per second) during peak periods. Delay durations must be flexible, ranging from 1 minute to 30 days with minute-level precision. The system must also manage billions of delay tasks while remaining horizontally scalable, fault tolerant, and recoverable.

Solution Selection:

The team evaluated multiple approaches: a MySQL-based implementation (rejected because B+ trees are poorly suited to high-concurrency random writes), Redis Zset alone (rejected due to the prohibitive memory cost of storing hundreds of terabytes), delayed message queues like RocketMQ (rejected due to the limit of 18 delay levels and write amplification), and a hierarchical timing wheel algorithm with Kafka (rejected due to I/O performance spikes during level demotion).

Final Architecture: Redis Zset + HBase + Kafka:

The solution uses Redis Zsets to store only each task's expiration time and unique key; using 8-byte snowflake IDs as keys keeps memory usage at the level of hundreds of gigabytes. Task payloads are stored in HBase, chosen for its good read/write performance (50,000+ QPS), horizontal scalability, scan support, and TTL capabilities. Kafka serves as the message queue for task submission and dispatch.
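To make the memory argument concrete, here is a rough sketch of one index entry. The 8-byte ID comes from the text above; the epoch-minute score, the helper names, and the figure of 10 billion tasks are assumptions for illustration. The estimate counts raw bytes only and ignores Redis's per-entry overhead, so it is a lower bound.

```python
import struct
from typing import Tuple

def index_entry(task_id: int, exec_epoch_minute: int) -> Tuple[bytes, float]:
    """Build a (member, score) pair for a Zset: 8-byte ID, expiry minute as score."""
    member = struct.pack(">Q", task_id)  # unsigned 64-bit, big-endian -> 8 bytes
    return member, float(exec_epoch_minute)  # Redis Zset scores are doubles

def raw_index_bytes(task_count: int) -> int:
    """Lower bound: 8-byte member plus 8-byte score per task (no Redis overhead)."""
    return task_count * (8 + 8)

# Hypothetical values, not taken from the article.
member, score = index_entry(task_id=1234567890123456789, exec_epoch_minute=28_000_000)
print(len(member), score)
print(raw_index_bytes(10_000_000_000) / 1e9, "GB of raw index data")
```

Under these assumptions the index stays in the hundreds-of-gigabytes range, while the much larger payloads live in HBase.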

Task Submission Flow:

Business systems submit delay tasks via RPC. Tasks are first written to a Kafka topic (the delay input topic), then consumed and written to both HBase (for payload storage) and Redis Zset (for expiration tracking). Multiple Zsets are used to distribute load, with tasks hashed to specific Zsets.
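A minimal sketch of hashing a task to one of several Zsets follows. The shard count, the key naming scheme, and the choice of CRC32 are all assumptions; the article does not say which hash function or how many Zsets are used.

```python
import zlib

ZSET_SHARDS = 64  # hypothetical shard count

def zset_key_for(task_id: int, shards: int = ZSET_SHARDS) -> str:
    """Map a 64-bit task ID to a Zset key using a hash that is stable across processes."""
    digest = zlib.crc32(task_id.to_bytes(8, "big"))
    return f"delay:zset:{digest % shards}"  # hypothetical key format

print(zset_key_for(42))
```

A stable hash such as CRC32 is used here rather than Python's built-in `hash()`, so that every producer and consumer process agrees on the shard for a given task.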

Task Expiration Flow:

Every minute, dispatcher coroutines compete for a distributed lock to initiate scanning. They generate scan tasks for each Zset and write them to the zset scan dispatch topic. Consumers read expired tasks from Zsets, query HBase for payloads, and forward complete tasks to business Kafka topics.
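The consumer side of this flow can be sketched with in-memory stand-ins: a dict of member to score in place of a Redis Zset, and a dict of payloads in place of HBase. Against real Redis, the read would roughly correspond to `ZRANGEBYSCORE` and the removal to `ZREM`. The function names, the batch limit, and the choice to remove an entry only after forwarding are assumptions, not details from the article.

```python
from typing import Callable, Dict, List

def read_expired(zset: Dict[bytes, float], now_minute: float, limit: int = 1000) -> List[bytes]:
    """Return up to `limit` members whose score is due, earliest first."""
    due = sorted((score, member) for member, score in zset.items() if score <= now_minute)
    return [member for _, member in due[:limit]]

def scan_once(zset: Dict[bytes, float], payloads: Dict[bytes, dict],
              now_minute: float, forward: Callable[[dict], None]) -> int:
    """Process one scan task: look up payloads, forward them, then drop the index entry."""
    members = read_expired(zset, now_minute)
    for member in members:
        payload = payloads.get(member)  # stand-in for an HBase get
        if payload is not None:
            forward(payload)            # stand-in for producing to a business topic
        del zset[member]                # remove only after forwarding
    return len(members)

zset = {b"a": 10, b"b": 12, b"c": 11}
payloads = {b"a": {"id": "a"}, b"c": {"id": "c"}}
sent: List[dict] = []
print(scan_once(zset, payloads, now_minute=11, forward=sent.append), sent)
```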

System Optimizations:

HBase optimizations include: pre-splitting regions with salt hashing to avoid hot spots, arranging exec_time in rowkey to improve cache hit rates, and creating multiple tables with different TTLs for efficient data expiration. Scan performance is optimized through Zset segmentation (grouping tasks by time windows) and multi-coroutine Kafka consumption with sliding window offset commit. Fast/slow submission strategies differentiate tasks by delay duration, writing long-delay tasks to throttled topics.
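Two of these optimizations are sketched below under stated assumptions. The rowkey layout (a 1-byte salt, then the execution minute, then the task ID) and the bucket count are hypothetical; the article only says that a salt and exec_time are part of the rowkey. The `OffsetWindow` class is one common way to implement a sliding-window commit when several workers finish messages out of order, and is not necessarily the team's implementation.

```python
import struct
import zlib

SALT_BUCKETS = 16  # hypothetical: one bucket per pre-split region

def make_rowkey(task_id: int, exec_epoch_minute: int) -> bytes:
    """Salt spreads writes across regions; exec time keeps same-minute rows adjacent."""
    id_bytes = struct.pack(">Q", task_id)
    salt = zlib.crc32(id_bytes) % SALT_BUCKETS
    return bytes([salt]) + struct.pack(">I", exec_epoch_minute) + id_bytes

class OffsetWindow:
    """Tracks out-of-order completions; only the contiguous prefix is committable."""

    def __init__(self, start_offset: int) -> None:
        self.next_to_commit = start_offset  # Kafka convention: next offset to consume
        self.done = set()

    def mark_done(self, offset: int) -> int:
        """Record a finished offset and return the offset that is safe to commit."""
        self.done.add(offset)
        while self.next_to_commit in self.done:
            self.done.remove(self.next_to_commit)
            self.next_to_commit += 1
        return self.next_to_commit

window = OffsetWindow(start_offset=100)
print(window.mark_done(102), window.mark_done(101), window.mark_done(100))
print(make_rowkey(task_id=7, exec_epoch_minute=28_000_000).hex())
```

The commit position advances only when the gap at the front of the window closes, so a restart never skips a message that a slower worker had not finished.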

Reliability:

The system ensures that each Zset is processed by only one coroutine at a time, shuts down gracefully when Kubernetes sends a SIGTERM signal, and provides disaster recovery by scanning HBase for tasks left unprocessed during an outage.
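The graceful-shutdown idea can be sketched as follows: the signal handler only sets a flag, and the worker checks it between batches so that work in progress is never cut off midway. The names and the batch granularity are assumptions; the article does not describe how the service drains its work.

```python
import signal
import threading
from typing import Callable, Iterable

stop_requested = threading.Event()

def handle_sigterm(signum, frame) -> None:
    """Signal handler: only set a flag, so the current batch can finish."""
    stop_requested.set()

def install_handler() -> None:
    """Register the handler; must be called from the main thread at startup."""
    signal.signal(signal.SIGTERM, handle_sigterm)

def drain(batches: Iterable, process: Callable) -> int:
    """Process batches until a stop is requested; return how many completed."""
    completed = 0
    for batch in batches:
        if stop_requested.is_set():
            break  # stop before starting new work, never in the middle of a batch
        process(batch)
        completed += 1
    return completed
```

In a Kubernetes pod, `install_handler()` would run once at startup, and the process would exit after `drain` returns, within the pod's termination grace period.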

Results:

Testing showed 200,000+ QPS of read/write throughput on a 5-node HBase cluster, and Kubernetes resource usage fell by 75% compared with earlier approaches.
