How Service Mesh Powers TikTok’s Spring Festival Red Packet Traffic Surge
This article, based on a Volcano Engine developer community meetup, explains how a self‑developed Service Mesh provides unified traffic management for TikTok’s massive Spring Festival Red Packet event, covering architecture, stability, security, and efficiency strategies across multi‑language microservices in complex environments.
Background and Challenges
In 2021, the TikTok Spring Festival Red Packet project left business developers very little time to develop, test, and launch code. The project involved many microservices written in Go, C++, Java, Python, and Node.js, running on containers, VMs, or bare metal. Different stages of the event required different traffic‑governance policies, so the underlying infrastructure had to provide a unified traffic‑governance capability for services from many teams and languages.
Traditional Microservice Approach
Traditional microservice architectures address these needs by building many functions into the service framework itself:
Network communication and request/response serialization.
Service discovery.
Various traffic‑governance strategies.
Observability capabilities (logging, monitoring, tracing).
However, this approach brings high development and operations costs and fragments framework versions across services, because the framework for every language must implement the same features.
Self‑Developed Service Mesh Implementation
The Volcano Engine team built a Service Mesh that introduces a data‑plane proxy process and a remote control‑plane service. The data‑plane proxy runs alongside each business service (in the same container or on the same machine) and handles all of that service's traffic, providing service discovery, traffic governance, and observability.
The control plane decides routing and governance policies and pushes them to the data‑plane processes. Because both planes are decoupled from business logic, they can be upgraded independently, without involving business developers.
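The push model can be sketched as follows. This is a minimal illustration, assuming the control plane sends whole versioned snapshots of rules and each proxy keeps only the newest one; `ControlPlane`, `DataPlaneProxy`, and `PolicySnapshot` are hypothetical names, not the team's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PolicySnapshot:
    """A versioned bundle of routing/governance rules (hypothetical shape)."""
    version: int
    rules: Dict[str, str]


@dataclass
class ControlPlane:
    """Hypothetical control plane that pushes full snapshots to subscribed proxies."""
    version: int = 0
    rules: Dict[str, str] = field(default_factory=dict)
    subscribers: List[Callable[[PolicySnapshot], None]] = field(default_factory=list)

    def subscribe(self, on_update: Callable[[PolicySnapshot], None]) -> None:
        self.subscribers.append(on_update)
        # A newly started proxy receives the current state immediately.
        on_update(PolicySnapshot(self.version, dict(self.rules)))

    def set_rule(self, key: str, value: str) -> None:
        self.rules[key] = value
        self.version += 1
        snapshot = PolicySnapshot(self.version, dict(self.rules))
        for push in self.subscribers:
            push(snapshot)


@dataclass
class DataPlaneProxy:
    """Hypothetical sidecar that applies the newest snapshot it has seen."""
    applied: PolicySnapshot = field(default_factory=lambda: PolicySnapshot(0, {}))

    def on_update(self, snapshot: PolicySnapshot) -> None:
        # Ignore stale or duplicate pushes so out-of-order delivery is harmless.
        if snapshot.version > self.applied.version:
            self.applied = snapshot
```

Pushing whole versioned snapshots rather than diffs keeps each proxy's state easy to reason about, at the cost of larger messages when the rule set is big.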
Service Mesh Traffic Governance Techniques
The mesh offers four core capabilities:
Routing: service discovery and rule‑based routing between microservices.
Security: authentication, authorization, and encryption to ensure trustworthy traffic.
Control: dynamic adjustment of governance policies for stability.
Observability: logging, monitoring, and tracing of traffic state.
Stability Strategies
Circuit Breaking
Circuit breaking monitors the success rate of each downstream node; when the rate falls below a threshold, the proxy stops sending traffic to that node. To recover, the proxy probes the node and gradually increases traffic once it becomes healthy again.
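A per‑node breaker with this behavior might look like the sketch below. The threshold, window size, cooldown, and ramp schedule are illustrative assumptions, and advancing one ramp stage per successful probe is a simplification; the talk does not give the team's actual parameters.

```python
import random
from collections import deque


class CircuitBreaker:
    """Per-node breaker sketch: closed -> open -> probing -> closed."""

    def __init__(self, threshold=0.5, window=20, min_samples=10,
                 cooldown_s=5.0, ramp=(0.1, 0.25, 0.5, 1.0)):
        self.threshold = threshold            # minimum acceptable success rate
        self.results = deque(maxlen=window)   # recent True/False outcomes
        self.min_samples = min_samples        # don't judge on too few samples
        self.cooldown_s = cooldown_s          # time to wait before probing
        self.ramp = ramp                      # share of traffic allowed per probe stage
        self.state = "closed"
        self.opened_at = 0.0
        self.stage = 0

    def allow(self, now, rng=random.random):
        if self.state == "open":
            if now - self.opened_at < self.cooldown_s:
                return False                  # node is cut off
            self.state, self.stage = "probing", 0
        if self.state == "probing":
            return rng() < self.ramp[self.stage]
        return True

    def record(self, ok, now):
        if self.state == "open":
            return                            # ignore late responses while open
        if self.state == "probing":
            if not ok:
                self._open(now)               # probe failed: cut traffic again
            elif self.stage + 1 < len(self.ramp):
                self.stage += 1               # healthy: admit a larger share
            else:
                self.state = "closed"
                self.results.clear()
            return
        self.results.append(ok)
        if len(self.results) >= self.min_samples:
            rate = sum(self.results) / len(self.results)
            if rate < self.threshold:
                self._open(now)

    def _open(self, now):
        self.state, self.opened_at = "open", now
        self.results.clear()
```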
Rate Limiting
Rate limiting drops excess requests once a server's QPS exceeds its capacity, preventing overload and cascading (avalanche) failures.
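The talk does not say which limiting algorithm the mesh uses. A token bucket is one common choice and is sketched here as an assumption, with the clock passed in explicitly (starting at 0) so the behavior is deterministic.

```python
class TokenBucket:
    """Token-bucket limiter sketch.

    rate_qps: sustained requests per second; burst: maximum tokens banked.
    """

    def __init__(self, rate_qps: float, burst: float):
        self.rate = rate_qps
        self.capacity = burst
        self.tokens = burst
        self.last = 0.0

    def try_acquire(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over capacity: drop the request
```

With `TokenBucket(rate_qps=100, burst=5)`, ten simultaneous requests at time 0 admit five and drop five; after one second the bucket is full again.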
Degradation
Degradation either discards a proportion of traffic or bypasses non‑critical dependencies, freeing resources for core paths.
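Both degradation modes can be expressed as two switches that a control plane could flip at runtime. This is a minimal sketch; `Degrader`, `drop_ratio`, and `skip_noncritical` are hypothetical names.

```python
import random
from typing import Callable, Optional


class Degrader:
    """Degradation switches a control plane might push (hypothetical knobs)."""

    def __init__(self, drop_ratio: float = 0.0, skip_noncritical: bool = False):
        self.drop_ratio = drop_ratio              # share of requests rejected outright
        self.skip_noncritical = skip_noncritical  # bypass optional dependencies

    def admit(self, rng: Callable[[], float] = random.random) -> bool:
        # Reject roughly drop_ratio of incoming requests.
        return rng() >= self.drop_ratio

    def call(self, dep_critical: bool, fn: Callable[[], str],
             fallback: Optional[str] = None) -> Optional[str]:
        # Non-critical dependencies are skipped and replaced by a cheap fallback.
        if self.skip_noncritical and not dep_critical:
            return fallback
        return fn()
```

For example, a red packet page could keep calling a critical balance service while returning an empty fallback instead of calling a recommendation service.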
Dynamic Overload Protection
Instead of relying on static thresholds, dynamic overload protection detects overload by measuring how long requests stay pending before they are processed (called T2 in the talk). When that time grows, it drops low‑priority traffic step by step, then restores traffic as the server recovers.
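One way to turn pending time into a shedding decision is sketched below. The target latency, the one‑level‑at‑a‑time adjustment, the half‑target recovery point, and the integer priority scheme are all assumptions for illustration.

```python
class OverloadGuard:
    """Adaptive shedding driven by pending (queueing) time, i.e. T2.

    Priorities run from 0 (lowest) to max_priority (highest); the highest
    priority is never shed.
    """

    def __init__(self, target_pending_ms: float, max_priority: int = 3):
        self.target = target_pending_ms
        self.max_priority = max_priority
        self.min_admitted = 0   # requests below this priority are dropped

    def observe(self, pending_ms: float) -> None:
        # Called periodically with a recent pending-time sample (e.g. an average).
        if pending_ms > self.target and self.min_admitted < self.max_priority:
            self.min_admitted += 1      # overloaded: shed one more priority level
        elif pending_ms < self.target / 2 and self.min_admitted > 0:
            self.min_admitted -= 1      # recovering: restore one level at a time

    def admit(self, priority: int) -> bool:
        return priority >= self.min_admitted
```

The gap between the shedding point (above target) and the recovery point (below half the target) acts as hysteresis, so admission does not flap when pending time hovers near the target.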
Load Balancing
Common strategies include random, round‑robin, weighted round‑robin, and consistent hashing. Consistent hashing sends requests with the same key to the same instance, which improves cache hit rates and reduces latency for large‑scale services.
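A consistent‑hash ring with virtual nodes is a standard construction; this sketch is illustrative and is not the team's code. Its key property is that removing an instance only remaps the keys that instance owned.

```python
import bisect
import hashlib
from typing import Iterable


class HashRing:
    """Consistent-hash ring with virtual nodes (standard construction)."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100):
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s: str) -> int:
        return int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8).digest(), "big")

    def pick(self, key: str) -> str:
        # First virtual node clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.ring[idx][1]
```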
Node Sharding
Node sharding splits a large set of service instances into shards, with each caller connecting only to one shard. This reduces the number of long‑lived connections and improves connection reuse, which lowers error rates and improves performance.
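A minimal sketch of the idea, assuming instances are split round‑robin over a sorted list and each client picks a shard with a stable hash of its own ID; `build_shards` and `shard_for_client` are hypothetical helpers.

```python
import zlib
from typing import Dict, List


def build_shards(instances: List[str], shard_count: int) -> Dict[int, List[str]]:
    """Split server instances into shard_count groups (round-robin over a sorted list)."""
    shards: Dict[int, List[str]] = {i: [] for i in range(shard_count)}
    for i, inst in enumerate(sorted(instances)):
        shards[i % shard_count].append(inst)
    return shards


def shard_for_client(client_id: str, shard_count: int) -> int:
    # A stable hash keeps a client on the same shard, so its connections get reused.
    return zlib.crc32(client_id.encode()) % shard_count
```

As arithmetic: 1,000 clients each connected to all 500 instances hold 500,000 connections. With 10 shards of 50 instances, each client holds 50, for 50,000 in total.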
Efficiency Strategies
Lane isolation and colored (tagged) traffic splitting let an independent copy of a set of microservices, called a lane, serve only specially tagged requests. This enables online debugging, fault‑injection drills, and traffic recording/replay without affecting normal traffic.
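Color‑based routing can be sketched as a lookup with a fallback to the baseline deployment. The header name `x-lane-color` and the cluster names are hypothetical, and in practice the color must also be propagated on every downstream call.

```python
from typing import Dict, Mapping


def route(service: str, headers: Mapping[str, str],
          lanes: Dict[str, Dict[str, str]], baseline: Dict[str, str]) -> str:
    """Pick a destination cluster for `service` based on the request's color.

    lanes maps color -> {service: cluster}. Services without a lane copy fall
    back to the baseline cluster, so a lane only deploys the services under test.
    """
    color = headers.get("x-lane-color")
    lane = lanes.get(color, {}) if color else {}
    return lane.get(service, baseline[service])
```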
Security Strategies
Authorization: restrict which services can call a given service.
Authentication: verify the authenticity of incoming traffic.
Mutual TLS (mTLS): encrypt traffic to prevent eavesdropping and tampering.
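Service‑level authorization can be sketched as a per‑callee allowlist. This assumes the caller identity has already been authenticated (for example, from a verified mTLS peer certificate), and the default‑deny behavior for callees without a policy is an assumption, not something the talk specifies.

```python
from typing import Dict, Set


class CallerAllowlist:
    """Authorization sketch: callee -> set of permitted caller identities."""

    def __init__(self, policy: Dict[str, Set[str]]):
        self.policy = policy

    def authorize(self, caller: str, callee: str) -> bool:
        allowed = self.policy.get(callee)
        # Default-deny (assumption): callees without a policy reject everyone.
        return allowed is not None and caller in allowed
```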
Spring Festival Red Packet Scenario Implementation
Applying the mesh to the event exposed two sources of performance overhead: extra protocol parsing in the proxy, and inter‑process communication between the proxy and the business process.
Protocol Parsing
To avoid parsing full requests, service metadata is carried in a small header, so the proxy only needs to parse a few hundred bytes per request.
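The sketch below shows the principle with a made‑up frame layout: a 2‑byte length prefix, a `key=value` metadata header, then the payload. The layout is hypothetical, not the team's wire format; the point is that the proxy decodes only the header and hands the body on as an uncopied view.

```python
import struct
from typing import Dict, Tuple

HEADER_LEN = struct.Struct(">H")  # hypothetical 2-byte big-endian header length


def encode(meta: Dict[str, str], body: bytes) -> bytes:
    header = "\n".join(f"{k}={v}" for k, v in meta.items()).encode()
    return HEADER_LEN.pack(len(header)) + header + body


def parse_meta(frame: bytes) -> Tuple[Dict[str, str], memoryview]:
    """Read only the small metadata header; the payload is returned as an
    uncopied view and never deserialized by the proxy."""
    (n,) = HEADER_LEN.unpack_from(frame, 0)
    start = HEADER_LEN.size
    header = frame[start:start + n].decode()
    meta = dict(line.split("=", 1) for line in header.splitlines() if line)
    return meta, memoryview(frame)[start + n:]
```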
Inter‑process Communication
The data plane receives traffic through a Unix domain socket or a local port rather than through iptables hijacking, but this path still incurs memory copies. The team switched to shared memory with event notifications, which cut memory copies and improved performance by 24%.
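The system‑call saving from event notifications (see also the first Q&A item) comes from waking the receiver only when it is idle. The single‑threaded model below is a sketch under that assumption: a deque stands in for the shared‑memory ring and a counter stands in for the wake‑up system call (on Linux this might be an eventfd or futex). `CoalescingChannel` is a hypothetical name.

```python
from collections import deque


class CoalescingChannel:
    """Model of a shared-memory queue plus an idle-only wake-up notification."""

    def __init__(self):
        self.queue = deque()          # stands in for the shared-memory ring
        self.consumer_idle = True     # consumer is blocked waiting for a signal
        self.wakeups = 0              # number of simulated notification syscalls

    def send(self, request) -> None:
        self.queue.append(request)    # a plain memory write, no syscall
        if self.consumer_idle:
            self.consumer_idle = False
            self.wakeups += 1         # notify only when the consumer is asleep

    def drain(self) -> list:
        # The consumer wakes once and processes everything queued so far.
        batch = list(self.queue)
        self.queue.clear()
        self.consumer_idle = True     # goes back to sleep once the queue is empty
        return batch
```

Sending 100 requests before the consumer drains costs a single wake‑up. A real cross‑process version also needs atomic flags with proper memory ordering, and the consumer must re‑check the queue before sleeping to avoid missing a request.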
Conclusion
The presentation showed how Service Mesh provides unified traffic‑governance capabilities that ensure microservice stability, security, and efficiency during massive traffic spikes such as TikTok’s Spring Festival Red Packet event.
Q&A
Q: Why does shared‑memory IPC reduce system calls? A: After a request is placed in shared memory, the receiving side is notified once; while it is awake and draining the queue, subsequent requests are picked up without additional wake‑up calls, so under high load far fewer system calls are made.
Q: Is the mesh built from scratch or based on Istio? Which languages are used? A: The data plane is a C++ extension of Envoy; traffic hijacking uses agreed‑upon Unix domain sockets or local ports, not iptables. The Ingress Proxy and business processes share the same runtime environment, enabling seamless upgrades.
Volcano Engine Developer Services
The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.