
How Service Mesh Powers TikTok’s Spring Festival Red Packet Traffic Surge

This article, based on a Volcano Engine developer community meetup, explains how a self‑developed Service Mesh provides unified traffic management for TikTok’s massive Spring Festival Red Packet event, covering architecture, stability, security, and efficiency strategies across multi‑language microservices in complex environments.

Volcano Engine Developer Services

Background and Challenges

In 2021 the TikTok Spring Festival Red Packet project gave business developers very little time to develop, test, and launch code. The project involved many microservices written in Go, C++, Java, Python, and Node, running on containers, VMs, or bare metal. Different stages of the event required different traffic-governance policies, so the underlying infrastructure needed to provide a unified traffic-governance capability for services from many teams and languages.

Traditional Microservice Approach

Traditional microservice architectures solve these problems by building many functions into the service framework itself:

Network communication and request/response serialization and deserialization.

Service discovery.

Various traffic-governance strategies.

Observability capabilities: logging, monitoring, and tracing.

However, this approach brings high development and operation costs and version fragmentation, and each language has to implement the same features again.

Self‑Developed Service Mesh Implementation

The Volcano Engine team built a Service Mesh that introduces a data‑plane proxy process and a remote control‑plane service. The data‑plane proxy runs alongside each business service (in the same container or machine) and handles all traffic, providing service discovery, traffic‑governance, and observability.

The control plane decides routing and governance policies and pushes them to the data‑plane processes. Both planes are independent of business logic, allowing independent upgrades without notifying developers.

Service Mesh Traffic Governance Techniques

The mesh offers four core capabilities:

Routing: service discovery and rule-based routing between microservices.

Security: authentication, authorization, and encryption to ensure trustworthy traffic.

Control: dynamic adjustment of governance policies for stability.

Observability: logging, monitoring, and tracing of traffic state.

Stability Strategies

Circuit Breaking

Circuit breaking monitors success rates of downstream nodes; when the rate falls below a threshold, traffic to that node is stopped. Recovery involves probing the node and gradually increasing traffic once it becomes healthy.
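
The behaviour described above can be sketched as a small per-node breaker. This is a minimal illustration, not the mesh's actual implementation: the class name, the sliding-window size, the 50% threshold, and the cooldown are all assumptions, and the gradual traffic ramp-up after recovery is reduced to a single probe request.

```python
from collections import deque


class CircuitBreaker:
    """Per-node breaker driven by the success rate over a sliding window."""

    def __init__(self, window=20, min_success_rate=0.5, cooldown=5.0):
        self.results = deque(maxlen=window)  # recent outcomes, True = success
        self.min_success_rate = min_success_rate  # assumed threshold
        self.cooldown = cooldown                  # seconds to wait before probing
        self.opened_at = None                     # None means the breaker is closed

    def allow(self, now):
        """Return True if a request may be sent to the node at time `now`."""
        if self.opened_at is None:
            return True
        # Open: only let a probe through once the cooldown has elapsed.
        return now - self.opened_at >= self.cooldown

    def record(self, success, now):
        """Record the outcome of a request that finished at time `now`."""
        if self.opened_at is not None:
            if success:
                # Probe succeeded: close the breaker and start a fresh window.
                self.opened_at = None
                self.results.clear()
            else:
                # Probe failed: restart the cooldown.
                self.opened_at = now
            return
        self.results.append(success)
        window_full = len(self.results) == self.results.maxlen
        rate = sum(self.results) / len(self.results)
        if window_full and rate < self.min_success_rate:
            self.opened_at = now  # stop sending traffic to this node
```

Time is passed in explicitly so the logic stays deterministic; a proxy would typically feed it a monotonic clock.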

Rate Limiting

Rate limiting drops excess requests when a server’s QPS exceeds its capacity, preventing overload and avalanche effects.
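
One common way to enforce a QPS ceiling is a token bucket. The talk summary does not say which algorithm the mesh uses, so the sketch below is an assumption chosen for illustration; `TokenBucket`, `rate`, and `burst` are hypothetical names.

```python
class TokenBucket:
    """Allow at most `rate` requests per second, with bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens added per second (the QPS limit)
        self.burst = burst    # maximum number of tokens the bucket can hold
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # timestamp of the last refill

    def allow(self, now):
        """Return True to accept the request at time `now`, False to drop it."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: drop instead of queueing
```

Dropping immediately, rather than queueing, is what keeps an overloaded server from accumulating a backlog that it can never clear.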

Degradation

Degradation either discards a proportion of traffic or bypasses non‑critical dependencies, freeing resources for core paths.
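
Both forms of degradation can be sketched in a few lines. The function names, the CRC-based bucketing, and the `degraded` flag are assumptions made for this example; in the mesh these switches would be governance policies rather than function arguments.

```python
import zlib


def should_drop(request_id, drop_percent):
    """Deterministically drop roughly `drop_percent` percent of requests."""
    bucket = zlib.crc32(request_id.encode()) % 100  # stable value in 0..99
    return bucket < drop_percent


def handle(request_id, call_core, call_optional, degraded, drop_percent=0):
    """Serve a request, shedding load according to the degradation settings."""
    if should_drop(request_id, drop_percent):
        return None  # proportional discard
    result = {"core": call_core()}
    if not degraded:
        # Non-critical dependency: skipped entirely while degraded.
        result["optional"] = call_optional()
    return result
```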

Dynamic Overload Protection

Instead of relying on static thresholds, dynamic overload protection detects overload by measuring how long requests stay pending (T2). When the server is overloaded, it drops low-priority traffic step by step, and it restores that traffic as the server recovers.
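
A rough sketch of this feedback loop follows. The target wait time, the integer priority levels, and the one-step-per-sample adjustment are assumptions for illustration only; the summary does not describe the actual control algorithm.

```python
class OverloadController:
    """Raise or lower a priority cutoff based on observed pending time."""

    def __init__(self, target_wait_ms=10.0, max_priority=3):
        self.target_wait_ms = target_wait_ms  # assumed healthy pending time
        self.max_priority = max_priority      # highest priority is never shed
        self.min_priority = 0                 # requests below this are dropped

    def observe(self, wait_ms):
        """Update the cutoff from one sampled pending time."""
        if wait_ms > self.target_wait_ms:
            # Overloaded: shed one more priority level.
            self.min_priority = min(self.max_priority, self.min_priority + 1)
        else:
            # Recovering: admit one more priority level again.
            self.min_priority = max(0, self.min_priority - 1)

    def admit(self, priority):
        """Return True if a request with this priority should be served."""
        return priority >= self.min_priority
```

Because the cutoff moves one level at a time in both directions, traffic is shed and restored gradually rather than all at once.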

Load Balancing

Common strategies include random, round-robin, weighted round-robin, and consistent hashing. Consistent hashing sends requests with the same key to the same node, which improves cache hit rates and reduces latency for large-scale services.
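
Consistent hashing is the least obvious of these strategies, so here is a minimal hash-ring sketch. The use of MD5, the 100 virtual nodes per instance, and the class name are assumptions for this example, not details from the talk.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a stable 64-bit integer."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes for a more even spread."""

    def __init__(self, nodes, vnodes=100):
        self.ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    def pick(self, key):
        """Return the node owning `key`: the first ring point after its hash."""
        idx = bisect.bisect_right(self.points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]
```

The same key always maps to the same node, and removing a node only remaps the keys that node owned, which is why caches on the remaining nodes stay warm.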

Node Sharding

Node sharding groups many service instances into shards, reducing the number of long‑lived connections and improving connection reuse, which lowers errors and improves performance.
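
One way to picture node sharding: split the downstream instances into fixed-size shards and let each caller connect only to the shard its own identity hashes to. The shard size, the CRC-based selection, and both function names are assumptions for this sketch; the summary does not describe how the mesh assigns shards.

```python
import zlib


def make_shards(instances, shard_size):
    """Split the instance list into shards of at most `shard_size` nodes."""
    ordered = sorted(instances)  # stable order so every caller sees the same shards
    return [ordered[i:i + shard_size] for i in range(0, len(ordered), shard_size)]


def shard_for_client(client_id, shards):
    """Pick the single shard this client keeps long-lived connections to."""
    return shards[zlib.crc32(client_id.encode()) % len(shards)]
```

With 10,000 downstream instances and shards of 100, a caller in this model holds at most 100 long-lived connections instead of 10,000.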

Efficiency Strategies

Lane isolation and colored traffic splitting allow independent copies of a set of microservices to handle specific features, enabling online debugging, fault‑injection drills, and traffic recording/replay.
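
Lane routing can be sketched as a lookup with a fallback. The `baseline` lane name, the shape of the `deployments` mapping, and the fallback rule are assumptions for illustration: a colored request goes to the instances deployed in its lane when the service has any, and to the shared baseline otherwise.

```python
BASELINE = "baseline"  # hypothetical name for the default lane


def pick_instances(service, lane, deployments):
    """Return the instances that should serve `service` for a request in `lane`."""
    lanes = deployments[service]
    # Services without a copy in this lane are served by the baseline copy.
    return lanes.get(lane) or lanes[BASELINE]


deployments = {
    "red-packet": {"baseline": ["rp-1", "rp-2"], "feature-x": ["rp-x"]},
    "account": {"baseline": ["acc-1"]},
}
```

Here a request colored `feature-x` reaches `rp-x` for `red-packet` but falls back to `acc-1` for `account`, so only the services under test need a separate copy.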

Security Strategies

Authorization: restrict which services can call a given service.

Authentication: verify the authenticity of incoming traffic.

Mutual TLS (mTLS): encrypt traffic to prevent eavesdropping and tampering.
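
The difference between authentication and authorization can be shown with a toy check. This sketch is not how the mesh does it: the HMAC token, the shared `secret`, and the allow-list `policy` are stand-ins assumed for the example, and mTLS is not modeled at all.

```python
import hashlib
import hmac


def sign(service, secret):
    """Issue a token that proves the caller's identity (hypothetical scheme)."""
    return hmac.new(secret, service.encode(), hashlib.sha256).hexdigest()


def authenticate(service, token, secret):
    """Authentication: is the caller really who it claims to be?"""
    return hmac.compare_digest(sign(service, secret), token)


def authorize(caller, callee, policy):
    """Authorization: is this caller allowed to call this service?"""
    return caller in policy.get(callee, set())


def admit(caller, token, callee, secret, policy):
    """Accept a request only if both checks pass."""
    return authenticate(caller, token, secret) and authorize(caller, callee, policy)
```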

Spring Festival Red Packet Scenario Implementation

Applying the mesh revealed performance challenges: additional protocol parsing and inter‑process communication overhead.

Protocol Parsing

To avoid parsing the full request, a small header carries the service metadata, so the proxy only has to parse a few hundred bytes.
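
The idea can be illustrated with a made-up frame layout. The 6-byte prefix, the JSON-encoded metadata, and the field names below are assumptions for this sketch and do not reflect the real wire format; the point is that the proxy reads the metadata without decoding the body.

```python
import json
import struct

# Hypothetical prefix: header length (uint16) and body length (uint32), big-endian.
PREFIX = struct.Struct("!HI")


def encode(meta, body):
    """Build a frame: prefix, small metadata header, then the opaque body."""
    header = json.dumps(meta).encode()
    return PREFIX.pack(len(header), len(body)) + header + body


def read_meta(frame):
    """Parse only the header; the body is returned as an uncopied view."""
    header_len, body_len = PREFIX.unpack_from(frame, 0)
    start = PREFIX.size
    meta = json.loads(frame[start:start + header_len])
    body_start = start + header_len
    body = memoryview(frame)[body_start:body_start + body_len]
    return meta, body
```

Routing decisions then depend only on `meta`, so the cost of parsing stays roughly constant no matter how large the request body is.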

Inter‑process Communication

The data‑plane proxies traffic via a Unix domain socket or local port instead of iptables, but still incurs memory copies. The team switched to shared memory with event notifications, cutting memory copies and improving performance by 24%.
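
The notify-once behaviour, which the Q&A at the end also touches on, can be modeled in a few lines. This is a single-threaded simulation with assumed names: the `deque` stands in for the shared-memory region, and the `notifications` counter stands in for the wake-up system calls. A real implementation would also need cross-process synchronization, which is omitted here.

```python
from collections import deque


class NotifyingQueue:
    """Model of a shared-memory queue that wakes the consumer only when idle."""

    def __init__(self):
        self.items = deque()       # stands in for the shared-memory region
        self.consumer_idle = True  # the consumer sleeps until it is notified
        self.notifications = 0     # each notification would cost a system call

    def push(self, item):
        """Producer side: enqueue, and notify only if the consumer is asleep."""
        self.items.append(item)
        if self.consumer_idle:
            self.consumer_idle = False
            self.notifications += 1

    def drain(self):
        """Consumer side: process everything queued, then go back to sleep."""
        handled = list(self.items)
        self.items.clear()
        self.consumer_idle = True
        return handled
```

Under load the consumer is rarely idle, so many requests share one notification; with a socket, each request would involve its own send and receive calls.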

Conclusion

The presentation showed how Service Mesh provides unified traffic‑governance capabilities that ensure microservice stability, security, and efficiency during massive traffic spikes such as TikTok’s Spring Festival Red Packet event.

Q&A

Q: Why does shared-memory IPC reduce system calls?

A: After placing a request in shared memory, the server is notified once; subsequent requests can be processed without additional wake-up calls, reducing the number of system calls under high load.

Q: Is the mesh built from scratch or based on Istio? Which languages are used?

A: The data plane is a C++ extension of Envoy; traffic hijacking uses agreed-upon Unix domain sockets or local ports, not iptables. The Ingress Proxy and business processes share the same runtime environment, enabling seamless upgrades.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
