
Design and Implementation of Vipshop's Message Gateway

This article presents a comprehensive overview of Vipshop's message gateway redesign, covering its architectural positioning, internal modules, technical stack, monitoring, degradation strategies, and practical lessons learned to handle massive messaging traffic in a large‑scale e‑commerce environment.

Ctrip Technology

Background – Vipshop, a large e‑commerce platform with 400 million registered users and 20 million daily active members, faced performance bottlenecks in its legacy message gateway during high‑traffic promotions, prompting a complete redesign.

Architecture Positioning – The new gateway separates logical message handling from physical delivery channels, acting as a company‑wide foundational service that receives requests from upstream business systems and forwards them to various third‑party channels such as telecom and WeChat.

Internal Structure – The gateway consists of modules for business‑friendly intake, template management, frequency control, priority‑based dispatch, feedback collection, delayed sending, subscription control, and auxiliary functions. (See Figure 2.)

Design Details

1. Message Acceptance and Distribution – Different message types (critical, notification, marketing) are routed through dedicated pipelines to avoid interference, with critical messages using exclusive SMS channels and backup HAProxy lines.
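The routing idea above can be sketched as a simple lookup from message type to an ordered list of channels. This is an illustrative sketch only; the channel names (`sms-exclusive`, `sms-backup-haproxy`, etc.) are hypothetical, since the article says only that critical messages get exclusive SMS channels with HAProxy-backed backup lines.

```python
from enum import Enum

class MessageType(Enum):
    CRITICAL = "critical"          # e.g. verification codes
    NOTIFICATION = "notification"  # e.g. order and logistics updates
    MARKETING = "marketing"        # e.g. promotional pushes

# Hypothetical pipeline mapping: each type gets dedicated channels so
# a marketing burst cannot starve critical traffic.
PIPELINES = {
    MessageType.CRITICAL: ["sms-exclusive", "sms-backup-haproxy"],
    MessageType.NOTIFICATION: ["sms-shared"],
    MessageType.MARKETING: ["sms-bulk"],
}

def route(message_type: MessageType) -> list:
    """Return the ordered list of channels to try for this message type."""
    return PIPELINES[message_type]
```

Keeping the pipelines fully separate, down to the physical channel, is what isolates the three traffic classes from one another.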

2. Business‑Friendly Interface – A unified API with pre‑configured templates shields business services from channel‑specific logic and reduces integration overhead.
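A minimal sketch of what such a template-driven API might look like: callers pass a template ID and parameters, never channel-specific payloads. The template store, template ID, and `send` signature here are assumptions for illustration, not the gateway's actual API.

```python
import string

# Hypothetical pre-configured template store; in the real gateway these
# would be managed by the template-management module.
TEMPLATES = {
    "order_shipped": string.Template("Your order $order_id has shipped."),
}

def send(template_id: str, user_id: str, params: dict) -> str:
    """Render a pre-configured template; the caller supplies only an ID
    and parameters, and the gateway handles channel-specific delivery."""
    body = TEMPLATES[template_id].substitute(params)
    # ...hand the rendered body off to the dispatch pipeline for user_id...
    return body
```

Because the template lives in the gateway, a copy change or a channel switch requires no redeploy of the calling business service.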

3. Frequency Control – The gateway enforces per‑user limits to prevent over‑messaging, rejecting requests that exceed configured quotas.
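One common way to enforce such per-user quotas is a sliding-window counter; the sketch below assumes this approach, since the article does not specify the algorithm the gateway uses.

```python
import time
from collections import defaultdict, deque

class FrequencyLimiter:
    """Sliding-window per-user limit: reject sends once the quota
    for the current window is exhausted."""

    def __init__(self, max_per_window, window_seconds):
        self.max = max_per_window
        self.window = window_seconds
        self.sent = defaultdict(deque)  # user_id -> send timestamps

    def allow(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        q = self.sent[user_id]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max:
            return False  # over quota: reject this request
        q.append(now)
        return True
```

In production this state would live in a shared store (the article's stack suggests Redis) rather than process memory, so all gateway instances see the same counters.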

4. User Subscription – Subscription data is stored in the member system; the gateway queries this cache and reports opt‑out actions back to keep the data consistent.
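The read-through-cache-plus-write-back pattern described above might look like the following sketch; the `member_system` interface (`get_opt_in`/`set_opt_in`) is a hypothetical stand-in for the member system's actual API.

```python
class SubscriptionGate:
    """Check opt-in status from a local cache of the member system's
    data, and report opt-outs back so the source of truth stays consistent."""

    def __init__(self, member_system):
        self.member_system = member_system  # authoritative subscription store
        self.cache = {}                     # user_id -> opted-in flag

    def is_subscribed(self, user_id):
        if user_id not in self.cache:
            # Cache miss: pull the flag from the member system.
            self.cache[user_id] = self.member_system.get_opt_in(user_id)
        return self.cache[user_id]

    def record_opt_out(self, user_id):
        self.cache[user_id] = False
        # Write back upstream so other consumers see the opt-out too.
        self.member_system.set_opt_in(user_id, False)
```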

5. Failure Retry – Two retry strategies are employed: persistent retries for notifications stored on disk and lightweight logging‑only retries for time‑sensitive verification codes.
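The two strategies can be contrasted in a short sketch. The `store` and `channel` interfaces here are hypothetical; the point is the asymmetry: notifications are persisted before the first attempt so retries survive a restart, while verification codes expire too quickly to be worth durable retries.

```python
import logging

def send_with_persistent_retry(message, channel, store, max_attempts=5):
    """Notifications: persist to disk first so retries survive restarts."""
    store.save(message)            # durable record before the first attempt
    for _ in range(max_attempts):
        if channel.send(message):
            store.delete(message)  # delivered: remove the durable record
            return True
    return False                   # stays in the store for a background retrier

def send_best_effort(message, channel):
    """Verification codes are time-sensitive: one attempt, log on failure."""
    if channel.send(message):
        return True
    logging.warning("dropping time-sensitive message: %s", message)
    return False
```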

6. Feedback Statistics – As messages traverse multiple hops, asynchronous feedback queues collect delivery results, which are later ETL‑ed into a big‑data platform for cost accounting and analysis.
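A toy version of the feedback-collection step, using an in-process queue to stand in for the article's asynchronous feedback queues (Kafka, per the stack section); the receipt schema is an assumption for illustration.

```python
import queue
from collections import Counter

def collect_feedback(q, counts):
    """Drain delivery receipts from an async queue and aggregate them
    per (channel, status); aggregates would later be ETL-ed into the
    big-data platform for cost accounting and analysis."""
    while True:
        try:
            receipt = q.get_nowait()
        except queue.Empty:
            break
        counts[(receipt["channel"], receipt["status"])] += 1
```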

Technical Stack – The service is built on Vipshop’s self‑developed RPC framework Venus (OSP protocol over Netty/Thrift), MySQL for persistence, Kafka for high‑throughput queues, and a custom VDP pipeline to sync binlogs to HDFS.

Monitoring and Degradation – Key metrics (request rates, Kafka lag, CPU/memory, channel latency) are exported to Mercury and Zabbix. Degradation plans include proxy failover, Kafka emergency recovery, master‑slave DB switching, subscription fallback, and retry mechanisms for downed physical channels.

Practical Reflections – The article discusses boundary decisions between the gateway and upstream systems, data usage for marketing analytics and cost allocation, and lessons such as handling carrier‑specific SMS protocols, asynchronous persistence ordering, Redis sizing guidelines, and kernel tuning for load testing.

Conclusion – The redesigned gateway successfully handled billions of messages during peak sales events, offering a reusable reference for similar large‑scale messaging scenarios.

Tags: monitoring, backend architecture, scalability, Kafka, message gateway, Venus RPC