Architecture and Design of ZhaiZhai IM System for a Second‑hand E‑commerce Platform
The article details the design and overall architecture of ZhaiZhai's instant messaging (IM) system, covering product positioning, a four‑layer architecture (user, entry, logic, storage), and analyses of scalability, high availability, reliability, extensibility, and performance, highlighting the use of C++, RPC, MySQL, TiDB, and custom middleware.
ZhaiZhai is a second‑hand e‑commerce platform where every user can be a buyer or a seller. The platform evolved from a simple information model to a closed‑loop transaction model, with an instant messaging (IM) service that connects buyers and sellers.
1. Product Positioning
The IM service must support multiple client types, namely the native APP, lightweight mini‑programs, and the 58 Tongcheng (58.com) APP, because users have diverse habits. It also serves as an independent system that provides contact and private‑message capabilities to other platform components such as customer service and risk control, and it delivers real‑time product, order, transaction, and activity notifications to users.
2. System Architecture
The architecture consists of four layers from top to bottom: User Layer, Entry Layer, Logic Layer, and Atomic Storage Layer (see Figure 1).
User Layer – callers of the IM service, including the APP, mini‑programs, the M‑terminal (mobile web site), platform operation systems, and ZZRPC‑based services.
APP communicates with the IM server via TCP, while mini‑programs and M‑terminal use HTTP. ZZRPC is a Java‑based RPC framework developed in‑house; the IM service, written in C++, adapts to ZZRPC for inter‑service calls.
Entry Layer – the gateway of the IM system, comprising Entry, Http‑Entry, the self‑developed distributed message middleware ZZMQ, and IMUI.
Entry maintains TCP connections with APP clients and forwards request packets straight to the Logic layer without any business processing, so Entry and Logic can be restarted independently. Http‑Entry implements HTTP long polling (a “long connection” emulated over HTTP) for mini‑programs and the M‑terminal: it holds each request open until private‑message data is ready or a timeout occurs, then returns the HTTP response.
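The wait at the heart of Http‑Entry's long polling can be sketched with a condition variable. This is a minimal illustration, not Http‑Entry's code: it assumes one in‑memory mailbox per user, and the names `Mailbox`, `Deliver`, and `Poll` are hypothetical. HTTP parsing, sockets, and multiple concurrent polls per user are left out.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical per-user mailbox: a poll request blocks until a private
// message arrives or the timeout expires (HTTP long polling).
class Mailbox {
 public:
  // Called when a private message for this user becomes available.
  void Deliver(std::string msg) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      pending_.push_back(std::move(msg));
    }
    cv_.notify_one();  // wake a waiting poll request, if any
  }

  // Called by the HTTP handler. Returns a message, or std::nullopt on timeout,
  // in which case the handler answers with an empty response and the client re-polls.
  std::optional<std::string> Poll(std::chrono::milliseconds timeout) {
    assert(timeout.count() >= 0);
    std::unique_lock<std::mutex> lock(mu_);
    if (!cv_.wait_for(lock, timeout, [this] { return !pending_.empty(); })) {
      return std::nullopt;
    }
    std::string msg = std::move(pending_.front());
    pending_.pop_front();
    return msg;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::string> pending_;
};
```

The predicate overload of `wait_for` guards against spurious wakeups, and a message delivered before the poll arrives is returned immediately rather than waiting out the timeout.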
ZZMQ is a distributed message queue that receives system, broadcast, and push messages from various platform services and forwards them to the IM logic modules, decoupling the platform services from the IM system.
IMUI adapts ZZRPC calls for the IM system, acting as a ZZRPC service provider that receives client requests, translates them into the IM internal protocol, forwards them to the Logic layer, and returns results via ZZRPC.
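The translation step IMUI performs can be pictured as a lookup from RPC method names to internal command codes. The article does not describe either protocol, so the structs, method names, and command codes below are hypothetical and serve only to show the shape of the adaptation.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical shape of an incoming ZZRPC call.
struct RpcRequest {
  std::string method;  // e.g. "sendPrivateMsg"
  std::uint64_t uid;
  std::string payload;
};

// Hypothetical IM internal packet understood by the Logic layer.
struct ImPacket {
  std::uint16_t cmd;
  std::uint64_t uid;
  std::string body;
};

// Translate a ZZRPC call into an IM packet; unknown methods are rejected.
std::optional<ImPacket> ToImPacket(const RpcRequest& req) {
  static const std::unordered_map<std::string, std::uint16_t> kCmdByMethod = {
      {"login", 0x0001},
      {"getUnreadCount", 0x0002},
      {"sendPrivateMsg", 0x0003},
  };
  auto it = kCmdByMethod.find(req.method);
  if (it == kCmdByMethod.end()) return std::nullopt;
  ImPacket pkt{it->second, req.uid, req.payload};
  assert(pkt.cmd != 0);  // command code 0 is never assigned in the table
  return pkt;
}
```

The reply path would run the same mapping in reverse, turning the Logic layer's response packet back into a ZZRPC result.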
Logic Layer – contains two modules: Logic (core lightweight business such as login, unread count, private messages) and Extlogic (non‑core heavyweight business). Both modules communicate via ZZMQ for decoupling. For example, private‑message handling in Logic forwards offline messages to Extlogic through ZZMQ.
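The offline‑message example can be sketched as follows. This assumes that Logic checks an online‑user set (the role the article gives ZZRedis) and otherwise publishes to a queue that Extlogic consumes; here a plain `std::queue` stands in for ZZMQ, and `PrivateMsg`, `OfflineTopic`, and `HandlePrivateMsg` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <unordered_set>

struct PrivateMsg {
  std::uint64_t from;
  std::uint64_t to;
  std::string text;
};

// Stand-in for a ZZMQ topic: Logic only enqueues, Extlogic consumes later.
using OfflineTopic = std::queue<PrivateMsg>;

// Hypothetical Logic-side handler. Returns true if the receiver is online
// (the message would be pushed over the receiver's Entry connection);
// otherwise hands the message to Extlogic via the queue and returns at once.
bool HandlePrivateMsg(const PrivateMsg& msg,
                      const std::unordered_set<std::uint64_t>& online_users,
                      OfflineTopic& offline_topic) {
  if (online_users.count(msg.to) > 0) {
    return true;  // fast path stays inside core Logic
  }
  offline_topic.push(msg);  // heavier offline handling is left to Extlogic
  return false;
}
```

Logic never waits on Extlogic here; it only enqueues, which is what keeps the core path lightweight.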
Atomic Storage Layer – persists private messages, system messages, and contacts in MySQL and the NewSQL database TiDB. TiDB scales horizontally and elastically, while MySQL is sharded to spread the load. Das, the database access service, receives read/write requests from the Logic layer, queues them locally, and executes the database operations synchronously. ZZRedis is a self‑developed distributed cache that holds online‑user information. Jtransit, similar to IMUI, adapts ZZRPC calls for the platform.
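The article does not say how the MySQL data is sharded. A common scheme, shown here purely as an assumption, routes each row by the owning user's ID modulo the total number of tables; the layout of 4 databases with 16 tables each and the helper names are hypothetical. TiDB needs no such routing, since it distributes data itself.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical layout: kDbCount databases, each holding kTablesPerDb tables.
constexpr std::uint64_t kDbCount = 4;
constexpr std::uint64_t kTablesPerDb = 16;

struct ShardLocation {
  std::uint64_t db;     // which MySQL database
  std::uint64_t table;  // which table inside that database
};

// Route a user's rows to one of kDbCount * kTablesPerDb table slots.
ShardLocation RouteByUid(std::uint64_t uid) {
  const std::uint64_t slot = uid % (kDbCount * kTablesPerDb);
  return {slot / kTablesPerDb, slot % kTablesPerDb};
}

// Build a fully qualified table name such as "db_2.private_msg_4".
std::string TableName(const std::string& base, std::uint64_t uid) {
  const ShardLocation loc = RouteByUid(uid);
  assert(loc.db < kDbCount && loc.table < kTablesPerDb);
  return "db_" + std::to_string(loc.db) + "." + base + "_" + std::to_string(loc.table);
}
```

For example, uid 100 falls in slot 100 % 64 = 36, which is table 4 of database 2.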
3. Architectural Characteristics
The IM system is evaluated on scalability, high availability, reliability, extensibility, and performance.
Scalability – the system scales out elastically by adding machines to the Entry, Logic, Extlogic, and storage layers. Each instance registers with a control center over a long‑lived TCP connection, and the control center manages the calling relationships between services.
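The control center's bookkeeping can be sketched as an in‑memory registry. This is an illustration under assumptions: networking is omitted, an instance is assumed to be dropped when its long‑lived connection closes, and `ServiceRegistry` with its methods is a hypothetical name, not the real control center's API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical registry: instances register over their long-lived TCP
// connection and are removed when that connection closes.
class ServiceRegistry {
 public:
  void Register(const std::string& service, const std::string& addr) {
    assert(!addr.empty());
    instances_[service].insert(addr);
  }

  // Invoked when an instance's connection to the control center drops.
  void OnConnectionClosed(const std::string& service, const std::string& addr) {
    auto it = instances_.find(service);
    if (it == instances_.end()) return;
    it->second.erase(addr);
    if (it->second.empty()) instances_.erase(it);
  }

  // Sorted snapshot, so every caller sees the same instance order to hash over.
  std::vector<std::string> Instances(const std::string& service) const {
    auto it = instances_.find(service);
    if (it == instances_.end()) return {};
    return std::vector<std::string>(it->second.begin(), it->second.end());
  }

 private:
  std::map<std::string, std::set<std::string>> instances_;
};
```

Scaling out then amounts to starting a machine that calls `Register`; callers pick it up on their next snapshot.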
High Availability – Entry and Http‑Entry availability is ensured by TGW and Nginx health checks. All requests from the same user are hashed to the same Logic instance to preserve message ordering; if that instance fails, the request is retried on another instance chosen with the (x+p)%n algorithm. Extlogic and Das achieve high availability through ZZMQ‑based decoupling and similar instance‑selection logic. MySQL runs in a primary‑secondary configuration, while TiDB, ZZRedis, and ZZMQ are highly available by design.
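The article names the (x+p)%n rule without defining its terms. One plausible reading, which is an assumption: n is the number of Logic instances, x is derived from the user ID, and p is a probe offset that starts at 0 and grows by one for each failed instance. A user then sticks to one instance while it is healthy and fails over in a fixed order. `PickInstance` and the `healthy` vector are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical reading of (x + p) % n:
//   n = number of Logic instances
//   x = value derived from the user ID (here simply uid % n), so a user's
//       requests always start at the same instance and stay in order
//   p = probe offset: 0 on the first attempt, +1 per unavailable instance
// healthy[i] stands in for whatever liveness signal the real system uses.
std::optional<std::size_t> PickInstance(std::uint64_t uid,
                                        const std::vector<bool>& healthy) {
  const std::size_t n = healthy.size();
  if (n == 0) return std::nullopt;
  const std::size_t x = static_cast<std::size_t>(uid % n);
  for (std::size_t p = 0; p < n; ++p) {
    const std::size_t idx = (x + p) % n;
    assert(idx < n);
    if (healthy[idx]) return idx;
  }
  return std::nullopt;  // all n instances are unavailable
}
```

With four instances, uid 10 maps to instance 2; if instance 2 is down, the request moves to instance 3. A side effect of linear probing is that all users of a failed instance land on the same neighbour, which must absorb the extra load.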
Reliability – two risks stand out: resource exhaustion in Logic during traffic spikes, and database timeouts in Das. The mitigations are to split Logic into dedicated services (Login_Logic, Msg_Logic, Contact_Logic) and to give Das a separate request queue for each business type, so that overload or slow queries in one business do not starve the others.
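Per‑business queues in Das can be sketched as a set of bounded queues, one per category, so a category whose database calls time out fills only its own queue. The categories, capacity, and names (`Biz`, `DbRequest`, `DasQueues`) are hypothetical, and the sketch is single‑threaded for brevity; a real Das would guard each queue with a lock and run one worker per category.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical business categories, each with its own queue inside Das.
enum class Biz : std::size_t { kPrivateMsg = 0, kSystemMsg = 1, kContact = 2, kCount = 3 };

struct DbRequest {
  Biz biz;
  std::string sql;
};

// Bounded per-business queues: when one category backs up, only that
// category's new requests are rejected; the others keep flowing.
class DasQueues {
 public:
  explicit DasQueues(std::size_t capacity_per_biz) : capacity_(capacity_per_biz) {
    assert(capacity_ > 0);
  }

  bool Enqueue(DbRequest req) {
    auto& q = queues_[Index(req.biz)];
    if (q.size() >= capacity_) return false;  // shed load for this business only
    q.push_back(std::move(req));
    return true;
  }

  // A worker dedicated to `biz` would call this and run the query synchronously.
  bool Dequeue(Biz biz, DbRequest& out) {
    auto& q = queues_[Index(biz)];
    if (q.empty()) return false;
    out = std::move(q.front());
    q.pop_front();
    return true;
  }

  std::size_t Size(Biz biz) const { return queues_[Index(biz)].size(); }

 private:
  static std::size_t Index(Biz biz) { return static_cast<std::size_t>(biz); }
  std::size_t capacity_;
  std::array<std::deque<DbRequest>, static_cast<std::size_t>(Biz::kCount)> queues_;
};
```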
Extensibility – the Logic and Extlogic modules are decoupled via ZZMQ, allowing new features to be added in Extlogic without impacting core Logic.
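One way to picture this extensibility is a dispatch table in Extlogic keyed by the type of message arriving from ZZMQ: adding a feature means registering one more handler, while the Logic module that publishes the messages stays untouched. The class, method names, and message types below are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical Extlogic dispatcher: each ZZMQ message type maps to a handler.
class ExtlogicDispatcher {
 public:
  using Handler = std::function<void(const std::string& payload)>;

  // Adding a non-core feature = registering a handler for a new message type.
  void Register(const std::string& type, Handler handler) {
    assert(handler);
    handlers_[type] = std::move(handler);
  }

  // Returns false for message types that no handler subscribes to yet.
  bool Dispatch(const std::string& type, const std::string& payload) const {
    auto it = handlers_.find(type);
    if (it == handlers_.end()) return false;
    it->second(payload);
    return true;
  }

 private:
  std::unordered_map<std::string, Handler> handlers_;
};
```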
Performance – the bottleneck lies in the database layer. Performance improvements involve upgrading hardware, adding nodes, and exploring new storage engines such as KList for contact data.
4. Conclusion
The ZhaiZhai IM system provides a reliable, high‑performance communication channel among users, customer service, and the platform. Its micro‑service and layered design ensure clear responsibilities for each component: Entry (TCP), Http‑Entry (HTTP), Logic (core lightweight business), Extlogic (non‑core heavyweight business), Das (database access), IMUI/Jtransit (ZZRPC adaptation), MySQL/TiDB/ZZRedis (persistence and caching), and ZZMQ (decoupling).
The architecture exhibits strong scalability, high availability, reliability, extensibility, and performance.
5. Author Biography
Wang Zongsheng, Senior R&D Engineer in ZhaiZhai's Architecture Platform Department, responsible for the IM system, push system, and distributed storage system.
Zhuanzhuan Tech
A platform for Zhuanzhuan R&D and industry peers to learn and exchange technology, regularly sharing frontline experience and cutting‑edge topics. We welcome practical discussions and sharing; contact waterystone with any questions.