How MaFengWo Scaled Its IM System: From PHP to Go and Service Splitting
This article chronicles the evolution of MaFengWo's instant‑messaging platform, detailing the transition from a simple PHP implementation to OpenResty optimizations, the introduction of multi‑mode routing in IM 2.0, and a complete service‑oriented redesign in Go for IM 3.0, while addressing scalability, multi‑device synchronization, and message reliability.
IM 1.0 – Early Stage
In the initial phase, the IM system was built with PHP to provide basic user‑to‑customer‑service chat, message sending/receiving, and consultation list management. Messages were placed into a Redis blocking queue and retrieved via HTTP long‑polling. A poll request blocked while the queue was empty, so clients did not have to poll repeatedly to find new messages.
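The blocking‑queue plus long‑polling pattern can be sketched as follows. This is a minimal illustration in Go rather than the original PHP, and it assumes an in‑memory channel as a stand‑in for the Redis blocking queue; `Mailbox`, `Push`, `Poll`, and `LongPollHandler` are hypothetical names, not taken from the Mafengwo code.

```go
package main

import (
	"encoding/json"
	"net/http"
	"sync"
	"time"
)

// Mailbox is an in-memory stand-in for the per-user Redis blocking queue
// (hypothetical type; the original system used PHP and Redis).
type Mailbox struct {
	mu     sync.Mutex
	queues map[string]chan string
}

func NewMailbox() *Mailbox {
	return &Mailbox{queues: make(map[string]chan string)}
}

// queue returns the buffered channel for a user, creating it on first use.
func (m *Mailbox) queue(user string) chan string {
	m.mu.Lock()
	defer m.mu.Unlock()
	q, ok := m.queues[user]
	if !ok {
		q = make(chan string, 64)
		m.queues[user] = q
	}
	return q
}

// Push enqueues a message; it reports false if the user's queue is full.
func (m *Mailbox) Push(user, msg string) bool {
	select {
	case m.queue(user) <- msg:
		return true
	default:
		return false
	}
}

// Poll blocks until a message arrives or the timeout elapses. Blocking on an
// empty queue is what spares the client from polling in a tight loop.
func (m *Mailbox) Poll(user string, timeout time.Duration) (string, bool) {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case msg := <-m.queue(user):
		return msg, true
	case <-timer.C:
		return "", false
	}
}

// LongPollHandler exposes Poll over HTTP: 200 with the message, or 204 when
// the wait times out. It does not watch for client disconnects.
func LongPollHandler(m *Mailbox, timeout time.Duration) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		msg, ok := m.Poll(r.URL.Query().Get("user"), timeout)
		if !ok {
			w.WriteHeader(http.StatusNoContent)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]string{"message": msg})
	}
}
```

A client would call the endpoint again as soon as a response arrives, whether it carried a message or timed out. Each waiting request occupies a worker for the whole timeout, which is the load problem the next section describes for php‑fpm.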
Message Polling Optimization
The long‑polling module originally relied on php‑fpm processes, causing high server load when many requests accumulated. The team replaced it with an OpenResty‑based solution that uses Lua coroutines to off‑load blocking operations, freeing php‑fpm processes and improving performance.
IM 2.0 – Requirement‑Driven Stage
Rapid business growth introduced many custom requirements. The system added advanced routing (average, weighted, queue‑based) and configurable customer‑service responses such as auto‑reply and FAQ. A typical consultation flow creates a reusable message chain, stores messages in a database, and assigns a customer service representative via a dispatch service.
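The three routing modes might look like the sketch below. The article names the modes but does not define them, so the interpretations here are assumptions: "average" picks the least‑loaded agent, "weighted" picks the lowest load‑to‑weight ratio, and "queue‑based" makes users wait in FIFO order when every agent is at capacity. All type and function names are hypothetical.

```go
package main

import "errors"

// Agent is a hypothetical customer-service representative record.
type Agent struct {
	ID     string
	Weight int // relative share of consultations in weighted mode
	Active int // consultations currently assigned
	Max    int // capacity, used by the queue-based mode
}

var ErrNoAgent = errors.New("no agent available")

// PickAverage returns the agent with the fewest active consultations.
func PickAverage(agents []*Agent) (*Agent, error) {
	var best *Agent
	for _, a := range agents {
		if best == nil || a.Active < best.Active {
			best = a
		}
	}
	if best == nil {
		return nil, ErrNoAgent
	}
	return best, nil
}

// PickWeighted returns the agent with the lowest Active/Weight ratio.
// Agents with a non-positive weight are skipped.
func PickWeighted(agents []*Agent) (*Agent, error) {
	var best *Agent
	for _, a := range agents {
		if a.Weight <= 0 {
			continue
		}
		// Compare ratios by cross-multiplying to stay in integer arithmetic.
		if best == nil || a.Active*best.Weight < best.Active*a.Weight {
			best = a
		}
	}
	if best == nil {
		return nil, ErrNoAgent
	}
	return best, nil
}

// Queue holds users waiting for an agent in queue-based mode.
type Queue struct{ waiting []string }

// Assign gives the user to the least-loaded agent with spare capacity,
// or appends the user to the waiting list when all agents are full.
func (q *Queue) Assign(user string, agents []*Agent) (*Agent, bool) {
	var best *Agent
	for _, a := range agents {
		if a.Active < a.Max && (best == nil || a.Active < best.Active) {
			best = a
		}
	}
	if best == nil {
		q.waiting = append(q.waiting, user)
		return nil, false
	}
	best.Active++
	return best, true
}

// Release frees one slot on the agent and hands it to the longest-waiting
// user, if any. It returns that user's ID and true when a handoff happened.
func (q *Queue) Release(a *Agent) (string, bool) {
	if a.Active > 0 {
		a.Active--
	}
	if len(q.waiting) == 0 || a.Active >= a.Max {
		return "", false
	}
	user := q.waiting[0]
	q.waiting = q.waiting[1:]
	a.Active++
	return user, true
}
```

Keeping each mode behind a small function makes it straightforward to select the mode per business line from configuration, which fits the custom requirements described above.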
IM 3.0 – Service Splitting Stage
Increasing code size, coupling, and unclear responsibilities made maintenance costly. The architecture was refactored into four major services: Customer Service, User Service, IM Service, and Data Service.
Customer Service: Provides group management, quality inspection, flexible assignment, and automated reply/FAQ features to improve agent efficiency.
User Service: Analyzes user behavior, generates recommendations, and tracks satisfaction metrics.
IM Service: Supports single‑chat and group‑chat, real‑time notifications, offline push, message history, contact lists, file upload, and content risk detection.
Data Service: Collects consultation metrics (response time, conversion rates, load, etc.) and offers statistical reports.
User State Flow
The complete user state diagram covers four states: initial, pending assignment, assigned but unresolved, and automatically resolved. Transitions between them are driven by user interaction and by timeouts.
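A state flow like this is commonly encoded as a transition table. The sketch below uses the four states named above. The event names and the exact set of allowed transitions are assumptions for illustration, since the original diagram is not reproduced in the text.

```go
package main

import "fmt"

// State is a user's consultation state.
type State int

const (
	Initial State = iota
	PendingAssignment
	AssignedUnresolved
	Resolved // reached automatically, e.g. after a timeout
)

// Event is something that can move a user between states.
// These event names are hypothetical.
type Event int

const (
	UserAsked     Event = iota // the user sends a message
	AgentAssigned              // the dispatch service picks an agent
	TimedOut                   // no interaction within the configured window
)

// transitions lists the allowed moves; anything absent is rejected.
// The specific edges are an assumed reading of the state flow.
var transitions = map[State]map[Event]State{
	Initial:            {UserAsked: PendingAssignment},
	PendingAssignment:  {AgentAssigned: AssignedUnresolved},
	AssignedUnresolved: {UserAsked: AssignedUnresolved, TimedOut: Resolved},
	Resolved:           {UserAsked: PendingAssignment},
}

// Next returns the state that follows s on event e, or an error (and the
// unchanged state) when the transition is not allowed.
func Next(s State, e Event) (State, error) {
	if to, ok := transitions[s][e]; ok {
		return to, nil
	}
	return s, fmt.Errorf("no transition from state %d on event %d", s, e)
}
```

Keeping the rules in one table means a new state or timeout rule is a data change rather than another branch scattered through the business logic.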
IM Service Refactor
To improve modularity and scalability, the IM service was rewritten in Go. The new design introduces a Proxy layer and an Exchange layer.
Key Functions of Proxy/Exchange
Routing Rules: ip‑hash, round‑robin, and least‑connections strategies distribute clients across ChannelManager instances.
Client Connection Management: Synchronizes connection metadata to the DispatcherTable for fast lookup.
ChannelManager Protocol: Handles connection establishment, reconnection, heartbeat, message QoS, and send/receive operations.
REST API for Messaging: Provides single‑send and broadcast endpoints, used by PHP business logic to trigger messages.
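The three routing rules in the list above can be sketched in a few lines each. This is an illustrative version and not the project's code; `Backend` is a hypothetical handle to one ChannelManager instance, and FNV‑1a is an assumed choice of hash function.

```go
package main

import (
	"hash/fnv"
	"sync/atomic"
)

// Backend is a hypothetical handle to one ChannelManager instance.
type Backend struct {
	Addr  string
	Conns int64 // current connection count, maintained by the caller
}

// IPHash sends the same client IP to the same backend for as long as the
// backend list is unchanged.
func IPHash(backends []*Backend, clientIP string) *Backend {
	if len(backends) == 0 {
		return nil
	}
	h := fnv.New32a()
	h.Write([]byte(clientIP))
	return backends[h.Sum32()%uint32(len(backends))]
}

// RoundRobin cycles through the backends in order. The counter is updated
// atomically, so Pick is safe for concurrent use.
type RoundRobin struct{ next uint64 }

func (r *RoundRobin) Pick(backends []*Backend) *Backend {
	if len(backends) == 0 {
		return nil
	}
	n := atomic.AddUint64(&r.next, 1) - 1
	return backends[n%uint64(len(backends))]
}

// LeastConn returns the backend with the fewest current connections,
// or nil when the list is empty.
func LeastConn(backends []*Backend) *Backend {
	var best *Backend
	for _, b := range backends {
		if best == nil || atomic.LoadInt64(&b.Conns) < atomic.LoadInt64(&best.Conns) {
			best = b
		}
	}
	return best
}
```

ip‑hash keeps a reconnecting client on the same instance, round‑robin spreads new connections evenly, and least‑connections corrects for imbalance when connection lifetimes vary. Note that plain modulo hashing remaps most clients when the number of instances changes.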
Call Flow After Refactor
PHP creates the message chain and assigns a customer service agent. When a message needs forwarding, PHP calls the Dispatcher service, which looks up the target ChannelManager via the shared DispatcherTable and pushes the message to the client over its WebSocket connection.
Multi‑Device Synchronization
Client connections from PC, mobile, H5, iOS, and Android are stored in Redis hashes within the DispatcherTable. The table enables quick retrieval of all connections for a user, and entries expire after 2 hours so that stale connections are cleaned up.
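The shape of such a table can be sketched with an in‑memory map standing in for the Redis hashes: one hash per user, one field per connection. The field names, the per‑entry expiry, and the injectable clock are assumptions made for illustration; the article only states that Redis hashes and a 2‑hour expiration are used.

```go
package main

import (
	"sync"
	"time"
)

// Conn describes one client connection; field names are illustrative.
type Conn struct {
	Device         string // e.g. "pc", "ios", "android", "h5"
	ChannelManager string // instance that holds the socket
	expiresAt      int64  // Unix seconds
}

// DispatcherTable is an in-memory stand-in for the Redis hashes:
// user ID -> connection ID -> connection metadata.
type DispatcherTable struct {
	mu    sync.Mutex
	users map[string]map[string]Conn
	TTL   int64        // entry lifetime in seconds
	Now   func() int64 // clock in Unix seconds, replaceable in tests
}

func NewDispatcherTable() *DispatcherTable {
	return &DispatcherTable{
		users: make(map[string]map[string]Conn),
		TTL:   int64(2 * time.Hour / time.Second),
		Now:   func() int64 { return time.Now().Unix() },
	}
}

// Register records or refreshes a connection, for example on connect.
func (t *DispatcherTable) Register(user, connID, device, cm string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.users[user] == nil {
		t.users[user] = make(map[string]Conn)
	}
	t.users[user][connID] = Conn{
		Device:         device,
		ChannelManager: cm,
		expiresAt:      t.Now() + t.TTL,
	}
}

// Lookup returns every live connection for a user and drops expired ones,
// so a message can be fanned out to all of the user's devices.
func (t *DispatcherTable) Lookup(user string) []Conn {
	t.mu.Lock()
	defer t.mu.Unlock()
	now := t.Now()
	var live []Conn
	for id, c := range t.users[user] {
		if c.expiresAt <= now {
			delete(t.users[user], id)
			continue
		}
		live = append(live, c)
	}
	return live
}
```

With this layout, delivering to every device a user has online is a single lookup by user ID followed by one push per returned connection.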
Online Status Synchronization
When a user comes online, the system pushes an online notification only to currently online agents, avoiding unnecessary polling.
Message Reliability
For long‑polling, the client sends the last read message ID; the server returns the delta. For WebSocket, the server waits for an ACK and retries if none is received, while the client deduplicates messages based on IDs.
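Both mechanisms can be sketched briefly. The code below is a simplified illustration, not the production implementation: `Since` models the long‑polling delta, `SendWithAck` models server‑side retry for WebSocket, and `Deduper` models client‑side deduplication. All names, the retry count, and the use of a single shared ACK channel are assumptions.

```go
package main

import (
	"errors"
	"time"
)

// Message is a chat message with a monotonically increasing ID.
type Message struct {
	ID   uint64
	Body string
}

// Since returns the messages newer than lastID. The history slice must be
// sorted by ascending ID. This is the delta returned to a long-polling client.
func Since(history []Message, lastID uint64) []Message {
	for i, m := range history {
		if m.ID > lastID {
			return history[i:]
		}
	}
	return nil
}

var ErrNoAck = errors.New("no ACK received")

// SendWithAck sends msg and waits up to wait for its ID to arrive on acks,
// resending up to attempts times in total. ACKs for other IDs are ignored
// here; a real server would route ACKs to the right pending message.
func SendWithAck(msg Message, send func(Message) error, acks <-chan uint64,
	wait time.Duration, attempts int) error {
	for i := 0; i < attempts; i++ {
		if err := send(msg); err != nil {
			continue // transport error: try again
		}
		timer := time.NewTimer(wait)
		for waiting := true; waiting; {
			select {
			case id := <-acks:
				if id == msg.ID {
					timer.Stop()
					return nil
				}
			case <-timer.C:
				waiting = false // timed out: fall through to the next attempt
			}
		}
	}
	return ErrNoAck
}

// Deduper remembers delivered IDs so a retried message is shown only once.
// The set grows without bound in this sketch.
type Deduper struct{ seen map[uint64]struct{} }

func NewDeduper() *Deduper { return &Deduper{seen: make(map[uint64]struct{})} }

// Accept reports whether the message is new, and records it if so.
func (d *Deduper) Accept(m Message) bool {
	if _, dup := d.seen[m.ID]; dup {
		return false
	}
	d.seen[m.ID] = struct{}{}
	return true
}
```

Retrying until an ACK arrives gives at‑least‑once delivery, so a lost ACK can cause the same message to be sent twice. Deduplicating by ID on the client is what turns that into exactly‑once display.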
Domain‑Driven Design for Message Flow
DDD modeling is used to differentiate message handling based on domain, endpoint, and role, ensuring that notifications are tailored appropriately across different clients.
Conclusion and Outlook
The IM system has progressed from a rudimentary PHP prototype to a Go‑based, service‑oriented architecture that supports high concurrency, multi‑device synchronization, and reliable messaging. Future plans include rewriting the polling service in Go, further decoupling the services, and exploring AI‑powered smart customer service using TensorFlow.
Mafengwo Technology
External communication platform of the Mafengwo Technology team, regularly sharing articles on advanced tech practices, tech exchange events, and recruitment.
