How to Build a Billion‑User Scalable User Center: Architecture, APIs, Token Fallback, and Security
This article presents a comprehensive, practical design for an ultra‑large‑scale user center, covering microservice architecture, API separation, token generation with graceful degradation, data‑sharding strategies, password encryption, asynchronous processing, and detailed monitoring to ensure high availability, performance, and security.
Service Architecture
The user center is split into three independent microservices: a gateway service that aggregates business logic and external calls (e.g., risk control, SMS), a core service that handles simple business logic and data storage with minimal dependencies (only Redis or the database), and an asynchronous consumer service that processes message queues. This separation allows new features to be deployed by updating only the gateway, while the core and consumer services remain stable, at the cost of a longer call chain and the need for compatibility testing.
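The following is a minimal sketch of the split in Python. It assumes hypothetical `CoreService` and `GatewayService` classes, with an in‑memory dict standing in for Redis or the database. The sketch shows the dependency direction: only the gateway touches external systems such as risk control. That makes the gateway the natural place for the fallback when one of those systems fails, while the core service depends on storage alone.

```python
from typing import Callable, Dict


class CoreService:
    """Core service: simple logic whose only dependency is storage.

    A dict stands in for Redis/the database (hypothetical layout: username -> password hash).
    """

    def __init__(self, store: Dict[str, str]):
        self.store = store

    def password_matches(self, username: str, password_hash: str) -> bool:
        return self.store.get(username) == password_hash


class GatewayService:
    """Gateway service: owns business orchestration and every external call."""

    def __init__(self, core: CoreService, risk_check: Callable[[str], bool]):
        self.core = core
        self.risk_check = risk_check  # hypothetical external risk-control client; may fail

    def login(self, username: str, password_hash: str) -> dict:
        degraded = False
        try:
            if not self.risk_check(username):
                return {"ok": False, "degraded": False, "reason": "rejected by risk control"}
        except Exception:
            # External dependency failed: degrade to a password-only check.
            degraded = True
        ok = self.core.password_matches(username, password_hash)
        return {"ok": ok, "degraded": degraded}
```

The core never calls out, so a risk‑control outage changes only the gateway's behavior. That matches the article's reason for keeping the core and consumer services stable.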
Interface Design
APIs are divided into Web and App groups. Web APIs must support cross‑origin single sign‑on, encryption, signature verification, and token validation, while App APIs follow a different security model. Core interfaces such as login receive special treatment. The user table is split into a lightweight core table (userId, username, phone, password, salt) and a profile table for auxiliary fields (avatar, nickname, gender), so login reads only the small core table. The login flow is kept as short as possible. It needs only read access to the database, and it degrades automatically when a dependency fails (for example, falling back to password‑only verification if the risk‑control or SMS service is down). Replay attacks are countered by limiting how many requests a user can make within a time window. User‑behavior profiling (phone verification, real‑name authentication, facial/liveness checks) further strengthens security.
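The per‑user frequency limit can be sketched as an in‑memory sliding window. This is an illustration under assumptions: `PerUserRateLimiter`, its `limit`/`window` parameters, and the injectable clock are hypothetical. A multi‑instance deployment would keep the timestamps in a shared store such as Redis rather than in process memory.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerUserRateLimiter:
    """Sliding window: accept at most `limit` requests per user within `window` seconds."""

    def __init__(self, limit: int, window: float, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        # user_id -> timestamps of accepted requests, oldest first
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        q = self.hits[user_id]
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```

Requests beyond `limit` inside the window are rejected, so a captured request replayed in bulk is throttled. Rejecting even a single replay usually also needs a one‑time nonce or timestamp check, which the source does not detail.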
Token Flexible Degradation
Two token types are generated: a web token that can be combined with cookies for single sign‑on, and an app token stored in Redis. When Redis is unavailable, the server creates a special‑format token; during validation, the token format is inspected. If it matches the fallback pattern, the token is decrypted, its embedded userId, phone, random code, and expiration are extracted, and the database is queried to verify the login. Rate limiting is added to the fallback path to avoid overwhelming the database.
Data Security
Sensitive data is stored separately: passwords, phone numbers, and salts reside in one database, while profile information lives in another. Passwords are checked against a blacklist to reject weak choices. They are then stored as salted hashes (one‑way hashing, not reversible encryption), optionally using bcrypt or scrypt for stronger protection. The per‑user salt is what defeats precomputed rainbow tables. Bcrypt and scrypt add deliberate cost to every hash computation, which slows offline brute‑force guessing. That same CPU overhead (and, for scrypt, memory overhead) applies to every legitimate login and may increase latency, so the trade‑off must be evaluated against business requirements.
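Here is a minimal sketch of the blacklist check plus salted hashing. It uses `hashlib.scrypt` from Python's standard library, which is available when Python is built against OpenSSL 1.1 or newer. The blacklist contents and the cost parameters `n=2**14, r=8, p=1` are illustrative assumptions, not values from the source. Raising `n` increases both the attacker's cost and login latency, which is exactly the trade‑off described above.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Hypothetical blacklist; a real one would hold many thousands of common or leaked passwords.
WEAK_PASSWORDS = {"123456", "password", "12345678", "qwerty", "111111"}

# Assumed scrypt cost parameters: n=2**14, r=8 needs roughly 16 MiB per hash.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2 ** 14, 8, 1


def _scrypt(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(password.encode(), salt=salt,
                          n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=32)


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Reject blacklisted passwords, then return (salt, digest) for storage."""
    if password.lower() in WEAK_PASSWORDS:
        raise ValueError("password is on the blacklist")
    salt = os.urandom(16)  # unique random salt per user
    return salt, _scrypt(password, salt)


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(_scrypt(password, salt), digest)
```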
Sharding Strategy
When user data exceeds 100 million records, vertical splitting is applied first: core fields stay in the main user table, while auxiliary fields move to a profile table. Event tables (e.g., login logs) are migrated to separate databases. For horizontal scaling, two methods are described:
Index‑table method: a mapping table links mobile or username to UID, allowing direct routing to the correct shard.
Gene method: generate an N‑bit “gene” from the mobile number, combine it with an M‑bit globally unique ID to form UID, then use the N‑bit suffix to determine the target database.
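During registration, compute an N‑bit gene from the mobile number (mobile_gen = f(mobile)).
Generate an M‑bit globally unique ID.
Concatenate the M‑bit ID and the N‑bit gene to form the UID, with the gene occupying the low N bits.
Use the remainder of the UID modulo 2^N (that is, the gene) to select the target database for insertion.
When querying by UID, apply the same N‑bit remainder. When querying by mobile number, recompute f(mobile). Both yield the same gene, so both locate the same shard.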
The gene method suits scenarios with frequent mobile‑based lookups; other cases may prefer the index‑table approach.
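The gene method can be sketched as follows. `GENE_BITS` (N), `DB_COUNT`, the SHA‑256‑based `f(mobile)`, and the counter standing in for a global ID generator are all assumptions for illustration.

```python
import hashlib
import itertools

GENE_BITS = 4                     # N (assumed): gene width, allows up to 2**4 = 16 databases
GENE_MASK = (1 << GENE_BITS) - 1
DB_COUNT = 8                      # assumed current number of databases (a divisor of 2**N)

_global_ids = itertools.count(1)  # stand-in for a real global ID generator (hypothetical)


def mobile_gene(mobile: str) -> int:
    """mobile_gen = f(mobile): low N bits of a SHA-256 hash (hash choice is an assumption)."""
    return hashlib.sha256(mobile.encode()).digest()[-1] & GENE_MASK


def make_uid(mobile: str) -> int:
    """Registration: M-bit unique ID in the high bits, N-bit gene in the low bits."""
    return (next(_global_ids) << GENE_BITS) | mobile_gene(mobile)


def db_for_uid(uid: int) -> int:
    """Route by UID: uid mod 2**N is exactly the gene."""
    return (uid & GENE_MASK) % DB_COUNT


def db_for_mobile(mobile: str) -> int:
    """Route by mobile: recompute the gene; no index table is consulted."""
    return mobile_gene(mobile) % DB_COUNT
```

Both routes derive the shard from the same gene, so a lookup by mobile number and a lookup by UID reach the same database without an index table. Keeping `DB_COUNT` a divisor of 2^N spreads genes evenly and leaves room to add databases later. Changing `DB_COUNT` still remaps some genes, however, and that requires a data migration.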
Asynchronous Consumption Design
After user actions (login, registration, profile updates), events are written to a message queue. Downstream services (e.g., points, coupons) consume these events asynchronously, decoupling them from the user center. If the MQ is unavailable or messages are lost, compensation mechanisms are triggered. This design reduces the synchronous load on the user center and provides a rich source for user‑profile data.
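The following sketches the publish/consume flow with one possible compensation mechanism. The source does not specify how compensation works. This sketch therefore assumes a local pending list (an outbox) that a scheduled job retries, plus a consumer that deduplicates by `event_id` so retried messages are not applied twice. `FlakyMQ`, `UserCenterPublisher`, and `PointsConsumer` are hypothetical names, and `queue.Queue` stands in for a real message queue.

```python
import queue
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Event:
    kind: str     # e.g. "login", "register", "profile_update"
    user_id: int
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class FlakyMQ:
    """Stand-in for a real message queue; `available` simulates outages."""

    def __init__(self) -> None:
        self.q: "queue.Queue[Event]" = queue.Queue()
        self.available = True

    def send(self, event: Event) -> None:
        if not self.available:
            raise ConnectionError("MQ unavailable")
        self.q.put(event)


class UserCenterPublisher:
    def __init__(self, mq: FlakyMQ) -> None:
        self.mq = mq
        self.pending: List[Event] = []  # hypothetical outbox for compensation

    def publish(self, event: Event) -> None:
        try:
            self.mq.send(event)
        except ConnectionError:
            # The user action already succeeded; keep the event for a later retry.
            self.pending.append(event)

    def compensate(self) -> None:
        """Retry pending events, e.g. from a scheduled job."""
        still_pending = []
        for event in self.pending:
            try:
                self.mq.send(event)
            except ConnectionError:
                still_pending.append(event)
        self.pending = still_pending


class PointsConsumer:
    """Downstream consumer; deduplicates by event_id so redelivery is harmless."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.points: Dict[int, int] = {}

    def drain(self, mq: FlakyMQ) -> None:
        while True:
            try:
                event = mq.q.get_nowait()
            except queue.Empty:
                return
            if event.event_id in self.seen:
                continue
            self.seen.add(event.event_id)
            if event.kind == "login":
                self.points[event.user_id] = self.points.get(event.user_id, 0) + 1
```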
Monitoring
Key signals to watch include the QPS of key interfaces, memory usage, garbage‑collection pause times, service latency, and database binlog write rates, complemented by Zipkin‑based end‑to‑end tracing. Alerts are configured for abnormal traffic drops and latency spikes. For example, a sudden decline in registration volume triggers an alarm, enabling a rapid response that limits revenue loss.
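The registration‑drop alarm can be sketched as a comparison against a rolling baseline. `DropAlarm`, the six‑window history, and the 50% threshold are assumptions. A production setup would more likely express the same rule in its monitoring system than in application code.

```python
from collections import deque


class DropAlarm:
    """Fires when a window's count falls below `ratio` x the mean of the last `history` windows."""

    def __init__(self, history: int = 6, ratio: float = 0.5):
        self.past = deque(maxlen=history)  # counts of the most recent windows
        self.ratio = ratio

    def observe(self, count: int) -> bool:
        alarm = False
        # Only judge once a full baseline of past windows is available.
        if len(self.past) == self.past.maxlen:
            baseline = sum(self.past) / len(self.past)
            alarm = count < self.ratio * baseline
        self.past.append(count)
        return alarm
```

Every observed value, including an anomalous one, joins the baseline, so a sustained drop eventually stops alarming. One possible refinement is to freeze the baseline while an alarm is active.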
Conclusion
The article outlines a practical blueprint for a billion‑scale user center, covering service architecture, API design, token degradation, data security, sharding, asynchronous processing, and comprehensive monitoring. It also notes remaining challenges such as graceful extraction of authentication services, refining monitoring granularity, and the perpetual pursuit of higher availability, performance, and security.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
IT Architects Alliance