Stability Practices for Vivo Account System: Service Governance, Data Architecture, and Monitoring
Vivo’s account platform, serving 270 million users and over 100 billion daily requests, achieves high‑performance stability through disciplined service splitting, hierarchical dependency control, layered caching and sharding strategies, and comprehensive multi‑layer monitoring that together ensure scalability, availability, and rapid fault diagnosis.
The Vivo account is the credential required to access the entire Vivo ecosystem. With 270 million registered users and a daily request volume exceeding 100 billion, the account system must meet high‑performance, high‑concurrency, and high‑availability requirements.
The article shares practical experience in three dimensions: application service governance, data‑architecture governance, and monitoring governance.
1. Application Service Governance
Service splitting is performed to improve scalability, maintainability, and stability. Splits are driven by (1) organizational changes (Conway’s Law) and (2) stability requirements. Core behaviours such as registration, login, and credential verification are identified by scoring each behaviour on two dimensions, business value and call frequency, which together form a four‑quadrant matrix. Minimal‑element aggregation avoids over‑splitting: behaviours that share data elements are grouped into the same service. The overall split progresses from a monolithic account service to separate services for login/verification, registration, and user‑profile data. Changes in business value (e.g., real‑name verification for gaming) trigger further splits.
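The four‑quadrant classification can be sketched as follows. This is a minimal illustration, not the article's implementation: the `Behaviour` type, the 0..1 scores, the 0.5 cut‑offs, and the quadrant labels are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behaviour:
    name: str
    business_value: float  # hypothetical score in [0, 1]
    call_frequency: float  # hypothetical score in [0, 1]


def quadrant(b: Behaviour, value_cut: float = 0.5, freq_cut: float = 0.5) -> str:
    """Place a behaviour in one of four quadrants (labels are illustrative)."""
    high_value = b.business_value >= value_cut
    high_freq = b.call_frequency >= freq_cut
    if high_value and high_freq:
        return "core"
    if high_value:
        return "high-value, low-frequency"
    if high_freq:
        return "low-value, high-frequency"
    return "peripheral"
```

Behaviours that land in the "core" quadrant would be the first candidates for isolation into their own service.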
Implementation follows a two‑phase approach: first split the code without refactoring it, then split the data. Each phase is rolled out as a gray release, shifting a small share of traffic to the new service first, to minimize risk. The gray release is implemented either at the application layer (internal forwarding to a new internal domain) or at the gateway/reverse‑proxy layer (e.g., Nginx).
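One common way to realize application‑layer gray release is deterministic, percentage‑based routing. The sketch below assumes routing by a hash of the user ID; the function names and the `"new-service"` / `"legacy-service"` targets are hypothetical and do not come from the article.

```python
import hashlib


def bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user to a stable bucket in [0, buckets) using a hash of the ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def route(user_id: str, gray_percent: int) -> str:
    """Send roughly gray_percent % of users to the new service."""
    return "new-service" if bucket(user_id) < gray_percent else "legacy-service"
```

Because the bucket depends only on the user ID, a user who is routed to the new service at 10% stays there as the percentage is raised, which keeps behaviour consistent during the rollout.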
2. Relationship Governance
Service dependencies are kept hierarchical and acyclic, following the Acyclic Dependencies Principle (ADP). Strong dependencies are mitigated by redundancy (multiple providers with dynamic traffic allocation) or by a simple primary‑secondary fallback. Weak dependencies are decoupled via asynchronous messaging (e.g., Kafka), turning synchronous calls into message‑driven flows. The article notes the complexity that asynchrony adds (message ordering, latency, and loss) and recommends producing messages from data‑generation processes rather than directly in service logic.
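The acyclic requirement can be checked mechanically. The following sketch assumes the dependency graph is available as a plain dictionary mapping each service to the services it calls; the service names in the usage note are hypothetical. It relies on the standard‑library `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def call_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Order services so each appears after the services it depends on.

    Raises ValueError if the dependency graph contains a cycle.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        cycle = exc.args[1]  # list of nodes forming the detected cycle
        raise ValueError("dependency cycle: " + " -> ".join(cycle)) from exc
```

For example, `call_order({"login": {"profile"}, "profile": set()})` places `profile` before `login`, while a graph in which two services depend on each other raises `ValueError`. A check like this could run in CI to keep new dependencies from closing a loop.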
3. Data‑Architecture Governance
The cache strategy combines local and distributed caches. The Cache‑Aside pattern is adopted: a read checks the cache first and, on a miss, loads the row from the database and populates the cache; a write updates the database and then deletes the cached entry. Batch‑read performance is improved by compressing data before writing it to Redis, switching serialization to protostuff for higher throughput and a smaller footprint, and using Redis pipelines to batch commands.
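The Cache‑Aside read and write paths can be sketched with in‑memory dictionaries standing in for Redis and MySQL. The `AccountStore` class and its `db_reads` counter are assumptions made for illustration only.

```python
from typing import Optional


class AccountStore:
    """Cache-Aside sketch: plain dicts stand in for the cache and the database."""

    def __init__(self, db: dict[str, dict]) -> None:
        self.db = db
        self.cache: dict[str, dict] = {}
        self.db_reads = 0  # counts cache misses that reached the database

    def read(self, user_id: str) -> Optional[dict]:
        cached = self.cache.get(user_id)
        if cached is not None:
            return dict(cached)  # cache hit
        self.db_reads += 1
        row = self.db.get(user_id)
        if row is None:
            return None
        self.cache[user_id] = dict(row)  # populate the cache on a miss
        return dict(row)

    def write(self, user_id: str, row: dict) -> None:
        self.db[user_id] = dict(row)    # 1. update the database
        self.cache.pop(user_id, None)   # 2. delete (not update) the cached entry
```

Deleting the cached entry instead of rewriting it means the next read repopulates the cache from the database, which avoids leaving a stale value behind when two writes race.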
Database scaling uses read‑write separation (the primary handles writes, replicas serve reads), followed by vertical and horizontal sharding. Vertical sharding isolates core account tables (username, password, email, phone) from auxiliary data. Horizontal sharding is applied once a table grows beyond tens of millions of rows, which reduces B‑Tree height and DDL latency. Data migration leverages MySQL replication with a three‑step cut‑over: disable writes, promote the replica, then switch routing. For massive tables (> 100 million rows), Canal‑based CDC migration is employed.
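Horizontal sharding needs a routing rule from a key to a physical table. The article does not state the rule it uses; the sketch below assumes simple modulo routing on a numeric user ID, and the database count, table count, and naming scheme are hypothetical.

```python
def shard_for(user_id: int, db_count: int = 4, tables_per_db: int = 8) -> tuple[int, int]:
    """Return (database index, table index) for a user under modulo routing."""
    slot = user_id % (db_count * tables_per_db)
    return slot // tables_per_db, slot % tables_per_db


def table_name(user_id: int) -> str:
    """Build a hypothetical physical table name for the user's shard."""
    db, table = shard_for(user_id)
    return f"account_db_{db}.account_{table}"
```

Modulo routing is easy to reason about, but changing the shard count later moves most keys, so the counts are usually chosen with headroom up front.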
4. Monitoring Governance
Monitoring covers three layers: application‑level metrics (throughput, error rates, latency), middleware metrics (Redis, MQ, MySQL, JVM, etc.), and host‑level resources (CPU, memory, I/O, network). Aggregating these metrics enables rapid root‑cause analysis; for example, correlating a service‑latency alarm with JVM GC spikes or MySQL slow‑query alerts. The system also distinguishes callers for top‑N API traffic to detect abnormal usage patterns.
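Distinguishing callers for top‑N API traffic amounts to a grouped count. The sketch below assumes the request log is available as `(api, caller)` pairs; the log shape and the function name are assumptions for illustration, not the article's tooling.

```python
from collections import Counter, defaultdict
from typing import Iterable


def top_callers(request_log: Iterable[tuple[str, str]], n: int = 3) -> dict[str, list[tuple[str, int]]]:
    """For each API, return its n most frequent callers with request counts."""
    per_api: dict[str, Counter] = defaultdict(Counter)
    for api, caller in request_log:
        per_api[api][caller] += 1
    return {api: counts.most_common(n) for api, counts in per_api.items()}
```

A caller whose share of an API's traffic jumps suddenly between two windows is a candidate for the abnormal usage pattern the text describes.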
In summary, the account system’s stability is achieved through disciplined service decomposition, careful dependency management, appropriate caching and database strategies, and comprehensive multi‑layer monitoring.
vivo Internet Technology
Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.