How 58.com Scales Its Database: Architecture, High Availability, and Performance Tricks
This article explains 58.com’s database architecture, covering availability through replication and dual‑master setups, read‑performance enhancements with indexing, read replicas and caching, consistency solutions, rapid horizontal scaling methods, and a review of Codd’s twelve rules for relational design.
1. 58.com Database Architecture Design Overview
(1) Availability Design
Solution: replication plus redundancy, which inevitably introduces consistency problems. For high read availability, the master replicates to multiple slave replicas (see image). For write availability, a dual‑master mode is adopted, in which the standby master takes over when the active one fails.
Problem: master‑slave inconsistency, because replicas lag behind the master.
Solution for writes: dual‑master mode used as master‑slave (one active master, one standby) without read‑write separation, so when the active master fails the standby takes over (see image).
Advantages: both reads and writes go to the active master, which avoids reading stale replica data; the standby master provides availability.
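The dual‑master arrangement above can be sketched as a small router. This is a minimal illustration, not 58.com's implementation: the class name, the host names, and the idea of an external health checker calling fail_over() are all assumptions.

```python
class DualMasterRouter:
    """Routes every read and write to one active master; the peer is a hot standby."""

    def __init__(self, active: str, standby: str):
        self.active = active
        self.standby = standby

    def endpoint(self) -> str:
        # Reads and writes share one endpoint, so callers never see replica lag.
        return self.active

    def fail_over(self) -> None:
        # Assumed to be called by a (hypothetical) health checker when the
        # active master stops responding: the two roles simply swap.
        self.active, self.standby = self.standby, self.active


router = DualMasterRouter(active="db-master-a", standby="db-master-b")
before = router.endpoint()
router.fail_over()
after = router.endpoint()
```

Because there is no read‑write separation, the router exposes a single endpoint; failover changes which host that endpoint names and nothing else.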
(2) Read‑Performance Design
Three common ways to scale read performance:
Add indexes – note that excessive indexes reduce write speed and increase memory usage.
Add read replicas – each replica can have its own set of indexes (see image).
Introduce caching – a service layer sits between the application and the database/cache (see image).
Typical setup:
Main DB: write‑only, no indexes.
Online replica: read‑only for online traffic, with online‑specific indexes.
Offline replica: read‑only for offline traffic, with offline‑specific indexes.
Adding more replicas improves read capacity but increases master‑slave latency and inconsistency risk.
(3) Consistency Design
Master‑slave inconsistency solutions:
Introduce middleware that routes reads of a recently written key to the master until replication is expected to have caught up.
Force all reads and writes to the master (the approach 58.com uses).
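The middleware option can be sketched as follows. This is a simplified illustration under stated assumptions: the ReadRouter class, the fixed lag_window_s bound on replication delay, and the "master"/"replica" labels are hypothetical, and a real middleware would track keys in shared storage, not in process memory.

```python
import time


class ReadRouter:
    """Sends reads for recently written keys to the master, all others to a replica."""

    def __init__(self, lag_window_s: float, clock=time.monotonic):
        self.lag_window_s = lag_window_s  # assumed upper bound on replication delay
        self.clock = clock
        self._written_at = {}             # key -> time of the last write

    def record_write(self, key: str) -> None:
        self._written_at[key] = self.clock()

    def target_for_read(self, key: str) -> str:
        written = self._written_at.get(key)
        if written is not None:
            if self.clock() - written < self.lag_window_s:
                return "master"           # the replica may not have the write yet
            del self._written_at[key]     # window has passed; stop tracking the key
        return "replica"


# A fake clock keeps the example deterministic.
now = [0.0]
router = ReadRouter(lag_window_s=0.5, clock=lambda: now[0])
router.record_write("user:42")
first = router.target_for_read("user:42")   # inside the lag window
now[0] = 1.0
second = router.target_for_read("user:42")  # after the lag window
```

Forcing all traffic to the master, the second option, avoids this bookkeeping at the cost of not using replicas for online reads.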
Cache‑database inconsistency is handled with a double‑eviction strategy. On a write: evict the cache entry, write the database, then evict the entry again after a timer set to the estimated replication delay. On a read miss: load the value from the database into the cache. Any stale value loaded into the cache in the meantime is removed by the second eviction.
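A minimal sketch of double eviction, assuming in‑memory dictionaries as stand‑ins for the database and the cache tier. The class name and the delay_s parameter are hypothetical; the stale read is simulated by hand because a single dictionary has no replication lag.

```python
import threading


class DoubleEvictCache:
    def __init__(self, db: dict, delay_s: float):
        self.db = db            # stand-in for the database
        self.cache = {}         # stand-in for the cache tier
        self.delay_s = delay_s  # estimated replication delay (assumed value)
        self.lock = threading.Lock()

    def _evict(self, key):
        with self.lock:
            self.cache.pop(key, None)

    def write(self, key, value):
        self._evict(key)        # first eviction
        self.db[key] = value    # write the database
        timer = threading.Timer(self.delay_s, self._evict, args=(key,))
        timer.daemon = True
        timer.start()           # second eviction fires after the delay
        return timer

    def read(self, key):
        with self.lock:
            if key in self.cache:
                return self.cache[key]
        value = self.db.get(key)  # miss: load from the database
        with self.lock:
            self.cache[key] = value
        return value


store = DoubleEvictCache(db={"k": "old"}, delay_s=0.05)
store.read("k")                  # caches "old"
timer = store.write("k", "new")
store.cache["k"] = "old"         # simulate a stale value re-entering the cache
timer.join()                     # wait for the second eviction
fresh = store.read("k")          # miss again, so the new value is loaded
```

The delay is a trade‑off: too short and the second eviction may run before the stale value arrives, too long and readers see stale data for longer.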
(4) Scalability Design
4.1 Rapid horizontal scaling (N → 2N databases) without data migration:
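Start with two databases (0 and 1), each using dual‑master‑as‑master‑slave.
Promote each standby master to an independent database and adjust the service's routing configuration. The number of databases doubles within seconds because each standby already holds a full copy of its master's data.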
Limitations: works only for N → 2N expansions; other scaling patterns require different strategies (e.g., log‑based or dual‑write approaches).
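Why no data migration is needed can be checked with a short sketch. It assumes rows are routed by a modulo on a numeric key (called uid here); the host names are hypothetical.

```python
def shard_for(uid: int, n_shards: int) -> int:
    return uid % n_shards


# Before: 2 shards, each a master plus a standby holding a full copy.
before = {0: "db0-master", 1: "db1-master"}

# After: each standby is promoted and becomes its own shard.
# New shards s and s + 2 both start from the data of old shard s.
after = {0: "db0-master", 1: "db1-master", 2: "db0-standby", 3: "db1-standby"}


def data_already_present(uid: int) -> bool:
    old = shard_for(uid, 2)
    new = shard_for(uid, 4)
    # The new shard is either the old master or its promoted standby,
    # both of which held every row of the old shard.
    return new % 2 == old


all_present = all(data_already_present(uid) for uid in range(1000))
```

After the switch each database still holds rows it no longer owns, which can be deleted later, and each promoted database presumably needs a new standby to restore the original level of availability.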
4.2 Field‑level expansion via log‑based or dual‑write methods.
4.3 Horizontal sharding patterns covering single‑key, one‑to‑many, many‑to‑many, and multi‑key scenarios (user, post, friend, order tables).
(5) SQL Techniques for Massive Data
Commonly avoided features: complex joins, sub‑queries, triggers, user‑defined functions, and transactions, because of their performance impact at this scale.
For sharded environments, pagination is achieved by rewriting queries, using auxiliary IDs, or two‑stage queries that first locate global offsets then fetch the final page.
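The query‑rewriting approach can be sketched as below: a global "OFFSET x LIMIT y" becomes "LIMIT x + y" on every shard, and the service merges the partial results. Shards are modeled as in‑memory lists, and the function names are hypothetical.

```python
import heapq


def query_shard(rows, limit):
    """Stand-in for: SELECT ... ORDER BY ts LIMIT :limit on one shard."""
    return sorted(rows)[:limit]


def global_page(shards, offset, limit):
    # Rewrite: each shard returns its first offset + limit rows, since any
    # of them could fall inside the requested global page.
    partials = [query_shard(rows, offset + limit) for rows in shards]
    merged = heapq.merge(*partials)  # partial results are already sorted
    return list(merged)[offset:offset + limit]


shards = [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
page = global_page(shards, offset=4, limit=3)
skewed = global_page([[1, 2, 3, 4, 5, 6], [100]], offset=3, limit=2)
```

This stays correct however the data is spread across shards, but every shard returns offset + limit rows, so the cost grows with page depth. That cost is the motivation for the auxiliary‑ID and two‑stage alternatives.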
2. Codd’s 12 Rules for Relational Databases
Information Rule: All information is represented in exactly one way, as values in tables.
Guaranteed Access Rule: Every value is accessible through a combination of table name, primary key value, and column name.
Null Value Rule: NULLs are supported systematically for missing or inapplicable information, independent of data type.
Dynamic Online Catalog Rule: The database description is stored in tables that authorized users can query with the same language as ordinary data.
Comprehensive Data Sub‑language Rule: At least one language (such as SQL) supports data definition, view definition, data manipulation, integrity constraints, authorization, and transaction boundaries.
View Update Rule: Every view that is theoretically updatable can be updated by the system.
High‑Level Insert, Update, Delete Rule: Inserts, updates, and deletes operate on sets of rows, not only one row at a time.
Physical Data Independence: Changes in storage structures or access methods do not affect applications.
Logical Data Independence: Information‑preserving changes to the logical schema do not affect applications.
Integrity Independence: Integrity constraints are defined in the sub‑language and stored in the catalog, not in application code.
Distribution Independence: Distributing data across locations, or changing that distribution, is transparent to applications.
Nonsubversion Rule: Low‑level languages cannot bypass the integrity rules expressed in the higher‑level language.
Additional best practices:
Minimize reliance on database‑specific features; implement business logic in application code.
When defining entity relationships, consider involved entities, ownership, and cardinality.
Source: http://www.cnblogs.com/wintersun/p/4638176.html
