How 58.com Scales 10 B Posts with 10 K Attributes: Architecture Secrets
58.com stores and searches billions of heterogeneous posts with three components: a unified post center, a category‑attribute service, and an external search engine. Along the way it uses vertical table splitting, JSON‑based extensible fields, compressed keys, and horizontally sharded indexes to reach massive scale and high throughput.
Background and Business Overview
58.com is an information platform with many vertical categories, such as recruitment, real estate, and second‑hand goods, and each category has its own sub‑categories. The core data across all categories is the "post". Posts carry on the order of 10 000 distinct attributes, total about 10 billion rows, and must sustain up to 100 000 QPS.
The Naïve Approach and Its Limits
Initially a single table was used, e.g. tiezi(tid, uid, c1, c2, c3), and composite indexes such as index_1(c1, c2), index_2(c2, c3), index_3(c1, c3) were created to satisfy attribute‑combination queries. As new categories (e.g., real estate) were added, the table grew to tiezi(tid, uid, c1, c2, c3, c10, c11, c12, c13), and the number of required indexes exploded, making maintenance impossible.
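A rough count shows why the indexes explode. The sketch below assumes, purely for illustration, that every attribute subset of size 2 up to some maximum width needs its own composite index. Real query planners can reuse index prefixes, so treat this as an upper-bound intuition rather than a measurement; the function name is hypothetical.

```python
from math import comb


def composite_index_count(num_attrs: int, max_width: int) -> int:
    """Count attribute subsets of size 2..max_width, one index per subset."""
    return sum(comb(num_attrs, k) for k in range(2, max_width + 1))


# 3 recruitment attributes (c1, c2, c3), pairs only: the 3 indexes named in the text
pairs_3 = composite_index_count(3, 2)
# 7 attributes once real estate adds c10..c13, pairs only
pairs_7 = composite_index_count(7, 2)
# 7 attributes, pairs and triples
pairs_triples_7 = composite_index_count(7, 3)
print(pairs_3, pairs_7, pairs_triples_7)
```

Under these assumptions the count grows combinatorially with the number of attributes, which is why adding one category after another quickly becomes unmanageable.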
Vertical Splitting as a Solution
Separate tables per category were introduced:
tiezi_zhaopin(tid, uid, c1, c2, c3)
tiezi_fangchan(tid, uid, c10, c11, c12, c13)
However, this introduced problems of its own: tid formats and attribute definitions were no longer standardized across tables, cross‑category queries became difficult, and operational cost was high.
Three Core Services Implemented by 58.com
1. Unified Post Center Service (IMC)
A single table stores the common fields plus an ext JSON column for category‑specific data, e.g.:
tiezi(tid, uid, time, title, cate, subcate, xxid, ext)
Examples of ext values:
{"job":"driver","salary":8000,"location":"bj"} for recruitment posts
{"type":"iphone","money":3500} for second‑hand posts
The service uses MySQL with 256 sharded databases and Memcached for caching. It later migrated the storage engine to a self‑developed solution while keeping the architecture.
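The idea of common columns plus an ext JSON field, spread over 256 shards, can be sketched as follows. This is a minimal in-memory illustration, not 58.com's implementation: the article does not say how a tid is routed to a shard, so the modulo rule, the PostCenter class, and the helper names are all assumptions, and the time and xxid columns are omitted for brevity.

```python
import json

NUM_SHARDS = 256  # from the article: 256 sharded databases


def shard_for(tid: int) -> int:
    """Route a post to a shard by tid modulo (an assumed routing rule)."""
    return tid % NUM_SHARDS


def make_row(tid: int, uid: int, cate: str, subcate: str, title: str, ext: dict) -> dict:
    """Common fields become columns; category-specific fields are serialized into ext."""
    return {
        "tid": tid, "uid": uid, "cate": cate, "subcate": subcate, "title": title,
        "ext": json.dumps(ext, separators=(",", ":")),
    }


class PostCenter:
    """In-memory stand-in for the sharded post store (hypothetical)."""

    def __init__(self) -> None:
        self.shards = [dict() for _ in range(NUM_SHARDS)]

    def put(self, row: dict) -> None:
        self.shards[shard_for(row["tid"])][row["tid"]] = row

    def get(self, tid: int):
        row = self.shards[shard_for(tid)].get(tid)
        if row is None:
            return None
        # Parse ext back into a dict for the caller
        return {**row, "ext": json.loads(row["ext"])}


center = PostCenter()
center.put(make_row(1001, 7, "zhaopin", "driver", "Driver wanted",
                    {"job": "driver", "salary": 8000, "location": "bj"}))
center.put(make_row(1002, 8, "ershou", "phone", "Used phone",
                    {"type": "iphone", "money": 3500}))
print(center.get(1001)["ext"]["salary"], center.get(1002)["ext"]["type"])
```

Both post types share one table shape, so adding a category requires no schema change. The trade-off is that the database cannot index fields inside ext, which is why non-ID queries are pushed to the search service.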
2. Unified Category‑Attribute Service (CMC)
All attribute metadata (name, type, constraints) is stored separately. The ext JSON uses numeric keys to reduce storage, e.g. {"1":"driver","2":8000,"3":"bj"}. A mapping table defines what each numeric key means for each sub‑category, and enumeration tables enforce valid values.
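The numeric-key mapping and enumeration checks can be sketched like this. The ATTRS table, the sub-category names, the allowed values, and the encode_ext/decode_ext helpers are hypothetical stand-ins for whatever CMC actually stores; only the compact form {"1":"driver","2":8000,"3":"bj"} comes from the article.

```python
import json

# Hypothetical CMC metadata: sub-category -> numeric key -> (name, type, allowed values or None).
# The same numeric key can mean different things in different sub-categories.
ATTRS = {
    "zhaopin": {
        "1": ("job", str, {"driver", "cook"}),
        "2": ("salary", int, None),
        "3": ("location", str, {"bj", "sh"}),
    },
    "ershou": {
        "1": ("type", str, {"iphone", "android"}),
        "2": ("money", int, None),
    },
}


def encode_ext(subcate: str, attrs: dict) -> str:
    """Validate named attributes and serialize them under compact numeric keys."""
    by_name = {name: (key, typ, allowed)
               for key, (name, typ, allowed) in ATTRS[subcate].items()}
    out = {}
    for name, value in attrs.items():
        if name not in by_name:
            raise KeyError(f"unknown attribute {name!r} for {subcate!r}")
        key, typ, allowed = by_name[name]
        if not isinstance(value, typ):
            raise TypeError(f"{name!r} must be {typ.__name__}")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{value!r} is not a valid value for {name!r}")
        out[key] = value
    return json.dumps(out, separators=(",", ":"))


def decode_ext(subcate: str, ext_json: str) -> dict:
    """Translate numeric keys back to attribute names using the mapping."""
    schema = ATTRS[subcate]
    return {schema[key][0]: value for key, value in json.loads(ext_json).items()}


compact = encode_ext("zhaopin", {"job": "driver", "salary": 8000, "location": "bj"})
print(compact)
print(decode_ext("zhaopin", compact))
```

Keeping the mapping outside the post table means the stored ext stays short and meaningless on its own; readers and writers both have to consult the attribute metadata, which is what lets it act as the single place where attributes are defined and validated.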
3. Unified Search Service (E‑search)
Because composite indexes cannot cover all attribute combinations at this scale, an external search engine handles all non‑ID queries. The architecture includes a stateless proxy layer, a result‑aggregation layer, and a search core layer with horizontally sharded indexes. Index data is replicated to scale query capacity, and full rebuilds are performed periodically to keep the index consistent with the post data.
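The scatter-gather pattern behind a sharded search core can be sketched as below. This is a toy model under stated assumptions: each shard holds an inverted index from (attribute, value) pairs to tids, documents are assigned to shards by tid modulo, and results are ordered newest tid first. None of these details are confirmed by the article, and the class names are hypothetical.

```python
from collections import defaultdict


class SearchShard:
    """One index shard: inverted index from (attribute, value) to a set of tids."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)

    def index(self, tid: int, attrs: dict) -> None:
        for item in attrs.items():
            self.postings[item].add(tid)

    def search(self, filters: dict) -> set:
        # Intersect the posting lists of every requested (attribute, value) pair
        sets = [self.postings.get(item, set()) for item in filters.items()]
        return set.intersection(*sets) if sets else set()


class SearchCluster:
    """Stands in for the proxy and aggregation layers over several shards."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [SearchShard() for _ in range(num_shards)]

    def index(self, tid: int, attrs: dict) -> None:
        self.shards[tid % len(self.shards)].index(tid, attrs)

    def search(self, filters: dict, limit: int = 10) -> list:
        hits = set()
        for shard in self.shards:  # scatter: every shard answers the same query
            hits |= shard.search(filters)
        return sorted(hits, reverse=True)[:limit]  # gather: merge, order, truncate


cluster = SearchCluster(4)
cluster.index(1, {"job": "driver", "location": "bj"})
cluster.index(2, {"job": "driver", "location": "sh"})
cluster.index(6, {"job": "driver", "location": "bj"})
print(cluster.search({"job": "driver", "location": "bj"}))
```

Any combination of attributes is answered by intersecting posting lists, so no per-combination index is needed. In a real deployment the shards would be queried in parallel and each shard would have replicas behind the proxy.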
Summary
By combining a metadata‑driven post service, a decoupled category‑attribute service, and a dedicated search service, 58.com addresses, step by step, the storage, schema‑decoupling, and retrieval challenges of a workload with about 10 billion rows, 10 000 attributes, and 100 000 QPS.
