Scaling Schema‑Free Classified Ads Platforms: Storage & Search for Billions
This article explains how to design a scalable architecture for classified‑ads platforms that handle roughly 10 billion rows, 10,000 attributes, and 100,000 QPS. The design combines vertical partitioning with unified post, category, and search services, a JSON extension column with compressed keys, and an external search index.
Background and Business Scenario
Classified‑ads platforms host many vertical categories (recruitment, real estate, second‑hand goods, etc.), and the core record in every category is the “post”. Each category can have thousands of distinct attributes, so the platform as a whole must handle up to 10,000 attributes, 10 billion rows, and 100,000 queries per second.
Naïve Approach: Adding Columns and Composite Indexes
Initially a single table might be defined as:
tiezi(tid, uid, c1, c2, c3);
When a new category (e.g., real‑estate) is added, columns are simply appended:
tiezi(tid, uid, c1, c2, c3, c10, c11, c12, c13);
Composite indexes such as index_1(c1, c2), index_2(c2, c3), index_3(c1, c3) are created to satisfy multi‑attribute queries.
Problems with the Naïve Approach
Attribute diversity makes the number of required indexes explode.
Schema changes require table alterations and re‑indexing.
Cross‑category queries become impossible to cover with static indexes.
Maintenance overhead grows dramatically as more categories are added.
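To make the index explosion concrete, the short sketch below counts how many distinct attribute combinations a query could filter on. It is only a rough illustration: the function name composite_index_count is hypothetical, and the count covers attribute combinations rather than the exact number of indexes a real database would need, since one wide index can sometimes serve several narrower queries through its leading columns.

```python
from math import comb


def composite_index_count(num_attributes: int, max_width: int) -> int:
    """Count the attribute subsets of size 2..max_width.

    Each subset is one filter combination that a dedicated
    composite index would have to cover.
    """
    return sum(comb(num_attributes, k) for k in range(2, max_width + 1))


# Three attributes (c1, c2, c3) give the three pairs listed in the article.
small_table = composite_index_count(3, 2)

# Assumed example: a category with 20 filterable attributes,
# queried on any 2 or 3 of them.
one_category = composite_index_count(20, 3)

# Pairs alone across 10,000 attributes.
whole_platform = composite_index_count(10_000, 2)

print(small_table, one_category, whole_platform)
```

Even when only pairs are considered, the platform-wide figure runs into the tens of millions, which is why static composite indexes stop being a workable answer.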
Vertical Partitioning as a Solution
Instead of a monolithic table, split posts by vertical domain:
tiezi_zhaopin(tid, uid, c1, c2, c3);
tiezi_fangchan(tid, uid, c10, c11, c12, c13);
This isolates schema per category but introduces new challenges: ID standardization, attribute governance, cross‑category search, and heterogeneous storage technologies.
Industry Best Practice: Three Core Services
1. Unified Post Center Service (Info Management Center, IMC)
A single service stores all posts in a sharded MySQL table with a generic schema:
tiezi(tid, uid, time, title, cate, subcate, xxid, ext);
The ext column holds a JSON object with category‑specific fields. Example JSONs:
{"job":"driver","salary":8000,"location":"bj"}
{"type":"iphone","money":3500}
Data is partitioned across 256 databases, cached with Memcached, and accessed via the post service.
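A minimal sketch of how a post row with an ext column might be built and routed to one of the 256 databases follows. The article does not state the sharding key, so routing by tid modulo the shard count is an assumption, and the helper names (shard_for, make_row, read_ext) are hypothetical. The time and xxid columns are left out because the source does not describe them.

```python
import json

NUM_SHARDS = 256  # the article states that data is partitioned across 256 databases


def shard_for(tid: int) -> int:
    """Pick a database for a post. ASSUMPTION: shard by tid modulo shard count."""
    return tid % NUM_SHARDS


def make_row(tid: int, uid: int, title: str, cate: str, subcate: str, ext: dict) -> dict:
    """Build a generic post row; category-specific fields go into ext as JSON text."""
    return {
        "tid": tid,
        "uid": uid,
        "title": title,
        "cate": cate,
        "subcate": subcate,
        # Compact separators avoid storing whitespace in every row.
        "ext": json.dumps(ext, separators=(",", ":")),
    }


def read_ext(row: dict) -> dict:
    """Parse the ext column back into a dictionary of category-specific fields."""
    return json.loads(row["ext"])


row = make_row(1000, 42, "Driver wanted", "zhaopin", "driver",
               {"job": "driver", "salary": 8000, "location": "bj"})
print(shard_for(row["tid"]), row["ext"])
```

Because every category shares the same columns, adding a category changes only what goes into ext, not the table definition.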
2. Unified Category & Attribute Service (Category Management Center, CMC)
All attribute definitions are centralized. Each attribute is assigned a numeric key to compress storage, and constraints (type, enum, regex) are stored in the service.
Example mapping, with the attribute names from the earlier ext examples replaced by numeric keys:
{"1":"driver","2":8000,"3":"bj"}
{"4":"iphone","5":3500}
Enum tables validate values (e.g., key 4 must be one of the predefined enum IDs).
The service also records hierarchical category relationships (e.g., recruitment → sub‑category → specific job type).
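The key compression and validation described above could be sketched as a small in-memory registry. Everything here is illustrative: the class names AttrDef and CategoryService, the enum values for key 4, and the use of readable strings instead of enum IDs are assumptions, not the actual CMC interface.

```python
import json
import re
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class AttrDef:
    key: int                          # numeric key stored in ext
    name: str                         # human-readable attribute name
    type_: type                       # expected Python type of the value
    enum: Optional[frozenset] = None  # allowed values, if restricted
    pattern: Optional[str] = None     # regex for string values, if any


class CategoryService:
    """Hypothetical in-memory stand-in for the category and attribute service."""

    def __init__(self, defs):
        self._by_name = {d.name: d for d in defs}
        self._by_key = {d.key: d for d in defs}

    def encode(self, attrs: dict) -> str:
        """Validate named attributes and emit ext JSON with numeric keys."""
        out = {}
        for name, value in attrs.items():
            d = self._by_name.get(name)
            if d is None:
                raise KeyError(f"unknown attribute: {name}")
            self._validate(d, value)
            out[str(d.key)] = value
        return json.dumps(out, separators=(",", ":"))

    def decode(self, ext: str) -> dict:
        """Turn numeric keys back into attribute names."""
        return {self._by_key[int(k)].name: v for k, v in json.loads(ext).items()}

    @staticmethod
    def _validate(d: AttrDef, value: Any) -> None:
        if not isinstance(value, d.type_):
            raise TypeError(f"{d.name}: expected {d.type_.__name__}")
        if d.enum is not None and value not in d.enum:
            raise ValueError(f"{d.name}: {value!r} is not an allowed value")
        if d.pattern is not None and isinstance(value, str):
            if not re.fullmatch(d.pattern, value):
                raise ValueError(f"{d.name}: {value!r} does not match the pattern")


cmc = CategoryService([
    AttrDef(1, "job", str),
    AttrDef(2, "salary", int),
    AttrDef(3, "location", str, pattern=r"[a-z]{2}"),
    AttrDef(4, "type", str, enum=frozenset({"iphone", "ipad"})),  # assumed enum values
    AttrDef(5, "money", int),
])
print(cmc.encode({"job": "driver", "salary": 8000, "location": "bj"}))
```

Registering a new attribute in this sketch means adding one AttrDef entry; the post table and existing rows are untouched.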
3. Unified Search Service
Because composite indexes cannot cover all attribute combinations at this scale, an external search engine is introduced.
Post ID queries are served directly from the post service (forward index).
All other attribute‑based queries are routed to the external index.
The search architecture includes a stateless proxy layer, a result‑aggregation layer, and a search core where index data is horizontally sharded and optionally replicated for performance.
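As a rough illustration of the search core and the aggregation layer, the sketch below keeps a tiny inverted index per shard and merges shard results. The class names IndexShard and SearchCluster are hypothetical, sharding by tid modulo the shard count is an assumption, and replication, ranking, and paging are left out.

```python
from collections import defaultdict


class IndexShard:
    """One shard of the inverted index: (attribute key, value) -> set of post IDs."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, tid: int, attrs: dict) -> None:
        for item in attrs.items():
            self._postings[item].add(tid)

    def search(self, filters: dict) -> set:
        """Return post IDs on this shard that match every filter."""
        if not filters:
            return set()  # design choice in this sketch: no filters, no results
        sets = [self._postings.get(item, set()) for item in filters.items()]
        return set.intersection(*sets)


class SearchCluster:
    """Scatter a query to every shard and aggregate the hits."""

    def __init__(self, num_shards: int):
        self._shards = [IndexShard() for _ in range(num_shards)]

    def index(self, tid: int, attrs: dict) -> None:
        # ASSUMPTION: a post lives on the shard chosen by tid modulo shard count.
        self._shards[tid % len(self._shards)].add(tid, attrs)

    def search(self, filters: dict, limit: int = 10) -> list:
        hits = set()
        for shard in self._shards:  # a real system would query shards in parallel
            hits |= shard.search(filters)
        return sorted(hits, reverse=True)[:limit]  # highest post ID first


cluster = SearchCluster(num_shards=4)
cluster.index(1, {1: "driver", 3: "bj"})
cluster.index(2, {1: "driver", 3: "sh"})
cluster.index(6, {1: "driver", 3: "bj"})
print(cluster.search({1: "driver", 3: "bj"}))
```

Any combination of attributes is answered by intersecting posting sets, so no per-combination index has to be defined ahead of time.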
Typical query flow:
Client requests a post ID → proxy forwards to post service.
Client requests a complex filter → proxy forwards to search service, which looks up the inverted index.
Updates to a post trigger notifications to both the post service and the search service to keep indexes in sync.
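The query flow and update path can be sketched with in-memory stand-ins for the two back-end services. The class names and the request shape (a dictionary with either "tid" or "filters") are assumptions, the search stub scans linearly rather than using an inverted index, and the update is applied synchronously here, whereas a production system may well deliver the notification asynchronously.

```python
class PostService:
    """Stand-in for the post service: lookup by post ID."""

    def __init__(self):
        self._rows = {}

    def get(self, tid):
        return self._rows.get(tid)

    def put(self, tid, attrs):
        self._rows[tid] = dict(attrs)


class SearchService:
    """Stand-in for the search service (linear scan, for illustration only)."""

    def __init__(self):
        self._docs = {}

    def reindex(self, tid, attrs):
        self._docs[tid] = dict(attrs)

    def query(self, filters):
        return sorted(
            tid for tid, attrs in self._docs.items()
            if all(attrs.get(k) == v for k, v in filters.items())
        )


class Proxy:
    """Stateless router: ID lookups go to the post service, filters go to search."""

    def __init__(self, posts: PostService, search: SearchService):
        self._posts = posts
        self._search = search

    def handle(self, request: dict):
        if "tid" in request:
            return self._posts.get(request["tid"])
        return self._search.query(request["filters"])

    def update(self, tid: int, attrs: dict) -> None:
        # Both stores are updated so that the index does not drift from the data.
        self._posts.put(tid, attrs)
        self._search.reindex(tid, attrs)


proxy = Proxy(PostService(), SearchService())
proxy.update(7, {"job": "driver", "location": "bj"})
print(proxy.handle({"tid": 7}))
print(proxy.handle({"filters": {"location": "bj"}}))
```

Keeping the routing decision in one stateless place means more proxy instances can be added without coordinating with the storage or index layers.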
Key Challenges Addressed
Compression of ext keys reduces storage overhead.
Numeric keys are resolved to their names and constraints through the category service, which keeps the ext column extensible.
Adding new attributes only requires updating the category service, not the post schema.
Enum validation ensures data quality.
Hierarchical category metadata enables flexible UI rendering and query routing.
Conclusion
By separating concerns into three unified services (post storage, category/attribute management, and external search), platforms can handle 10 billion rows, 10,000 attributes, and 100,000 QPS while keeping the architecture extensible, maintainable, and performant.
ITPUB
