Scaling Schema‑Free Classified Ads Platforms: Storage & Search for Billions

This article explains how to design a scalable architecture for classified‑ads (classification‑info) platforms that handle billions of rows, about 10,000 attributes, and around 100,000 QPS. The approach combines vertical partitioning with three unified services (post storage, category management, and search), stores category‑specific fields as compressed JSON extensions, and moves multi‑attribute queries to an external index.

ITPUB

Background and Business Scenario

Classified‑ads (classification‑info) platforms host many vertical categories (recruitment, real estate, second‑hand goods, etc.) whose core data are “post” records. Each category defines its own, often large, set of attributes, so the platform as a whole must handle up to 10,000 attributes, 10 billion rows, and 100,000 queries per second.

Naïve Approach: Adding Columns and Composite Indexes

Initially, a single table might be defined as follows (tiezi is pinyin for “post”):

tiezi(tid, uid, c1, c2, c3);

When a new category such as real estate is added, its columns are simply appended:

tiezi(tid, uid, c1, c2, c3, c10, c11, c12, c13);

Composite indexes such as index_1(c1, c2), index_2(c2, c3), and index_3(c1, c3) are then created to serve multi‑attribute queries.

Problems with the Naïve Approach

The variety of attributes makes the number of composite indexes needed grow combinatorially.

Every new category requires an ALTER TABLE plus new indexes, which is slow and risky on a table with billions of rows.

Cross‑category queries become impossible to cover with static indexes.

Maintenance overhead grows dramatically as more categories are added.
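A rough count shows why the index approach breaks down. The sketch below assumes, hypothetically, that every pair of attributes that might be filtered together needs its own two‑column composite index; it ignores column order and wider indexes, so real workloads need fewer, but the growth is still quadratic. The function name is illustrative.

```python
from math import comb


def pairwise_index_count(num_attributes: int) -> int:
    """Number of two-column composite indexes needed to cover every
    unordered attribute pair (hypothetical worst case; ignores column
    order and indexes with three or more columns)."""
    return comb(num_attributes, 2)


# Three columns (c1, c2, c3) need three pair indexes, matching index_1..index_3.
for n in (3, 4, 100, 10_000):
    print(f"{n:>6} attributes -> {pairwise_index_count(n):,} pair indexes")
```

With 10,000 attributes this worst case reaches 49,995,000 pair indexes, far beyond what any single table can maintain.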

Vertical Partitioning as a Solution

Instead of a monolithic table, split posts by vertical domain (zhaopin is recruitment, fangchan is real estate):

tiezi_zhaopin(tid, uid, c1, c2, c3);
tiezi_fangchan(tid, uid, c10, c11, c12, c13);

This isolates schema per category but introduces new challenges: ID standardization, attribute governance, cross‑category search, and heterogeneous storage technologies.
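One concrete form of the ID‑standardization problem: if each per‑category table assigns its own IDs, the same tid appears in several tables, so a bare tid no longer identifies a post. The sketch below is a hypothetical in‑memory model (no database), assuming each table uses an independent auto‑increment counter; the routing dict and function name are illustrative.

```python
from itertools import count

# Hypothetical routing from vertical category to its own table (names from the article).
TABLE_BY_CATEGORY = {
    "zhaopin": "tiezi_zhaopin",    # recruitment
    "fangchan": "tiezi_fangchan",  # real estate
}

# Each per-category table keeps its own counter, as independent
# AUTO_INCREMENT columns would.
_counters = {table: count(1) for table in TABLE_BY_CATEGORY.values()}


def insert_post(category: str) -> tuple[str, int]:
    """Route a new post to its category table and return (table, local tid)."""
    table = TABLE_BY_CATEGORY[category]  # raises KeyError for unknown categories
    return table, next(_counters[table])


job = insert_post("zhaopin")
flat = insert_post("fangchan")
print(job, flat)  # ('tiezi_zhaopin', 1) ('tiezi_fangchan', 1)
```

Both posts receive tid 1, which is why a platform‑wide ID scheme becomes necessary once data is split this way.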

Industry Best Practice: Three Core Services

1. Unified Post Center Service (Info Management Center, IMC)

A single service stores all posts in a sharded MySQL table with a generic schema:

tiezi(tid, uid, time, title, cate, subcate, xxid, ext);

Fields shared by every category are regular columns, while the ext column holds a JSON object with the category‑specific fields. Example ext values:

{"job":"driver","salary":8000,"location":"bj"}
{"type":"iphone","money":3500}

Data is partitioned across 256 databases, cached with Memcached, and accessed via the post service.
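A minimal sketch of the read and write path, assuming posts are routed to one of the 256 databases by tid modulo 256 and that Memcached acts as a cache‑aside layer. Plain dicts stand in for both MySQL and Memcached; the routing rule and the function names are assumptions for illustration, not details from the source.

```python
import json
from typing import Optional

NUM_SHARDS = 256  # from the article: posts are spread across 256 databases

# In-memory stand-ins (hypothetical): one dict per shard database and one
# dict playing the role of Memcached.
shards: list[dict] = [dict() for _ in range(NUM_SHARDS)]
cache: dict[int, dict] = {}


def shard_for(tid: int) -> int:
    """Assumed routing rule: post ID modulo the number of shards."""
    return tid % NUM_SHARDS


def save_post(post: dict) -> None:
    """Write the row to its shard (ext serialized as JSON) and drop any cached copy."""
    row = dict(post, ext=json.dumps(post["ext"]))
    shards[shard_for(post["tid"])][post["tid"]] = row
    cache.pop(post["tid"], None)


def get_post(tid: int) -> Optional[dict]:
    """Cache-aside read: try the cache, fall back to the shard, then fill the cache."""
    if tid in cache:
        return cache[tid]
    row = shards[shard_for(tid)].get(tid)
    if row is not None:
        cache[tid] = row
    return row


save_post({"tid": 1001, "uid": 7, "cate": "zhaopin",
           "ext": {"job": "driver", "salary": 8000, "location": "bj"}})
print(shard_for(1001))  # 233
print(get_post(1001)["ext"])
```

Invalidating the cache entry on write, rather than overwriting it, keeps the write path simple: the next read repopulates the cache from the shard.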

2. Unified Category & Attribute Service (Category Management Center, CMC)

All attribute definitions are centralized. Each attribute is assigned a numeric key to compress storage, and constraints (type, enum, regex) are stored in the service.

With numeric keys, the ext examples above become:

{"1":"driver","2":8000,"3":"bj"}
{"4":"iphone","5":3500}
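The key mapping can be sketched as a pair of lookup tables owned by the category service. The key assignments below simply reproduce the example; the registry and function names are hypothetical. JSON object keys are always strings, which is why the numeric keys appear quoted.

```python
import json

# Hypothetical registry held by the category service: attribute name -> numeric key.
ATTRIBUTE_KEYS = {"job": 1, "salary": 2, "location": 3, "type": 4, "money": 5}
ATTRIBUTE_NAMES = {key: name for name, key in ATTRIBUTE_KEYS.items()}


def compress_ext(ext: dict) -> dict:
    """Replace attribute names with their numeric keys (stringified for JSON)."""
    return {str(ATTRIBUTE_KEYS[name]): value for name, value in ext.items()}


def expand_ext(compact: dict) -> dict:
    """Reverse the mapping so callers can work with readable names again."""
    return {ATTRIBUTE_NAMES[int(key)]: value for key, value in compact.items()}


ext = {"job": "driver", "salary": 8000, "location": "bj"}
compact = compress_ext(ext)
print(json.dumps(compact))  # {"1": "driver", "2": 8000, "3": "bj"}
print(len(json.dumps(ext)), "->", len(json.dumps(compact)), "bytes")
```

The saving per row is small, but it is repeated for every attribute of every one of the billions of rows.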

Enum tables validate values (e.g., the value stored under key 4 must be one of the predefined enum IDs).
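Type, enum, and regex constraints can be checked in one pass before a post is written. This is a sketch under stated assumptions: the rule record, the rules themselves, and the enum IDs 101 to 103 are invented for illustration, and enum values are stored as integer IDs, so the "iphone" string in the example would itself be replaced by such an ID.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttributeRule:
    """Hypothetical constraint record for one numeric attribute key."""
    type_: type
    enum: Optional[set] = None     # allowed enum IDs, if the attribute is an enum
    pattern: Optional[str] = None  # regex a string value must fully match


# Illustrative rules; key 4 is an enum whose allowed IDs are made up here.
RULES = {
    2: AttributeRule(int),
    3: AttributeRule(str, pattern=r"[a-z]{2,8}"),
    4: AttributeRule(int, enum={101, 102, 103}),
}


def validate_ext(compact: dict) -> list[str]:
    """Return a list of problems; an empty list means the ext object is valid."""
    errors = []
    for key, value in compact.items():
        rule = RULES.get(int(key))
        if rule is None:
            errors.append(f"unknown attribute key {key}")
        elif type(value) is not rule.type_:  # exact check, so True is not an int
            errors.append(f"key {key}: expected {rule.type_.__name__}")
        elif rule.enum is not None and value not in rule.enum:
            errors.append(f"key {key}: {value!r} is not an allowed enum ID")
        elif rule.pattern is not None and not re.fullmatch(rule.pattern, value):
            errors.append(f"key {key}: {value!r} does not match {rule.pattern}")
    return errors


print(validate_ext({"2": 8000, "3": "bj", "4": 102}))  # []
print(validate_ext({"4": 999}))  # ['key 4: 999 is not an allowed enum ID']
```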

The service also records hierarchical category relationships (e.g., recruitment → sub‑category → specific job type).
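A parent‑pointer table is one simple way to record the hierarchy; walking it yields the breadcrumb a UI needs or the top‑level vertical a query should be routed to. The tree below, stored as id -> (name, parent id), and its IDs and names are hypothetical.

```python
from typing import Optional

# Hypothetical category tree: id -> (name, parent id). None marks a top-level vertical.
CATEGORIES = {
    1: ("recruitment", None),
    10: ("transport", 1),
    100: ("driver", 10),
    2: ("real estate", None),
}


def category_path(cate_id: int) -> list[str]:
    """Walk parent links up to the root and return names from root to leaf."""
    path = []
    current: Optional[int] = cate_id
    while current is not None:
        name, parent = CATEGORIES[current]
        path.append(name)
        current = parent
    return list(reversed(path))


print(" > ".join(category_path(100)))  # recruitment > transport > driver
```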

3. Unified Search Service

Because composite indexes cannot cover all attribute combinations at this scale, an external search engine is introduced.

Lookups by post ID are served directly by the post service, which acts as the forward index.

All other attribute‑based queries are routed to the external index.

The search architecture includes a stateless proxy layer, a result‑aggregation layer, and a search core where index data is horizontally sharded and optionally replicated for performance.
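The division of work between the search core and the aggregation layer can be sketched with in‑memory inverted indexes. Assumptions: documents are sharded by tid modulo a hypothetical shard count of 4, every filter is an exact‑match AND condition, and replication, ranking, and pagination are left out. Class and function names are illustrative.

```python
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical; real deployments size this by index volume and QPS


class IndexShard:
    """One search-core shard: an inverted index from (key, value) to post IDs."""

    def __init__(self) -> None:
        self.postings: dict[tuple, set[int]] = defaultdict(set)

    def add(self, tid: int, ext: dict) -> None:
        for key, value in ext.items():
            self.postings[(key, value)].add(tid)

    def search(self, filters: dict) -> set[int]:
        """AND all filters by intersecting posting lists within this shard."""
        sets = [self.postings.get((k, v), set()) for k, v in filters.items()]
        return set.intersection(*sets) if sets else set()


index_shards = [IndexShard() for _ in range(NUM_SHARDS)]


def index_post(tid: int, ext: dict) -> None:
    index_shards[tid % NUM_SHARDS].add(tid, ext)  # documents partitioned by tid


def aggregate_search(filters: dict) -> list[int]:
    """Aggregation layer: fan the query out to every shard and merge the hits."""
    hits: set[int] = set()
    for shard in index_shards:
        hits |= shard.search(filters)
    return sorted(hits)


index_post(1, {"1": "driver", "3": "bj"})
index_post(2, {"1": "driver", "3": "sh"})
index_post(7, {"1": "driver", "3": "bj"})
print(aggregate_search({"1": "driver", "3": "bj"}))  # [1, 7]
```

Because each shard holds a disjoint subset of posts, the aggregator can simply union per‑shard results; adding shards spreads both index size and query work.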

Typical query flow:

Client requests a post by ID → the proxy forwards the request to the post service.

Client requests a complex filter → the proxy forwards the request to the search service, which looks up the inverted index.

When a post is updated, both the post service and the search service are notified so that the stored record and the index stay in sync.
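The routing and update rules above can be sketched end to end. This is a simplified, hypothetical model: the proxy writes to the post service and then notifies the search service synchronously, whereas a production system could decouple the notification (for example via a message queue). All class and method names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class PostService:
    """Forward store keyed by post ID (stand-in for sharded MySQL plus cache)."""
    rows: dict = field(default_factory=dict)


@dataclass
class SearchService:
    """Inverted index: (attribute key, value) -> set of post IDs."""
    postings: dict = field(default_factory=dict)

    def reindex(self, tid: int, old_ext: dict, new_ext: dict) -> None:
        for item in old_ext.items():  # drop stale postings
            self.postings.get(item, set()).discard(tid)
        for item in new_ext.items():  # add current postings
            self.postings.setdefault(item, set()).add(tid)

    def query(self, filters: dict) -> set:
        sets = [self.postings.get(item, set()) for item in filters.items()]
        return set.intersection(*sets) if sets else set()


@dataclass
class Proxy:
    """Router: ID lookups go to the post service, filters go to search."""
    posts: PostService
    search: SearchService

    def handle(self, request: dict):
        if "tid" in request:
            return self.posts.rows.get(request["tid"])
        return sorted(self.search.query(request["filters"]))

    def update(self, tid: int, ext: dict) -> None:
        """Write the post, then notify search so the index stays in sync."""
        old = self.posts.rows.get(tid, {})
        self.posts.rows[tid] = ext
        self.search.reindex(tid, old, ext)


proxy = Proxy(PostService(), SearchService())
proxy.update(1, {"1": "driver", "3": "bj"})
proxy.update(1, {"1": "driver", "3": "sh"})  # the post moves from bj to sh
print(proxy.handle({"tid": 1}))                              # {'1': 'driver', '3': 'sh'}
print(proxy.handle({"filters": {"3": "bj"}}))                # []
print(proxy.handle({"filters": {"1": "driver", "3": "sh"}}))  # [1]
```

Removing the old postings before adding new ones is what prevents an updated post from still matching its previous attribute values.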

Key Challenges Addressed

Replacing ext attribute names with numeric keys reduces storage overhead.

Numeric keys become self‑describing when looked up in the category service, which keeps the format extensible.

Adding new attributes only requires updating the category service, not the post schema.

Enum validation ensures data quality.

Hierarchical category metadata enables flexible UI rendering and query routing.

Conclusion

By separating concerns into three unified services (post storage, category/attribute management, and external search), platforms can handle 10 billion rows, 10,000 attributes, and 100,000 QPS while keeping the architecture extensible, maintainable, and performant.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
