How 1号店 Scaled Its Search Engine for 11.11: Distributed Architecture, Sharding, and Auto‑Scaling

This article explains how 1号店 built a distributed search engine with horizontal scaling, custom sharding and routing strategies, automated deployment, rapid expansion, and real‑time monitoring to handle the massive traffic spikes of the annual 11.11 e‑commerce promotion.


Core Requirements for 11.11

The 11.11 shopping festival generates massive, sudden traffic, creating two main needs: scalability to withstand the load and rapid response to minimize loss when issues arise.

Distributed Search Engine

1号店 built a distributed search engine on top of Lucene/Solr, using the SOA framework Hedwig to create a layer that distributes and merges search requests. A management console supports multi‑index, cluster management, full‑index switching, and real‑time updates.

Instead of adopting SolrCloud or Elasticsearch as an off-the-shelf black box, the team built a custom solution so it could support complex search conditions, custom plugins, and fine-grained performance monitoring.

Sharding splits a large index into multiple small shards that can be deployed across many servers, keeping each shard small enough for fast search responses.

The architecture consists of two main components: Shard Searcher (similar to a single‑node search engine) and Broker, which translates a global search request into per‑shard requests, dispatches them, and merges the results.
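To make the scatter-gather flow concrete, here is a minimal sketch of a Broker that fans a query out to every shard and merges the per-shard top-N by score. The ShardClient interface, the Hit record, the thread-pool size, and the per-shard timeout are illustrative assumptions, not 1号店's actual Hedwig API:

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.Collectors;

/** Illustrative result item returned by a shard. */
record Hit(long docId, float score) {}

/** Hypothetical client for a single Shard Searcher node. */
interface ShardClient {
    List<Hit> search(String query, int topN) throws Exception;
}

class Broker {
    private final List<ShardClient> shards;
    private final ExecutorService pool = Executors.newFixedThreadPool(16);

    Broker(List<ShardClient> shards) { this.shards = shards; }

    /** Scatter the query to every shard, then merge the per-shard top-N into a global top-N. */
    List<Hit> search(String query, int topN) throws Exception {
        List<Future<List<Hit>>> futures = new ArrayList<>();
        for (ShardClient shard : shards) {
            futures.add(pool.submit(() -> shard.search(query, topN)));
        }
        List<Hit> merged = new ArrayList<>();
        for (Future<List<Hit>> f : futures) {
            merged.addAll(f.get(200, TimeUnit.MILLISECONDS)); // per-shard timeout (illustrative)
        }
        // Global merge: highest score first, keep only the requested page size.
        return merged.stream()
                .sorted(Comparator.comparingDouble(Hit::score).reversed())
                .limit(topN)
                .collect(Collectors.toList());
    }
}
```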

When many shards or large result sets are involved, memory and CPU pressure can arise. Techniques such as limiting the number of shards per request or using DeepPaging can mitigate this.
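One common form of deep paging is cursor-style fetching, for example Lucene's searchAfter, so a shard never re-collects all the pages that precede the requested one. A sketch under that assumption (the article does not spell out the exact mechanism the team used):

```java
import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

/** Cursor-style paging on a single shard: the caller passes back the last
 *  ScoreDoc of the previous page instead of an ever-growing offset. */
class ShardPager {
    private final IndexSearcher searcher;

    ShardPager(IndexSearcher searcher) { this.searcher = searcher; }

    TopDocs nextPage(Query query, ScoreDoc after, int pageSize) throws IOException {
        // searchAfter collects only pageSize hits ranked below 'after',
        // rather than offset + pageSize hits as classic from/size paging would.
        return after == null
                ? searcher.search(query, pageSize)
                : searcher.searchAfter(after, query, pageSize);
    }
}
```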

Deployment options for Broker and Shard Searcher include:

Option A: each node runs both a Broker and a Shard Searcher (hosting part of the shards).

Option B: Brokers deployed as a separate cluster.

Option C: the Broker embedded in the client application.

The team initially used option A and later switched to option C, eliminating one network round-trip and enabling richer routing logic on the client side.

Efficient Routing Strategy

Routing must consider business scenarios. For example, books are long‑tail items with many SKUs but low traffic, while fast‑moving consumer goods have few SKUs but high traffic. A simple modulo‑shard strategy would be inefficient.

The chosen sharding strategy splits by primary category, then by secondary category if a shard is too large, and finally merges shards based on relevance if they become too small. Resulting shard sizes are typically 200‑500 MB, with single‑shard search latency under 10 ms.
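A rough sketch of that split-and-merge planning, assuming the 200-500 MB targets above and a category-to-size map as input; merging small categories is simplified here to accumulation order, whereas the team merged by relevance:

```java
import java.util.*;

/** Plans shards from category sizes: split oversized primary categories by
 *  secondary category, merge undersized ones, aiming at roughly 200-500 MB each. */
class ShardPlanner {
    static final long MAX_MB = 500, MIN_MB = 200;

    // categorySizesMb: primary category -> (secondary category -> index size in MB)
    List<List<String>> plan(Map<String, Map<String, Long>> categorySizesMb) {
        List<List<String>> shards = new ArrayList<>();
        List<String> pending = new ArrayList<>();   // small categories waiting to be merged
        long pendingMb = 0;

        for (var primary : categorySizesMb.entrySet()) {
            long totalMb = primary.getValue().values().stream().mapToLong(Long::longValue).sum();
            if (totalMb > MAX_MB) {
                // Oversized primary category: one shard per secondary category.
                for (String secondary : primary.getValue().keySet()) {
                    shards.add(List.of(primary.getKey() + "/" + secondary));
                }
            } else if (totalMb < MIN_MB) {
                // Undersized: accumulate until the merged shard is big enough.
                pending.add(primary.getKey());
                pendingMb += totalMb;
                if (pendingMb >= MIN_MB) {
                    shards.add(new ArrayList<>(pending));
                    pending.clear();
                    pendingMb = 0;
                }
            } else {
                shards.add(List.of(primary.getKey()));
            }
        }
        if (!pending.isEmpty()) shards.add(pending);
        return shards;
    }
}
```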

Routing differs for category navigation and keyword search:

Category navigation: route directly to the shard(s) based on category ID, caching the routing table in the Broker.

Keyword search: hot words (the top 20% of traffic) use routing tables pre-computed after indexing; non-hot words search all shards on their first query, and the results are recorded for future caching (see the routing sketch below).
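A minimal sketch of the Broker-side lookup for the two routing modes, assuming hypothetical routing tables keyed by category ID and hot keyword (names and structures are illustrative):

```java
import java.util.*;

/** Broker-side routing: categories and hot keywords hit a cached routing table;
 *  everything else falls back to querying all shards (and is recorded for later). */
class Router {
    private final Map<Long, List<Integer>> categoryToShards;   // built from the shard plan
    private final Map<String, List<Integer>> hotWordToShards;  // precomputed after indexing
    private final List<Integer> allShards;

    Router(Map<Long, List<Integer>> categoryToShards,
           Map<String, List<Integer>> hotWordToShards,
           List<Integer> allShards) {
        this.categoryToShards = categoryToShards;
        this.hotWordToShards = hotWordToShards;
        this.allShards = allShards;
    }

    /** Category navigation: route straight to the shard(s) owning that category. */
    List<Integer> routeByCategory(long categoryId) {
        return categoryToShards.getOrDefault(categoryId, allShards);
    }

    /** Keyword search: hot words use the precomputed table; cold words search all
     *  shards the first time so the result distribution can be recorded. */
    List<Integer> routeByKeyword(String keyword) {
        return hotWordToShards.getOrDefault(keyword, allShards);
    }
}
```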

A 360‑Routing rule further optimizes by analyzing the top five result pages (360 items) and adjusting routing to prioritize shards that contributed those results, covering over 80% of traffic.
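A sketch of how the 360-Routing statistics could be collected, assuming each merged hit carries the ID of the shard that produced it (a hypothetical structure, not the documented implementation):

```java
import java.util.*;
import java.util.stream.Collectors;

/** After a merged search, tally which shards produced the top 360 results and
 *  keep the contributing shards as the keyword's routing entry. */
class Routing360 {
    record ShardHit(int shardId, float score) {}

    List<Integer> buildRoutingEntry(List<ShardHit> mergedHits) {
        Map<Integer, Long> contributions = mergedHits.stream()
                .sorted(Comparator.comparingDouble(ShardHit::score).reversed())
                .limit(360)                                    // top five result pages
                .collect(Collectors.groupingBy(ShardHit::shardId, Collectors.counting()));
        // Prefer the shards that contributed the most of those top 360 results.
        return contributions.entrySet().stream()
                .sorted(Map.Entry.<Integer, Long>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```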

These routing rules reduce the average shard access to one‑third of the total index, saving two‑thirds of resources.

Automatic Deployment and Rapid Scaling

Sharding and routing introduce deployment complexity. The deployment algorithm follows three principles (a greedy-placement sketch follows the list):

Determine replica count per shard based on traffic and high‑availability requirements.

Ensure a node never hosts multiple replicas of the same shard and that total traffic per node matches its capacity.

Balance total index size across nodes.
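A greedy placement sketch of these three principles, using index size as a stand-in for per-node traffic capacity; the replica counts, host list, and load model are illustrative assumptions rather than the team's actual algorithm:

```java
import java.util.*;

/** Greedy placement of shard replicas onto nodes: replica count per shard is given,
 *  no node holds two replicas of the same shard, and load stays roughly balanced. */
class DeploymentPlanner {
    record Shard(String id, int replicas, long sizeMb) {}

    /** Returns host -> list of shard ids assigned to it. */
    Map<String, List<String>> plan(List<Shard> shards, List<String> hosts) {
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        Map<String, Long> loadMb = new HashMap<>();
        for (String h : hosts) { assignment.put(h, new ArrayList<>()); loadMb.put(h, 0L); }

        // Place the largest shards first so the heavy pieces spread out early.
        List<Shard> ordered = new ArrayList<>(shards);
        ordered.sort(Comparator.comparingLong(Shard::sizeMb).reversed());

        for (Shard shard : ordered) {
            for (int r = 0; r < shard.replicas(); r++) {
                // Candidates: nodes not already hosting a replica of this shard.
                String target = hosts.stream()
                        .filter(h -> !assignment.get(h).contains(shard.id()))
                        .min(Comparator.comparingLong(loadMb::get))   // least-loaded node
                        .orElseThrow(() -> new IllegalStateException("not enough nodes for " + shard.id()));
                assignment.get(target).add(shard.id());
                loadMb.merge(target, shard.sizeMb(), Long::sum);
            }
        }
        return assignment;
    }
}
```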

(Figure: system performance after deployment.)

A packing algorithm automates the generation of deployment plans, handling node additions, removals, and replacements. The same tool supports rapid scaling during the 11.11 event: the Index Server computes a new plan, writes it to ZooKeeper, and each Shard Searcher updates its local configuration accordingly, pulling updated index data from HDFS and merging real‑time updates.
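A sketch of the Shard Searcher side of that flow, using the stock Apache ZooKeeper client to watch a deployment-plan znode; the znode path, the payload format, and the HDFS pull inside applyLocalPlan are assumptions for illustration:

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;

/** Shard Searcher side: watch the deployment plan in ZooKeeper and react when
 *  the Index Server publishes a new plan during scale-out. */
class PlanWatcher implements Watcher {
    private static final String PLAN_PATH = "/search/deployment-plan";   // hypothetical znode
    private final ZooKeeper zk;

    PlanWatcher(String connectString) throws Exception {
        this.zk = new ZooKeeper(connectString, 30_000, this);
        reload();   // read the current plan and register the watch
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
            try { reload(); } catch (Exception e) { e.printStackTrace(); }
        }
    }

    private void reload() throws KeeperException, InterruptedException {
        byte[] data = zk.getData(PLAN_PATH, this, new Stat());   // re-arm the watch
        applyLocalPlan(new String(data, StandardCharsets.UTF_8));
    }

    /** Placeholder: update the local shard config, pull the new index from HDFS,
     *  then merge any real-time updates accumulated since the full build. */
    private void applyLocalPlan(String plan) {
        System.out.println("applying new deployment plan: " + plan);
    }
}
```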

Real‑Time Monitoring System

During the 11.11 promotion, every minute matters. The monitoring system validates each stage of index preparation (record counts, processing exceptions, search result differences) and continuously runs typical search queries to compare ranking results, alerting on large deviations.
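A sketch of the ranking-comparison step, assuming a fixed set of typical queries, top-N doc-ID lists for a baseline and a candidate index, and a hypothetical 0.8 overlap threshold for alerting:

```java
import java.util.*;

/** Replay a fixed set of typical queries and compare the new index's top-N
 *  against a baseline; alert when the overlap drops too far. */
class RankingValidator {
    private static final double MIN_OVERLAP = 0.8;   // hypothetical alert threshold

    /** Fraction of baseline top-N doc ids that survive in the candidate top-N. */
    static double overlap(List<Long> baselineTopN, List<Long> candidateTopN) {
        Set<Long> candidate = new HashSet<>(candidateTopN);
        long kept = baselineTopN.stream().filter(candidate::contains).count();
        return baselineTopN.isEmpty() ? 1.0 : (double) kept / baselineTopN.size();
    }

    static void validate(String query, List<Long> baselineTopN, List<Long> candidateTopN) {
        double score = overlap(baselineTopN, candidateTopN);
        if (score < MIN_OVERLAP) {
            // In production this would page an operator or block the index switch.
            System.err.printf("ALERT: query '%s' ranking overlap %.2f below %.2f%n",
                    query, score, MIN_OVERLAP);
        }
    }
}
```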

It also detects sold‑out items that still appear in results, triggering index updates to demote them.

Conclusion

Each year the 11.11 event tests the system’s limits. By building a distributed search engine, designing business‑aware sharding and routing, automating deployment and scaling, and establishing a comprehensive validation and monitoring pipeline, 1号店 achieved scalable, efficient search that can handle future traffic spikes.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: real-time monitoring, auto scaling, distributed search, e-commerce backend
Written by ITFLY8 Architecture Home

ITFLY8 Architecture Home focuses on architecture knowledge sharing and exchange, covering large-scale distributed website architecture (high performance, high availability, caching, message queues...), design patterns, architecture patterns, big data, project management (SCRUM, PMP, Prince2), product design, and more.