How We Scaled a Billion‑Item Search Engine with Elasticsearch: From Zero to One
This article details the practical journey of building and scaling an Elasticsearch‑based search system that supports tens of millions to billions of items, covering architecture design, capacity planning, multi‑data‑center deployment, data synchronization via RocketMQ and Flink, and multi‑layer reconciliation to ensure consistency and high QPS.
Business Background
The platform's merchant enrollment management system serves the Douyin e‑commerce activity ecosystem, handling diverse entities such as live‑stream rooms, product enrollments, and coupon enrollments, with product enrollment accounting for the largest data volume.
Data Center
The Data Center is an Elasticsearch‑based search service that provides configurable, scalable, and generic data retrieval for the enrollment platform.
Key Concepts
Metric: metadata describing an entity attribute (e.g., product name, shop score); the smallest unit of update and query.
Collection: a group of related metrics that can be fetched by a common identifier (e.g., product ID, shop ID).
Solution: the data‑fetching method abstracted for each collection, enabling horizontal expansion of data sources.
Custom Header: a column title displayed in a two‑dimensional table; one custom header maps to one or more metrics.
Filter Item: a filter condition with a one‑to‑one relationship to a metric, used for table filtering.
Audit View: a dynamic audit page rendered from a chosen set of custom headers and filter items.
Function Design
Metrics flow into filter items and custom headers, which together generate dynamic audit views. Because many entity types and scenarios exist, the design allows arbitrary combinations of headers and filters to produce different audit views.
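The relationships above (metric → filter item one‑to‑one, metric → custom header one‑to‑many, views assembled from both) can be sketched with plain data classes. This is a minimal illustration; all class and field names are assumptions, not the platform's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    key: str           # e.g. "product_name", "shop_score"
    display_name: str


@dataclass
class CustomHeader:
    title: str
    metrics: list      # one header can render one or more metrics


@dataclass
class FilterItem:
    metric: Metric     # one-to-one with a metric


@dataclass
class AuditView:
    headers: list
    filters: list

    def columns(self):
        # flatten headers into the metric keys the table will query
        return [m.key for h in self.headers for m in h.metrics]


name = Metric("product_name", "Product Name")
score = Metric("shop_score", "Shop Score")
view = AuditView(
    headers=[CustomHeader("Basic Info", [name, score])],
    filters=[FilterItem(score)],
)
print(view.columns())  # ['product_name', 'shop_score']
```

Because a view is just a combination of headers and filters, new audit pages for new entities can be configured rather than coded.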
Building ES Cluster from 0 to 1
Stability requirements include basic disaster‑recovery mechanisms and eventual data consistency across multiple data‑center replicas.
Basic disaster‑recovery: the system must self‑adjust when component failures or traffic spikes affect performance.
Eventual data consistency: ensure DB → ES replication completes across all regions.
Capacity Evaluation
Evaluating ES cluster capacity ensures stable service for future growth.
Determine the number of shards per index and estimate data‑volume and traffic growth.
Decide the number of data instances per cluster and their specifications.
Understand vertical vs. horizontal scaling and define response strategies for unexpected data or traffic spikes.
Key Solutions
Shard count must be chosen correctly at index creation; it cannot be changed later without reindexing. The shard count should be an integer multiple of the number of ES data instances so load balances evenly.
Keep each shard size between 10‑30 GB; oversized shards degrade query performance.
Traffic spikes can be absorbed by scaling instances; data‑volume spikes require deleting stale data or reindexing with more shards. Multi‑region disaster recovery must also be deployed.
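The two sizing rules above (shards in the 10‑30 GB range, shard count a multiple of the instance count) reduce to simple arithmetic. A minimal sketch; the function name and the example numbers are assumptions for illustration.

```python
import math


def estimate_shards(total_gb: float, target_shard_gb: float, num_instances: int) -> int:
    """Pick a shard count that keeps each shard near the target size
    (within the 10-30 GB guideline) and rounds up to an integer
    multiple of the data-instance count for even load balancing."""
    raw = math.ceil(total_gb / target_shard_gb)
    return math.ceil(raw / num_instances) * num_instances


# e.g. 600 GB of projected data, ~20 GB per shard, 8 data instances
print(estimate_shards(600, 20, 8))  # 32
```

Note that the projected data volume should include future growth, since changing the shard count later means a full reindex.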
Data Synchronization Scheme
DB → ES synchronization is achieved via RocketMQ + Flink, with FaaS scripts for initial data enrichment and periodic tasks for metric refreshes.
Option 1: Direct write via Dsyncer to multiple regions
Drawbacks: weak multi‑region write guarantees, limited bulk write performance, and difficulty ensuring ordered updates across regions.
Option 2: RocketMQ to a single‑region ES
Data is written to one region via RocketMQ, then replicated to other regions using ES cross‑cluster replication.
Option 3: RocketMQ + Flink to multiple regions (chosen)
Each region runs its own independent consumer group that writes to the local ES cluster via Flink, providing independent write paths and eliminating single points of failure.
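The failure‑isolation property of option 3 comes from each region tracking its own consumption offset: a stalled region replays from where it left off without blocking the others. The toy simulation below illustrates that idea only; it stands in for the real RocketMQ consumer groups and regional ES clusters, and all names are hypothetical.

```python
class RegionConsumer:
    """Each region keeps its own consumer offset into the shared
    message stream, so a stalled region never blocks the others."""

    def __init__(self, region: str):
        self.region = region
        self.offset = 0
        self.es_docs = {}          # stands in for the regional ES cluster

    def poll(self, stream):
        # consume everything published since this region's own offset
        for msg in stream[self.offset:]:
            self.es_docs[msg["id"]] = msg
        self.offset = len(stream)


stream = [{"id": 1, "name": "item-1"}, {"id": 2, "name": "item-2"}]
regions = {r: RegionConsumer(r) for r in ("region-a", "region-b")}

regions["region-a"].poll(stream)   # region-b is "down" and misses this round
stream.append({"id": 3, "name": "item-3"})
for consumer in regions.values():
    consumer.poll(stream)          # region-b catches up from its own offset

print(sorted(regions["region-b"].es_docs))  # [1, 2, 3]
```

Contrast this with option 2, where every region's data depends on a single write path plus cross‑cluster replication.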
Ordered Consumption with RocketMQ
RocketMQ ensures ordered consumption by routing messages with the same sharding key to the same queue, guaranteeing FIFO order within each queue.
Messages are partitioned by sharding key into multiple queues.
Each queue preserves strict FIFO order for publishing and consumption.
The sharding key differs from an ordinary message key: it determines which queue (partition) a message is routed to.
Applicable when high performance and ordered processing are required.
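The routing rule described above — same sharding key, same queue, FIFO within the queue — can be sketched as a hash‑mod dispatch. This is an illustration, not RocketMQ's actual implementation; `crc32` stands in for whatever hash the producer uses, and the queue count is an assumption.

```python
import zlib

NUM_QUEUES = 8


def route(sharding_key: str) -> int:
    """Messages with the same sharding key always hash to the same
    queue index, so per-key FIFO order is preserved."""
    return zlib.crc32(sharding_key.encode()) % NUM_QUEUES


# all updates for the same product land in one queue, in publish order
updates = [("product-42", "create"), ("product-42", "update"), ("product-7", "create")]
queues = {}
for key, op in updates:
    queues.setdefault(route(key), []).append(op)

ops_for_42 = queues[route("product-42")]
print(ops_for_42.index("create") < ops_for_42.index("update"))  # True
```

The trade‑off is that a hot sharding key concentrates load on one queue, so the key should be chosen to spread traffic (e.g., per‑product rather than per‑shop).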
Multi‑layer Reconciliation Mechanism
Three layers of reconciliation ensure DB → ES consistency.
BCP Real‑time Reconciliation
Listens to the binlog and compares DB and ES across regions in near real time, detecting replication delays or write failures within seconds.
Minute‑level Reconciliation
Periodically queries DB and ES without relying on intermediate components; mismatches trigger automatic compensation.
T+1 Offline Reconciliation
Synchronizes DB and ES data to Hive on a daily basis; batch verification ensures eventual consistency with compensation applied by the next day.
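The core of each reconciliation layer is the same diff‑and‑compensate loop over DB (source of truth) and ES. A minimal sketch of that comparison, assuming both sides are already fetched into dicts keyed by document ID; the function name and data shapes are illustrative.

```python
def reconcile(db_rows: dict, es_docs: dict) -> list:
    """Compare the DB (source of truth) with ES and return the ids
    needing compensation: missing from ES, stale in ES, or deleted
    in the DB but still present in ES."""
    diffs = []
    for doc_id, row in db_rows.items():
        if es_docs.get(doc_id) != row:
            diffs.append(doc_id)
    diffs += [i for i in es_docs if i not in db_rows]
    return sorted(diffs)


db = {1: {"name": "a"}, 2: {"name": "b"}}
es = {1: {"name": "a"}, 3: {"name": "ghost"}}
print(reconcile(db, es))  # [2, 3]
```

The minute‑level job runs this over recent windows and re‑emits the diff ids into the sync pipeline, while the T+1 job runs the same comparison as a full batch over the Hive snapshots.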
Result
After the first phase, the ES cluster supports indices covering tens of millions of products, handling 500‑1000 QPS of read traffic and around 500 QPS of write traffic per region, with full disaster recovery, consistency checks, and robust failure‑handling strategies in place.
Volcano Engine Developer Services
The Volcano Engine Developer Community connects the platform with developers, offering cutting‑edge technical content and diverse events, nurturing a vibrant developer culture, and co‑building an open‑source ecosystem.