How to Build a Scalable Billion‑Item Elasticsearch Search System from Scratch

This article walks through the end‑to‑end design and implementation of a searchable system that scales from millions to hundreds of millions of products, covering business background, data‑center architecture, ES cluster sizing, multi‑room data sync, RocketMQ‑Flink pipelines, and multi‑layer reconciliation to achieve high QPS and strong consistency.

dbaplus Community
dbaplus Community
dbaplus Community
How to Build a Scalable Billion‑Item Elasticsearch Search System from Scratch

Business Background

The platform’s招商管理系统 (merchant recruitment management) supports TikTok e‑commerce activities across many entities such as live‑stream rooms, product recruitment, and coupon recruitment, with product recruitment being the largest. It needs a unified data‑center that can query and retrieve data for millions of items.

Key Concepts

Metric : A piece of metadata describing an attribute of an entity (e.g., product name, shop score, influencer level). Any field with clear semantics can be defined as a metric.

Collection : A set of related metrics that can be fetched together using a common identifier (e.g., product ID, shop ID). Collections express one‑to‑many relationships with metrics.

Solution : An abstraction that defines how to retrieve metrics from a specific collection.

Custom Header : The title row of a two‑dimensional data list, linked one‑to‑many with metrics.

Filter Item : A one‑to‑one mapping to a metric, used for filtering rows in a data list.

Audit View : A dynamically rendered page composed of custom headers and filter items for audit scenarios.

Data‑Center Design

The data‑center is an Elasticsearch‑based service that provides configurable, extensible, and generic data acquisition for the recruitment platform. It abstracts data synchronization and query capabilities, exposing two data sources: external RPC interfaces and recruitment‑record ES indices.

Functional Design

Metrics flow through Filter Item → Custom Header → Audit View to render dynamic audit pages. Because different recruitment entities require distinct audit views, the design allows arbitrary combinations of these components.

Building the ES Cluster (0 → 1)

Two stability requirements guide the initial build: basic disaster‑recovery mechanisms and eventual data consistency between the recruitment DB and multi‑region ES.

1. Capacity Planning

Determine the number of primary shards per index based on projected data growth and query/write traffic.

Decide the number and specifications of data nodes per cluster.

Understand vertical vs. horizontal scaling and design disaster‑recovery across multiple data‑centers.

Key Configuration Tips

{"dynamic": false}

to prevent unwanted mapping expansion. index.translog.durability=async improves write throughput at the risk of data loss.

Default refresh_interval is 1 s, meaning data becomes searchable within one second after a successful write.

Data Synchronization Solutions

The core challenge is keeping the recruitment DB and ES indices consistent in near‑real time.

Solution 1: Direct Multi‑Region Write (Dsyncer)

Writes directly to ES in each region. Advantages: short path, low latency. Drawbacks: weak multi‑region write success guarantees, limited bulk write performance, and difficulty ensuring ordered updates.

Solution 2: RocketMQ → Single‑Region ES + Cross‑Cluster Replication

DB changes are sent to RocketMQ, written to a single ES region, then replicated to other regions via ES’s cross‑cluster replication. Drawbacks: longer path, higher latency, and a single‑point‑of‑failure risk if the source region goes down.

Solution 3: RocketMQ + Flink → Multi‑Region ES (Chosen)

Multiple independent consumer groups consume from RocketMQ and, using Flink, write to ES in each region separately. This eliminates single‑point risk and leverages RocketMQ’s ordered key handling.

Why RocketMQ Guarantees Ordering

RocketMQ assigns messages with the same Sharding Key to the same Queue, ensuring FIFO order within that queue. Consumers process messages from a given Queue in strict order, which solves most out‑of‑order issues, though extreme cases (e.g., partition failures) still require reconciliation.

Multi‑Layer Reconciliation Mechanism

Three layers ensure eventual consistency between DB and ES:

Business Check Platform (BCP) – Seconds‑level Reconciliation : Listens to Binlog and directly compares ES across regions, detecting latency or blockage quickly.

Minute‑level Reconciliation : Periodically queries DB and ES without relying on intermediate components; triggers automatic compensation when mismatches are found.

T+1 Offline Reconciliation : Syncs DB and ES data to Hive nightly, validates day‑level consistency, and initiates compensation if needed.

These layers complement each other: BCP catches real‑time issues, minute‑level fills gaps where BCP cannot see, and offline reconciliation provides a safety net.

Current Capacity and Observations

After the first phase, the system supports tens of millions of product indices, handling 500‑1000 QPS per region for reads and around 500 QPS for writes. However, CPU spikes and query latency spikes have been observed, indicating future performance‑tuning work.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

search engineElasticsearchRocketMQdata synchronization
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.