Practical Insights on Deploying and Operating Elasticsearch at Scale
This article shares extensive practical experience from Qunar's large‑scale Elasticsearch deployment, covering suitable use cases, index‑type design, document ID strategies, scaling considerations for index and data volume, hardware sizing, and storage architecture recommendations to help newcomers avoid common pitfalls.
Since 2014, Qunar has progressively adopted the ELK stack for log retrieval, data analysis, and operational backend needs. It now runs over 100 Elasticsearch clusters; one exceeds 400 nodes and stores more than 1.3 PB of data, and Logstash processes tens of billions of records daily.
Applicable Scenarios : Elasticsearch excels at search and analytics rather than serving as a schemaless transactional database. Because it is near-real-time (NRT), a write is not immediately visible to searches, and it offers no atomic multi-document operations. It is therefore better suited to secondary roles, such as holding a copy of data replicated from database binlogs and serving multi-dimensional queries, than to primary transactional storage.
Index vs. Type : Unlike databases and tables in a relational system, Elasticsearch's index/type hierarchy does not provide strong isolation. Types are a logical layer added by Elasticsearch, and in older versions they can cause write failures, result ordering problems, and data bloat. Indices created in 6.0 or later can hold only a single type (5.6 offers the same behavior as an opt-in setting); 7.0 deprecates specifying types in the APIs, and 8.0 removes them.
Designing Indices : For order data, the recommended layout is one type per index with daily index partitions (e.g., /order_center-2018.11.18/data/), which keeps each index at a manageable size and simplifies archiving, closing, and snapshot operations. Smaller daily volumes may be merged into weekly or monthly indices combined with type queries, though configuration changes then take longer to have an effect, since they typically apply only when the next index is created.
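A daily partition name like the one in the example can be derived from a record's date. The following is a minimal sketch: the helper names `daily_index_name` and `monthly_index_name` are hypothetical, and only the `order_center-2018.11.18` pattern comes from the text. The monthly format is an assumed analogue, not something the article specifies.

```python
from datetime import date


def daily_index_name(day: date, prefix: str = "order_center") -> str:
    """Build a daily index name such as order_center-2018.11.18."""
    return f"{prefix}-{day:%Y.%m.%d}"


def monthly_index_name(day: date, prefix: str = "order_center") -> str:
    """Assumed monthly variant for low-volume data, e.g. order_center-2018.11."""
    return f"{prefix}-{day:%Y.%m}"


if __name__ == "__main__":
    print(daily_index_name(date(2018, 11, 18)))
    print(monthly_index_name(date(2018, 11, 18)))
```

Routing writes through a helper like this keeps the partitioning scheme in one place, so moving from daily to monthly indices is a one-line change in the writer.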
Document ID Selection : In clusters with high write throughput (5.x+), autogenerated IDs are preferred, because Elasticsearch can then skip the per-shard check for an existing document with the same ID. When custom IDs are required, use fixed-length, zero-padded binary strings to improve query performance.
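As an illustration of the fixed-length, zero-padded idea, here is a minimal sketch. It pads a decimal sequence number, which is an assumption: the article does not specify the exact encoding it means by "binary". The function name `padded_id` and the default width of 20 are hypothetical.

```python
def padded_id(sequence_number: int, width: int = 20) -> str:
    """Return a fixed-length, zero-padded ID for a non-negative sequence number."""
    if sequence_number < 0:
        raise ValueError("sequence_number must be non-negative")
    text = str(sequence_number).zfill(width)
    if len(text) > width:
        # Refuse to emit an ID that would break the fixed-length guarantee.
        raise ValueError(f"{sequence_number} does not fit in {width} digits")
    return text


if __name__ == "__main__":
    print(padded_id(42, width=8))
```

Because every ID has the same length, lexicographic order matches numeric order, so IDs generated in sequence also sort in sequence.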
Index vs. Data Volume : The number of indices impacts Master node memory (risking frequent GC), while total data volume affects Data node memory due to in‑memory inverted indexes. Monitoring both dimensions helps anticipate scaling needs and prevents instability.
Hardware Sizing – Disk vs. Memory : At Qunar, a 1 TB index consumes roughly 0.9 GB of heap memory. With a 15 TB data disk, about 13.5 GB of heap is needed, suggesting a JVM heap of 30 GB and a minimum of 64 GB RAM. Larger disks (e.g., 50 TB) may require 128 GB+ RAM or multiple JVM instances.
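The arithmetic above can be written out as a small calculation. This is a rough sketch that assumes heap usage scales linearly with stored data at the 0.9 GB per TB ratio reported for Qunar's workload; the function name `estimated_heap_gb` is hypothetical, and other workloads may show a different ratio.

```python
HEAP_GB_PER_TB = 0.9  # ratio reported in the article for Qunar's clusters


def estimated_heap_gb(disk_tb: float, heap_gb_per_tb: float = HEAP_GB_PER_TB) -> float:
    """Estimate the heap (GB) consumed by index data filling a disk of disk_tb TB."""
    return disk_tb * heap_gb_per_tb


if __name__ == "__main__":
    for disk_tb in (15, 50):
        heap = estimated_heap_gb(disk_tb)
        print(f"{disk_tb} TB of index data -> about {heap:.1f} GB of heap")
```

For 15 TB the estimate is about 13.5 GB, which fits in a 30 GB heap with room for query and indexing work. For 50 TB it is about 45 GB, more than a single 30 GB heap, which is consistent with the suggestion of more RAM or multiple JVM instances.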
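Single Disk vs. Multiple Disks : From Elasticsearch 2.x onward, each shard's data is stored entirely on a single disk to improve data integrity. When indices are created in batches (daily/weekly/monthly), the disk with the most free space can receive many of the new shards at once, leading to I/O spikes. Setups that expose several disks as separate data paths are therefore discouraged for such patterns; combining the disks into one volume (for example with RAID) avoids the imbalance.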
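The hotspot effect can be illustrated with a toy model. This is not Elasticsearch's actual allocation code: it simply assumes each new shard goes to the disk with the most free space, and that newly created shards are still nearly empty, so free space barely changes while a batch of indices is created. The function name `place_shards` and the disk sizes are hypothetical.

```python
def place_shards(free_gb: dict, shard_count: int, reserved_gb_per_shard: float = 0.0) -> dict:
    """Toy model: put each new shard on the disk with the most free space."""
    free = dict(free_gb)  # copy so the caller's dict is not modified
    placement = {name: 0 for name in free}
    for _ in range(shard_count):
        target = max(free, key=free.get)
        placement[target] += 1
        # New shards are almost empty, so little or no space is reserved.
        free[target] -= reserved_gb_per_shard
    return placement


if __name__ == "__main__":
    disks = {"disk0": 900.0, "disk1": 500.0, "disk2": 450.0}
    print(place_shards(disks, shard_count=6))
```

In this model, all six shards created in one batch land on `disk0`, which then absorbs the write load for the whole day's index while the other disks sit idle.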
Overall, the article provides actionable guidelines for Elasticsearch deployment, emphasizing proper use‑case identification, index design, ID strategy, capacity planning, and storage architecture to ensure stable, high‑performance search clusters.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
