
Design and Performance Optimization of an Intelligent Search System for City Operations Big Data Center

This article describes the background, requirement‑driven prototype design, Elasticsearch‑based query‑DSL selection, and extensive performance tuning—including hardware configuration, indexing parameters, JVM and garbage‑collector adjustments—that enabled real‑time ingestion of hundreds of thousands of records and sub‑second search responses for a city‑wide data platform.

Zhengtong Technical Team

Apr 30, 2020

The city operations data platform must retrieve and analyze massive multidimensional datasets efficiently, overcoming the complexity of traditional query interfaces. An intelligent search strategy was introduced to simplify the workflow and quickly return the most relevant information.

Guided by user needs, the prototype emphasizes ease of use, reduced interaction steps, comprehensive resource navigation, and visualized result classification.

Elasticsearch (ES) was chosen as the core engine because it provides distributed full‑text search, rich analyzers, and an active community. Various ES query types were evaluated:

MatchQuery – the most common full‑text query, but with limited flexibility: its strict or partial matching can reduce precision.

copy_to + MatchQuery/QueryStringQuery – searches a single combined field, so it cannot honor field‑level permission constraints.

Match_Phrase Query – strict phrase matching, useful for precise short‑phrase searches.

Multi‑Match Query – supports multiple fields with best_fields, most_fields, cross_fields options.

Query_String Query – flexible syntax (regex, range, fuzzy); ultimately selected as the main search method.
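As a rough illustration of the selected approach, the sketch below builds a query_string request body restricted to a list of default fields. The function name, the field names, and the sample query text are hypothetical and not taken from the original system.

```python
import json


def build_search_body(user_text, permitted_fields, size=10):
    """Build a query_string search body over the fields a user may search.

    Hypothetical helper: the original article does not show its query code.
    """
    if not permitted_fields:
        raise ValueError("at least one permitted field is required")
    return {
        "size": size,
        "query": {
            "query_string": {
                "query": user_text,
                # Default fields searched when the text names no field.
                "fields": list(permitted_fields),
                "default_operator": "AND",
            }
        },
    }


# Field names here are illustrative assumptions.
body = build_search_body("road repair~1", ["title", "address", "description"])
print(json.dumps(body, indent=2))
```

Note that "fields" only sets the defaults: query_string syntax lets the text name other fields explicitly (for example status:open), so real field‑level permission enforcement would have to validate the query text or rely on a separate security layer.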

Performance optimization focused on both write throughput and real‑time query latency. Hardware consisted of dedicated master nodes and data nodes, with each data node writing to two disks. Each record had 80 fields (≈1.33 KB), and batch writes ranged from 1 000 to 20 000 documents.

Key tuning actions included reducing the number of concurrent disk I/O threads per index, raising the translog flush threshold, lengthening the refresh interval, distributing data across multiple disks, enlarging indexing buffers, disabling swap, allocating half of system memory to the ES heap, switching to the G1 garbage collector, lowering the slow‑log level, enabling TCP compression, expanding the fielddata cache, and setting circuit‑breaker limits.

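# Mechanical hard disks handle concurrent I/O poorly, so reduce the number of threads per index that access the disk concurrently.
index.merge.scheduler.max_thread_count: 1

# Raising this threshold lets the translog grow larger before a flush is triggered.
# Unit assumed: the source shows a bare 1024; the ES default is 512mb.
index.translog.flush_threshold_size: 1024mb

# The index refresh interval defaults to 1s; raise it to 5s to lower I/O pressure.
index.refresh_interval: 5s

# Write to two disks in parallel.
path.data: /egova_data1,/egova_data2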
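The three index‑level settings above can also be expressed as a JSON body for the update index settings API (PUT /<index>/_settings). The sketch below only builds and prints that body. The helper name is hypothetical, the mb unit is an assumption, and path.data is omitted because it is a node‑level setting in elasticsearch.yml. Whether each setting can be changed on a live index depends on the ES version.

```python
import json


def index_tuning_settings(refresh_interval="5s",
                          flush_threshold="1024mb",  # unit is an assumption
                          merge_threads=1):
    """Return the index-level tuning values as a nested settings body."""
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "translog": {"flush_threshold_size": flush_threshold},
            "merge": {"scheduler": {"max_thread_count": merge_threads}},
        }
    }


# Serialize the body that would be sent to PUT /<index>/_settings.
payload = json.dumps(index_tuning_settings(), indent=2)
print(payload)
```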

Additional JVM and ES settings were applied to further reduce GC pauses and I/O overhead:

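# Disable swap so the CPU does not stall waiting on I/O.
swapoff -a

# Allocate a 16 GB heap (jvm.options), leaving half of memory for the system.
-Xms16g -Xmx16g

# Use the G1 garbage collector.
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200

# Lower the slow‑log level (log4j2.properties).
logger.index_search_slowlog_rolling.level=info
logger.index_indexing_slowlog.level=info

# Lock memory to keep the heap from being swapped out.
bootstrap.memory_lock: true
bootstrap.system_call_filter: false

# Enable TCP compression.
transport.tcp.compress: true

# Expand the fielddata cache to 30%.
indices.fielddata.cache.size: 30%

# Set the fielddata circuit‑breaker limit to 60%.
indices.breaker.fielddata.limit: 60%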
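The memory settings above relate to each other arithmetically, which the following sketch makes explicit. It assumes a node with 32 GB of RAM (inferred from the 16 GB heap being half of system memory) and caps the heap at 31 GB, a commonly cited guideline for keeping compressed object pointers rather than something stated in the original. The function name is hypothetical.

```python
def memory_budget(system_ram_gb, cache_pct=30, breaker_pct=60, heap_cap_gb=31):
    """Derive heap, fielddata cache, and breaker sizes from system RAM."""
    if cache_pct >= breaker_pct:
        # A cache larger than the breaker limit would never be allowed to fill.
        raise ValueError("fielddata cache size should stay below the breaker limit")
    # Half of RAM goes to the heap; the rest is left for the OS file cache.
    heap_gb = min(system_ram_gb / 2, heap_cap_gb)
    return {
        "heap_gb": heap_gb,
        "fielddata_cache_gb": heap_gb * cache_pct / 100,
        "fielddata_breaker_gb": heap_gb * breaker_pct / 100,
        "jvm_flags": f"-Xms{int(heap_gb)}g -Xmx{int(heap_gb)}g",
    }


budget = memory_budget(32)  # 32 GB of RAM is an assumption
print(budget)
```

Under these assumptions the 30% cache works out to roughly 4.8 GB of heap and the 60% breaker to roughly 9.6 GB, so the cache fills and evicts well before the breaker would trip.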

Benchmark results demonstrated that batch sizes of 5–15 MB achieve the highest throughput, SSD storage dramatically improves write speed, and appropriate shard sizing (≈30‑40 GB per shard) balances resource usage.
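One way to keep bulk requests inside the 5–15 MB range is to split documents by serialized size rather than by count. At ≈1.33 KB per record, a 10 MB batch holds roughly 7,700 records. The generator below is a minimal sketch of that idea: it emits newline‑delimited bodies in the _bulk API format. The function name, index name, and sample documents are hypothetical, and older ES versions may also require a _type in the action line.

```python
import json


def bulk_chunks(docs, index_name, max_bytes=10 * 1024 * 1024):
    """Yield NDJSON bodies for the _bulk API, split by UTF-8 byte size.

    A single document larger than max_bytes is still emitted on its own.
    """
    action = json.dumps({"index": {"_index": index_name}})
    lines, size = [], 0
    for doc in docs:
        # Each document contributes an action line and a source line.
        entry = action + "\n" + json.dumps(doc, ensure_ascii=False) + "\n"
        entry_size = len(entry.encode("utf-8"))
        if lines and size + entry_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(entry)
        size += entry_size
    if lines:
        yield "".join(lines)


# Illustrative data: 100 records of about 1.3 KB each, with a small limit
# so that the split is visible.
sample_docs = [{"id": i, "note": "x" * 1300} for i in range(100)]
chunks = list(bulk_chunks(sample_docs, "city_events", max_bytes=50_000))
print(len(chunks), [len(c.encode("utf-8")) for c in chunks])
```

Each yielded string could then be sent as the body of one _bulk request. Sizing by bytes keeps batches stable even when record sizes vary.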

In conclusion, the optimized ES cluster now supports billion‑scale data ingestion with query responses returned within seconds, delivering an intuitive, high‑accuracy, one‑stop intelligent search experience for city management users.
