
Optimizing Elasticsearch Mapping to Reduce High CPU Usage: Challenges, Solutions, and Results

By refactoring the station‑profile index to eliminate over‑indexed and mis‑typed fields—cutting complex types from 282 to 74 and keywords from 67 to 59—the team lowered CPU peaks from 60 % to 50 %, reduced average CPU to 20 %, cut query latency to 150 ms, accelerated Flink sync to 10 minutes, and decommissioned two nodes, achieving substantial performance gains and cost savings.

HelloTech

Business Background: The team needed to improve the performance of a two‑round intelligent scheduling engine that relies on algorithmic and manual interventions to allocate shared bikes across stations, maximizing global bike revenue. Station profiles serve as the foundational data for the scheduling core, required both offline and online, with strict latency (<50 ms) for user‑facing scenarios.

Problem Decomposition: High CPU usage in the station‑profile cluster caused two main issues: (1) delayed Flink data‑sync tasks, which degrade model‑driven scheduling quality, and (2) excessive query latency for consumer‑facing "red‑packet" bike recommendations, where response time must stay under 50 ms.

Attempts Made:

- Introduced Redis caching for low‑latency geohash‑based recall (with some precision loss, because geohash cells are rectangular).
- Adopted HBase for point‑lookup scenarios, later abandoned due to instability.
- Explored various mapping‑reduction strategies over a year of trial and error.
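The precision loss mentioned for the Redis approach stems from the geohash scheme itself: each character refines a rectangular cell, so prefix-based recall over- or under-covers any circular radius. A minimal pure‑Python encoder (an illustration of the standard algorithm, not code from the original system) shows why cells are rectangles:

```python
def geohash_encode(lat, lon, precision=11):
    """Encode a lat/lon pair into a base-32 geohash string.

    Geohash interleaves longitude and latitude bits, so each extra
    character halves one side of a *rectangular* cell -- which is why
    prefix-based recall cannot exactly match a circular search radius.
    """
    alphabet = "0123456789bcdefghjkmnpqrstuvwxyz"  # base-32, no a/i/l/o
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    out, even, ch, bit_count = [], True, 0, 0
    while len(out) < precision:
        if even:  # even-numbered bits encode longitude
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                ch, lon_lo = (ch << 1) | 1, mid
            else:
                ch, lon_hi = ch << 1, mid
        else:     # odd-numbered bits encode latitude
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                ch, lat_lo = (ch << 1) | 1, mid
            else:
                ch, lat_hi = ch << 1, mid
        even = not even
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            out.append(alphabet[ch])
            bit_count, ch = 0, 0
    return "".join(out)

# Canonical example from the geohash specification:
print(geohash_encode(57.64911, 10.40744))  # u4pruydqqvj
```

Each truncated prefix of the output names a larger enclosing rectangle, which is what makes geohash convenient for cache keys but imprecise for radius queries.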

Technical Challenges:

1. Mapping over‑indexing: 282 complex‑type fields plus 67 keyword fields caused heavy CPU load.
2. Unnecessary field mappings: fields not needed for full‑text search or aggregation still consumed indexing resources.
3. Misused data types: fields used only for exact matches were mapped as numeric types and queried with TermQuery, which is slow on point‑encoded fields and generates large bitsets during query execution, instead of being mapped as keyword.
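Challenge 3 is easiest to see in the query DSL. The two requests below are hypothetical illustrations (the field name is invented). They look identical at the API level, but against an `integer` field Lucene rewrites the term query into a point lookup that must materialize a bitset of matching documents, while against a `keyword` field it is a direct inverted‑index lookup:

```python
# Hypothetical station-profile queries; `station_id` is an invented field name.

# Exact-match lookup on a field mapped as `integer`: Lucene rewrites this
# into a point (BKD-tree) query and builds a bitset of matching doc IDs --
# expensive for high-cardinality identifier fields.
term_on_numeric = {
    "query": {"term": {"station_id": 20481}}
}

# Same request shape, but with `station_id` mapped as `keyword`, the term
# is resolved with a single terms-dictionary (inverted index) lookup.
term_on_keyword = {
    "query": {"term": {"station_id": "20481"}}
}

# Rule of thumb applied in the refactor: identifiers that are only ever
# matched exactly belong in `keyword`; true numerics that need range
# queries or aggregations stay in point types (`integer`, `long`, ...).
```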

Why Mapping Affects Cluster CPU:

- Frequent segment creation and merge operations increase file handles, memory use, and CPU cycles; every live segment must be scanned per search.
- Unnecessary mappings add indexing work for fields that are never searched.
- Incorrect data types force inefficient query paths (e.g., a TermQuery on a numeric field is rewritten into a PointRangeQuery, which requires costly bitset construction).

Correct Mapping Selection: Choose mapping types based on query scenarios (refer to the Elasticsearch keyword documentation). For numeric fields that genuinely need range queries, prefer the point‑type structures (block k‑d trees) introduced with Lucene 6.0 (Elasticsearch 5.x), which handle range queries efficiently.
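Applied to a station profile, those selection rules might look like the sketch below (field names are hypothetical, not the team's actual schema): identifiers become `keyword`, metrics that need range filters stay numeric, and fields that are returned to callers but never searched are mapped with `"index": false` so they skip indexing work entirely:

```python
# Hypothetical slimmed-down station-profile mapping (request body as used
# with PUT /<index>, e.g. via the elasticsearch-py client).
station_profile_mapping = {
    "mappings": {
        "properties": {
            # Exact-match identifier: keyword, resolved via the terms dictionary.
            "station_id": {"type": "keyword"},
            # Metric used in range filters/aggregations: numeric point type.
            "bike_count": {"type": "integer"},
            # Geo recall for nearby-station queries.
            "location": {"type": "geo_point"},
            # Payload returned to callers but never searched or aggregated:
            # disable both indexing and doc values to save work at write time.
            "display_name": {"type": "keyword", "index": False, "doc_values": False},
        }
    }
}
```

The same audit, repeated per field against the actual query workload, is what took the mapping from 282 complex types down to 74.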

Zero‑Downtime Mapping Refactor:

- Identify the essential search fields from the core workflow.
- Apply a three‑step rollout: gray release, rollback capability, and monitoring alerts for missing indices.
- Ensure coverage of >10 business search requirements across the >300 original fields.
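One standard Elasticsearch pattern for this kind of rollout (a general technique, not necessarily the team's exact mechanism; index and alias names below are hypothetical) is to create the slimmed index alongside the old one, reindex into it, and atomically swap a read alias. The three request bodies sketch the flow:

```python
# Step 1: create the new index with the reduced mapping
# (PUT /station_profile_v2). Only one field shown for brevity.
create_body = {
    "mappings": {"properties": {"station_id": {"type": "keyword"}}}
}

# Step 2: copy existing documents over (POST /_reindex). During the
# gray-release window the Flink job double-writes to both indices so
# v2 stays current while it is validated.
reindex_body = {
    "source": {"index": "station_profile_v1"},
    "dest": {"index": "station_profile_v2"},
}

# Step 3: atomically repoint the read alias (POST /_aliases). Queries
# never observe a gap, and rollback is the same call with v1/v2 swapped.
alias_swap_body = {
    "actions": [
        {"remove": {"index": "station_profile_v1", "alias": "station_profile"}},
        {"add": {"index": "station_profile_v2", "alias": "station_profile"}},
    ]
}
```

Because both alias actions execute in a single atomic request, this pattern satisfies the rollback requirement above without any client-side switchover.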

Results (Effect Recovery):

- Mapping reduced from 282 to 74 complex types and from 67 to 59 keywords.
- Corrected 10 misused numeric fields (integer → keyword).
- Cluster CPU peak dropped from ~60% to ~50%, and the daily average from 60% to 20%.
- Flink sync latency improved from 1.5 h to 10 min.
- Query latency reduced from 1 s to a 150 ms peak, with a 14.5 ms average.
- Two nodes decommissioned (30 → 28), saving ~¥16,000 annually.

References: Elastic blog posts on Elasticsearch fundamentals, the merge process, and Lucene points, plus Chinese-language Elasticsearch articles.

Tags: Big Data, Indexing, Search Engine, Elasticsearch, CPU Performance, Mapping Optimization
Written by HelloTech

Official Hello technology account, sharing tech insights and developments.
