
Optimizing Elasticsearch Mapping to Reduce High CPU Usage: Challenges, Solutions, and Results

By refactoring the station‑profile index to eliminate over‑indexed and mis‑typed fields—cutting complex types from 282 to 74 and keywords from 67 to 59—the team lowered CPU peaks from 60 % to 50 %, reduced average CPU to 20 %, cut query latency to 150 ms, accelerated Flink sync to 10 minutes, and decommissioned two nodes, achieving substantial performance gains and cost savings.

HelloTech

Business Background: The team needed to improve the performance of a two‑round intelligent scheduling engine that relies on algorithmic and manual interventions to allocate shared bikes across stations, maximizing global bike revenue. Station profiles serve as the foundational data for the scheduling core, required both offline and online, with strict latency (<50 ms) for user‑facing scenarios.

Problem Decomposition: High CPU usage in the station‑profile cluster caused two main issues: (1) delayed Flink data‑sync tasks, which degrade model‑driven scheduling quality, and (2) excessive query latency for consumer‑facing "red‑packet" bike recommendations, where response time must stay under 50 ms.

Attempts Made:

- Introduced Redis caching for low‑latency geohash‑based recall (with some precision loss, because geohash cells are rectangular).
- Adopted HBase for point‑lookup scenarios, later abandoned due to instability.
- Explored various mapping‑reduction strategies over a year of trial and error.
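The precision loss mentioned for the Redis approach stems from the geohash scheme itself: each character refines a rectangular cell, so prefix-based recall over- or under-covers any circular radius. A minimal pure‑Python encoder (an illustration of the standard algorithm, not code from the original system) shows why cells are rectangles:

```python
def geohash_encode(lat, lon, precision=11):
    """Encode a lat/lon pair into a base-32 geohash string.

    Geohash interleaves longitude and latitude bits, so each extra
    character halves one side of a *rectangular* cell -- which is why
    prefix-based recall cannot exactly match a circular search radius.
    """
    alphabet = "0123456789bcdefghjkmnpqrstuvwxyz"  # base-32, no a/i/l/o
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    out, even, ch, bit_count = [], True, 0, 0
    while len(out) < precision:
        if even:  # even-numbered bits encode longitude
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                ch, lon_lo = (ch << 1) | 1, mid
            else:
                ch, lon_hi = ch << 1, mid
        else:     # odd-numbered bits encode latitude
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                ch, lat_lo = (ch << 1) | 1, mid
            else:
                ch, lat_hi = ch << 1, mid
        even = not even
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            out.append(alphabet[ch])
            bit_count, ch = 0, 0
    return "".join(out)

# Canonical example from the geohash specification:
print(geohash_encode(57.64911, 10.40744))  # u4pruydqqvj
```

Each truncated prefix of the output names a larger enclosing rectangle, which is what makes geohash convenient for cache keys but imprecise for radius queries.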

Technical Challenges:

1. Mapping over‑indexing: 282 complex‑type fields plus 67 keyword fields caused heavy CPU load.
2. Unnecessary field mappings: fields not needed for full‑text search or aggregation still consumed indexing resources.
3. Misused data types: fields used only for exact matches were mapped as numeric types and queried with TermQuery, which is slow on point‑encoded fields and generates large bitsets during query execution, instead of being mapped as keyword.
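Challenge 3 is easiest to see in the query DSL. The two requests below are hypothetical illustrations (the field name is invented). They look identical at the API level, but against an `integer` field Lucene rewrites the term query into a point lookup that must materialize a bitset of matching documents, while against a `keyword` field it is a direct inverted‑index lookup:

```python
# Hypothetical station-profile queries; `station_id` is an invented field name.

# Exact-match lookup on a field mapped as `integer`: Lucene rewrites this
# into a point (BKD-tree) query and builds a bitset of matching doc IDs --
# expensive for high-cardinality identifier fields.
term_on_numeric = {
    "query": {"term": {"station_id": 20481}}
}

# Same request shape, but with `station_id` mapped as `keyword`, the term
# is resolved with a single terms-dictionary (inverted index) lookup.
term_on_keyword = {
    "query": {"term": {"station_id": "20481"}}
}

# Rule of thumb applied in the refactor: identifiers that are only ever
# matched exactly belong in `keyword`; true numerics that need range
# queries or aggregations stay in point types (`integer`, `long`, ...).
```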

Why Mapping Affects Cluster CPU:

- Frequent segment creation and merge operations increase file handles, memory use, and CPU cycles; every live segment must be scanned per search.
- Unnecessary mappings add indexing work for fields that are never searched.
- Incorrect data types force inefficient query paths (e.g., a TermQuery on a numeric field is rewritten into a PointRangeQuery, which requires costly bitset construction).

Correct Mapping Selection: Choose mapping types based on query scenarios (refer to the Elasticsearch keyword documentation). For numeric fields that genuinely need range queries, prefer the point‑type structures (block k‑d trees) introduced with Lucene 6.0 (Elasticsearch 5.x), which handle range queries efficiently.
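Applied to a station profile, those selection rules might look like the sketch below (field names are hypothetical, not the team's actual schema): identifiers become `keyword`, metrics that need range filters stay numeric, and fields that are returned to callers but never searched are mapped with `"index": false` so they skip indexing work entirely:

```python
# Hypothetical slimmed-down station-profile mapping (request body as used
# with PUT /<index>, e.g. via the elasticsearch-py client).
station_profile_mapping = {
    "mappings": {
        "properties": {
            # Exact-match identifier: keyword, resolved via the terms dictionary.
            "station_id": {"type": "keyword"},
            # Metric used in range filters/aggregations: numeric point type.
            "bike_count": {"type": "integer"},
            # Geo recall for nearby-station queries.
            "location": {"type": "geo_point"},
            # Payload returned to callers but never searched or aggregated:
            # disable both indexing and doc values to save work at write time.
            "display_name": {"type": "keyword", "index": False, "doc_values": False},
        }
    }
}
```

The same audit, repeated per field against the actual query workload, is what took the mapping from 282 complex types down to 74.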

Zero‑Downtime Mapping Refactor:

- Identify the essential search fields from the core workflow.
- Apply a three‑step rollout: gray release, rollback capability, and monitoring alerts for missing indices.
- Ensure coverage of >10 business search requirements across the >300 original fields.
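One standard Elasticsearch pattern for this kind of rollout (a general technique, not necessarily the team's exact mechanism; index and alias names below are hypothetical) is to create the slimmed index alongside the old one, reindex into it, and atomically swap a read alias. The three request bodies sketch the flow:

```python
# Step 1: create the new index with the reduced mapping
# (PUT /station_profile_v2). Only one field shown for brevity.
create_body = {
    "mappings": {"properties": {"station_id": {"type": "keyword"}}}
}

# Step 2: copy existing documents over (POST /_reindex). During the
# gray-release window the Flink job double-writes to both indices so
# v2 stays current while it is validated.
reindex_body = {
    "source": {"index": "station_profile_v1"},
    "dest": {"index": "station_profile_v2"},
}

# Step 3: atomically repoint the read alias (POST /_aliases). Queries
# never observe a gap, and rollback is the same call with v1/v2 swapped.
alias_swap_body = {
    "actions": [
        {"remove": {"index": "station_profile_v1", "alias": "station_profile"}},
        {"add": {"index": "station_profile_v2", "alias": "station_profile"}},
    ]
}
```

Because both alias actions execute in a single atomic request, this pattern satisfies the rollback requirement above without any client-side switchover.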

Results (Effect Recovery):

- Mapping reduced from 282 to 74 complex types and from 67 to 59 keywords.
- Corrected 10 misused numeric fields (integer → keyword).
- Cluster CPU peak dropped from ~60% to ~50%, and the daily average from 60% to 20%.
- Flink sync latency improved from 1.5 h to 10 min.
- Query latency reduced from 1 s to a 150 ms peak, with a 14.5 ms average.
- Two nodes decommissioned (30 → 28), saving ~¥16,000 annually.

References: Elastic blog posts on Elasticsearch fundamentals, the merge process, and Lucene points, plus Chinese-language Elasticsearch articles.

Tags: Big Data, Indexing, Search Engine, Elasticsearch, CPU Performance, Mapping Optimization
Written by HelloTech

Official Hello technology account, sharing tech insights and developments.
