Boost Multi-Dimensional Queries with Elasticsearch: From Basics to Advanced Integration
This article explains common multi‑dimensional query scenarios, compares basic indexing tricks and data heterogeneity approaches, then details how to integrate Elasticsearch with MySQL—including architecture, data import, Java client usage, code examples, and best‑practice recommendations—for scalable, near‑real‑time search.
Background
There are two common multi‑dimensional query scenarios: list queries with multiple filter conditions and queries on dimensions that do not involve sharding columns. Traditional database queries struggle to meet these needs, especially for fuzzy search and full‑text retrieval.
Based on experience, we introduce a basic solution and an advanced solution (using Elasticsearch). In most cases, Elasticsearch is recommended for multi‑dimensional queries; readers in a hurry can jump to the advanced section.
Basic Approach
1. Add composite indexes for common query scenarios
Designed for list queries with multiple filter conditions.
Advantages
Very simple
Read‑write inconsistency window is short (typically milliseconds, depending on DB master‑slave sync delay)
Disadvantages
Highly limited: unless filter conditions are fixed, it is hard to handle new or changed filters
Adding an index for every new filter field can bloat the index and degrade performance
This often leads to a classic situation: product requests a new filter field, developers say it is impossible or costly, and the request is dropped.
2. Heterogeneous data copies
A more elegant method is to create multiple data copies for different dimensions (e.g., user‑centric for the front‑end, store‑centric for the back‑end, supplier‑centric for suppliers). This is necessary because a sharded database can only be partitioned along a single dimension, so each copy is sharded by the dimension it serves.
(1) Write to multiple data sources
Advantage: very simple.
Disadvantages
Cross‑database writes have consistency issues (unless different dimension tables share a common shard and use transactions); performance is low
Cannot flexibly support additional dimensions
(2) Use Canal for automatic data synchronization
Synchronize data via Canal to create multiple dimensional data sources. See the earlier article “Architect's Must‑Know: Using Canal for Asynchronous, Decoupled Architecture”.
Advantage: more elegant, no changes to the main program flow.
Disadvantage: this still cannot keep up with constantly changing requirements; creating a new data copy for every new dimension is impractical.
Advanced Approach: Adding Elasticsearch to Existing System
System Architecture
Existing systems typically use MySQL for real‑time reads/writes and transactions. Elasticsearch (ES) is added alongside it to handle near‑real‑time multi‑dimensional queries and can sustain hundreds of thousands of QPS, depending on node, shard, and replica counts. Syncing data into ES adds roughly a second of latency (dominated by the index refresh interval), while queries over already‑indexed documents return in milliseconds to a few hundred milliseconds.
Data Import
A synchronization mechanism imports MySQL data into ES. Main steps:
Pre‑define ES index mapping instead of relying on automatic mapping.
Initial full import, then incremental imports via Canal + MQ pipeline, requiring minimal code.
Filter out fields that are not needed for search to reduce index size and improve performance.
Flatten data: extract business fields from JSON columns to avoid nested types and boost performance.
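The flattening step can be sketched in plain Java. The helper below is a hypothetical illustration that works on a document already parsed into a Map (in practice the JSON column would first be parsed, e.g. with Jackson, which the ES client stack already includes):

```java
import java.util.HashMap;
import java.util.Map;

public class JsonFlattener {

    // Recursively flattens nested maps into dot-separated keys,
    // e.g. {"attrs": {"brand": "acme"}} -> {"attrs.brand": "acme"},
    // so the resulting ES document has no nested objects.
    public static Map<String, Object> flatten(Map<String, Object> doc) {
        Map<String, Object> flat = new HashMap<>();
        flatten("", doc, flat);
        return flat;
    }

    private static void flatten(String prefix, Map<String, Object> node,
                                Map<String, Object> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                flatten(key, child, out); // descend into the nested object
            } else {
                out.put(key, value); // scalar: becomes a top-level ES field
            }
        }
    }
}
```

Fields not needed for search would simply be dropped before indexing, which covers the filtering step above as well.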
Querying Data
From ES 8.x onward, the Java API Client is recommended; it requires Java 8+ and leverages lambda expressions for clearer code.
For ES 7.x or earlier, or when restricted to Java 7, continue using the High Level REST Client.
Code example (with detailed comments):
import java.util.List;
import java.util.logging.Logger;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch._types.query_dsl.MatchQuery;
import co.elastic.clients.elasticsearch._types.query_dsl.Query;
import co.elastic.clients.elasticsearch._types.query_dsl.RangeQuery;
import co.elastic.clients.elasticsearch.core.SearchResponse;
import co.elastic.clients.elasticsearch.core.search.Hit;
import co.elastic.clients.json.JsonData;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;

public class EsClientDemo {

    private static final Logger logger = Logger.getLogger(EsClientDemo.class.getName());

    // Demo: create a client and perform a multi-condition search
    public void createClientAndSearch() throws Exception {
        // Create the low-level REST client connecting to an ES node on port 9200
        RestClient restClient = RestClient.builder(
                new HttpHost("localhost", 9200)).build();
        // Create the transport with the low-level client and a JSON mapper
        ElasticsearchTransport transport = new RestClientTransport(
                restClient, new JacksonJsonpMapper());
        // Core client for all subsequent operations
        ElasticsearchClient esClient = new ElasticsearchClient(transport);

        // Multi-condition search using the fluent API and lambdas
        String searchText = "bike";
        String brand = "brandNew";
        Long maxPrice = 1000L;

        // Full-text match query on the product name
        Query byName = MatchQuery.of(m -> m
                .field("name")
                .query(searchText)
        )._toQuery();

        // Term query for an exact brand match
        Query byBrand = new Query.Builder()
                .term(t -> t
                        .field("brand")
                        .value(v -> v.stringValue(brand))
                ).build();

        // Range query for price <= maxPrice
        Query byMaxPrice = RangeQuery.of(r -> r
                .field("price")
                .lte(JsonData.of(maxPrice))
        )._toQuery();

        // Execute the search on the "products" index, combining the
        // three conditions as bool/must clauses
        SearchResponse<Product> response = esClient.search(s -> s
                        .index("products")
                        .query(q -> q
                                .bool(b -> b
                                        .must(byName)
                                        .must(byBrand)
                                        .must(byMaxPrice)
                                )
                        ),
                Product.class
        );

        // Iterate over the hits
        List<Hit<Product>> hits = response.hits().hits();
        for (Hit<Product> hit : hits) {
            Product product = hit.source();
            logger.info("Found product " + product.getName() + ", score " + hit.score());
        }

        // Release the underlying connections
        transport.close();
    }
}

Reference: https://www.elastic.co/guide/en/elasticsearch/client/index.html
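The search example deserializes hits into a Product class that is assumed rather than shown. A minimal sketch compatible with the JacksonJsonpMapper (which needs a no-args constructor plus getters and setters) might look like this; the field set is illustrative:

```java
public class Product {
    private String name;
    private String brand;
    private long price;

    public Product() {} // no-args constructor required by Jackson for deserialization

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getBrand() { return brand; }
    public void setBrand(String brand) { this.brand = brand; }

    public long getPrice() { return price; }
    public void setPrice(long price) { this.price = price; }
}
```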
Data Model Conversion
Because both MySQL and ES exist, two heterogeneous data models are needed, along with conversion utilities:
MySQL data VO
ES data VO
Conversion tool between MySQL VO and ES VO
Business layer BO
Interface DTO
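The conversion utility can be a plain static mapper between the two value objects. The sketch below is hypothetical; the class and field names are illustrative only:

```java
public class ProductConverter {

    // MySQL-side value object (illustrative fields)
    public static class ProductDbVo {
        public long id;
        public String name;
        public long priceCents;
    }

    // ES-side value object: keeps only the searchable fields
    public static class ProductEsVo {
        public String id; // ES document ids are strings
        public String name;
        public long priceCents;
    }

    // Maps a MySQL row object to the document shape indexed in ES
    public static ProductEsVo toEsVo(ProductDbVo db) {
        ProductEsVo es = new ProductEsVo();
        es.id = String.valueOf(db.id); // numeric primary key becomes the ES _id
        es.name = db.name;
        es.priceCents = db.priceCents;
        return es;
    }
}
```

Keeping the conversion in one place means the business-layer BO and interface DTO never need to know which store a record came from.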
Underlying Principles
Elasticsearch outperforms MySQL for multi‑dimensional and full‑text queries due to different underlying data structures:
Elasticsearch inverted index
Full‑text fields are tokenized, creating term‑to‑document inverted indexes; queries are also tokenized, and relevance scoring (e.g., TF‑IDF) determines matches.
Exact‑value fields skip tokenization; the query term directly matches documents.
All fields generate inverted indexes, enabling multi‑dimensional combination queries.
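The idea can be illustrated with a toy inverted index in Java. This sketch tokenizes on whitespace only; real analyzers also strip punctuation, normalize, and stem, and real engines score results rather than returning a bare set:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TinyInvertedIndex {
    // term -> ids of documents containing that term (the posting list)
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    public void addDocument(int docId, String text) {
        // Tokenize the document and record each term's posting
        for (String term : text.toLowerCase().split("\\s+")) {
            postings.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
        }
    }

    // AND query: documents that contain every query term
    public Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : query.toLowerCase().split("\\s+")) {
            Set<Integer> docs = postings.getOrDefault(term, new HashSet<>());
            if (result == null) {
                result = new HashSet<>(docs);
            } else {
                result.retainAll(docs); // intersect the posting lists
            }
        }
        return result == null ? new HashSet<>() : result;
    }
}
```

Because every field gets its own posting lists, combining conditions is just intersecting sets, which is why arbitrary multi-dimensional queries stay cheap.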
MySQL B+ tree
Non‑leaf nodes store key ranges; leaf nodes store the actual rows (or row pointers) and are linked in order, which supports efficient range scans.
Composite indexes must be explicitly created; if a query field is not indexed, the query may degrade to a full table scan.
Key points of ES architecture:
A cluster contains nodes in master and data roles; each data node can hold multiple shards, with primary and replica shards providing high availability.
Number of primary shards is fixed at index creation; replicas can be added later to increase QPS.
Routing algorithm: shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document ID (auto‑generated if the document is indexed without one).
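The routing rule can be sketched as below. ES actually applies a Murmur3 hash to the routing string; this sketch stands in String.hashCode() for it, so the shard numbers will differ from a real cluster, but the structure of the formula is the same:

```java
public class ShardRouter {

    // shard = hash(routing) % number_of_primary_shards
    // ES uses Murmur3 on the routing value; String.hashCode() is a stand-in here.
    // floorMod keeps the result non-negative even for negative hash codes.
    public static int shardFor(String routing, int numberOfPrimaryShards) {
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }
}
```

Because the shard count is baked into this formula, changing the number of primary shards would reroute almost every document, which is why that count is fixed at index creation.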
Pros and Cons
Advantages
Supports multi‑dimensional combination queries without fearing future field additions (cost lies in re‑indexing new fields).
Fully decoupled from existing systems, suitable for architectural evolution.
Handles far larger data volumes than MySQL, supporting petabyte‑scale storage and queries.
Disadvantages
Read‑write inconsistency window is in seconds due to data sync from MySQL to ES and the ES index refresh phase.
Common trick: delay the front‑end list query by about one second after a write.
Under extremely high concurrency, bottlenecks and stability issues emerge: a native ES cluster handles roughly 30,000 to 50,000 shards before index creation slows to tens of seconds or even minutes, and cluster size effectively tops out around 500 nodes. Most scenarios, however, stay well within these limits.
Elasticsearch Best Practices
Import only searchable data to keep the index size manageable.
Flatten data and avoid nested structures for better performance.
Define field types and mapping in advance rather than relying on automatic mapping.
Use keyword type for exact values and query with term.
Avoid routing‑free queries that hit many shards simultaneously, which can spike CPU usage.
Avoid deep pagination; prefer search_after for paging through large result sets (the scroll API suits one‑off bulk exports).
Leave enough free memory for the operating system's filesystem cache to improve query performance.
Primary shard count is immutable after creation; replicas can be increased to boost QPS, and shards can be rebalanced if a single shard becomes overloaded.
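For the deep‑pagination point above, a search_after request keeps per‑page cost flat by continuing from the sort values of the previous page's last hit instead of skipping offsets. The request body below is a sketch; the index fields and values are illustrative:

```json
{
  "size": 20,
  "sort": [
    { "price": "asc" },
    { "_id": "asc" }
  ],
  "search_after": [1000, "product-42"]
}
```

The search_after array echoes the sort values of the last document returned by the previous page; including a unique tiebreaker field such as _id keeps the ordering stable.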
Further Steps
When ES reaches its concurrency limits and may encounter OOM, specialized C++ search engines are required for ultra‑high‑throughput scenarios, such as:
Baidu's general search engine for text and image queries.
E‑commerce‑specific search engines for product search, advertising, and recommendation based on keywords, brands, and price filters.
Java Baker
Java architect and Raspberry Pi enthusiast, dedicated to writing high-quality technical articles; the same name is used across major platforms.