
Boost Multi-Dimensional Queries with Elasticsearch: From Basics to Advanced Integration

This article explains common multi‑dimensional query scenarios, compares basic indexing tricks and data heterogeneity approaches, then details how to integrate Elasticsearch with MySQL—including architecture, data import, Java client usage, code examples, and best‑practice recommendations—for scalable, near‑real‑time search.

Java Baker

Background

There are two common multi‑dimensional query scenarios: list queries with multiple filter conditions and queries on dimensions that do not involve sharding columns. Traditional database queries struggle to meet these needs, especially for fuzzy search and full‑text retrieval.

Based on experience, we introduce a basic solution and an advanced solution (using Elasticsearch). In most cases, Elasticsearch is recommended for multi‑dimensional queries; readers in a hurry can jump to the advanced section.

Basic Approach

1. Add composite indexes for common query scenarios

Designed for list queries with multiple filter conditions.

Advantages

Very simple

Read‑write inconsistency window is short (typically milliseconds, depending on DB master‑slave sync delay)

Disadvantages

Highly limited: unless filter conditions are fixed, it is hard to handle new or changed filters

Adding an index for every new filter field can bloat the index and degrade performance

This often leads to a classic standoff: the product team requests a new filter field, developers reply that it is impossible or too costly, and the request is dropped.

2. Heterogeneous data copies

A more elegant method is to create multiple copies of the data, one per query dimension (e.g., user‑centric for the front‑end, store‑centric for the back‑end, supplier‑centric for suppliers), because a single database can only be sharded along one dimension.

(1) Write to multiple data sources

Advantage: very simple.

Disadvantages

Cross‑database writes raise consistency issues (unless the different dimension tables share a common shard and can use a local transaction), and write performance is poor

Cannot flexibly support additional dimensions

(2) Use Canal for automatic data synchronization

Synchronize data via Canal to create multiple dimensional data sources. See the earlier article “Architect's Must‑Know: Using Canal for Asynchronous, Decoupled Architecture”.

Advantage: more elegant, no changes to the main program flow.

Disadvantage: it still cannot keep up with constantly changing requirements; creating a new data copy for each new dimension is impractical.

Advanced Approach: Adding Elasticsearch to Existing System

System Architecture

Existing systems typically use MySQL for real‑time reads/writes and transactions. Elasticsearch (ES) is added to handle near‑real‑time multi‑dimensional queries, supporting hundreds of thousands of QPS depending on node, shard, and replica count. Data sync to ES incurs second‑level latency (mainly index refresh), while queries on indexed documents respond in milliseconds to a few hundred milliseconds.

Data Import

A synchronization mechanism imports MySQL data into ES. Main steps:

Pre‑define ES index mapping instead of relying on automatic mapping.

Initial full import, then incremental imports via Canal + MQ pipeline, requiring minimal code.

Filter out fields that are not needed for search to reduce index size and improve performance.

Flatten data: extract business fields from JSON columns to avoid nested types and boost performance.
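As an illustration of the pre‑defined mapping step, a mapping for the products index used later in this article might look like the request body below (field names are illustrative, not from the source): full‑text fields use text, exact‑value fields use keyword, and dynamic mapping is set to strict so unexpected fields are not auto‑indexed.

```json
PUT /products
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "name":  { "type": "text" },
      "brand": { "type": "keyword" },
      "price": { "type": "long" }
    }
  }
}
```

Defining the mapping up front also documents which MySQL columns are expected to reach ES, making the sync pipeline easier to validate.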

Querying Data

From ES 8.x onward, the Java API Client is recommended; it requires Java 8+ and uses lambda expressions for clearer code.

For ES 7.x or earlier, or when using Java 7, continue using the high‑level REST client.

Code example (with detailed comments):

import java.util.List;
import java.util.logging.Logger;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch._types.query_dsl.MatchQuery;
import co.elastic.clients.elasticsearch._types.query_dsl.Query;
import co.elastic.clients.elasticsearch._types.query_dsl.RangeQuery;
import co.elastic.clients.elasticsearch.core.SearchResponse;
import co.elastic.clients.elasticsearch.core.search.Hit;
import co.elastic.clients.json.JsonData;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;

public class EsClientDemo {

    private static final Logger logger = Logger.getLogger(EsClientDemo.class.getName());

    // demo: create client and perform a search
    public void createClientAndSearch() throws Exception {
        // Create low‑level REST client connecting to ES node on port 9200
        RestClient restClient = RestClient.builder(
            new HttpHost("localhost", 9200)).build();

        // Create transport with low‑level client and JSON mapper
        ElasticsearchTransport transport = new RestClientTransport(
            restClient, new JacksonJsonpMapper());

        // Core client for subsequent operations
        ElasticsearchClient esClient = new ElasticsearchClient(transport);

        try {
            // Multi‑condition search using fluent API and lambdas
            String searchText = "bike";
            String brand = "brandNew";
            long maxPrice = 1000L;

            // Full‑text match query on product name
            Query byName = MatchQuery.of(m -> m
                .field("name")
                .query(searchText)
            )._toQuery();

            // Term query for exact brand match (the field should be a keyword type)
            Query byBrand = new Query.Builder()
                .term(t -> t
                    .field("brand")
                    .value(v -> v.stringValue(brand))
                ).build();

            // Range query for price <= maxPrice
            Query byMaxPrice = RangeQuery.of(r -> r
                .field("price")
                .lte(JsonData.of(maxPrice))
            )._toQuery();

            // Execute search on the "products" index with combined bool must clauses
            SearchResponse<Product> response = esClient.search(s -> s
                .index("products")
                .query(q -> q
                    .bool(b -> b
                        .must(byName)
                        .must(byBrand)
                        .must(byMaxPrice)
                    )
                ),
                Product.class
            );

            // Iterate over hits; hit.source() can be null, so guard before use
            List<Hit<Product>> hits = response.hits().hits();
            for (Hit<Product> hit : hits) {
                Product product = hit.source();
                if (product != null) {
                    logger.info("Found product " + product.getName() + ", score " + hit.score());
                }
            }
        } finally {
            // Closing the transport also closes the underlying REST client
            transport.close();
        }
    }
}

Reference: https://www.elastic.co/guide/en/elasticsearch/client/index.html

Data Model Conversion

Because both MySQL and ES exist, two heterogeneous data models are needed, along with conversion utilities:

MySQL data VO

ES data VO

Conversion tool between MySQL VO and ES VO

Business layer BO

Interface DTO
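A minimal sketch of such a conversion utility (class and field names are illustrative, not from the source): the MySQL‑side VO mirrors the table row, including fields that must never reach ES, while the ES‑side VO keeps only the flattened, searchable fields.

```java
import java.util.Objects;

// Hypothetical MySQL-side VO: mirrors the table row, including non-searchable fields
class ProductDbVo {
    long id;
    String name;
    String brand;
    long priceCents;      // price stored as cents in MySQL
    String internalNotes; // internal field; must not be copied into ES

    ProductDbVo(long id, String name, String brand, long priceCents, String internalNotes) {
        this.id = id;
        this.name = name;
        this.brand = brand;
        this.priceCents = priceCents;
        this.internalNotes = internalNotes;
    }
}

// Hypothetical ES-side VO: flattened, search-relevant fields only
class ProductEsVo {
    long id;
    String name;
    String brand; // indexed as keyword for exact term queries
    double price; // normalized to a plain numeric field

    ProductEsVo(long id, String name, String brand, double price) {
        this.id = id;
        this.name = name;
        this.brand = brand;
        this.price = price;
    }
}

// Conversion utility between the two heterogeneous models
final class ProductVoConverter {
    private ProductVoConverter() {}

    static ProductEsVo toEsVo(ProductDbVo db) {
        Objects.requireNonNull(db, "db row must not be null");
        // Drop non-searchable fields (internalNotes) and normalize price units
        return new ProductEsVo(db.id, db.name, db.brand, db.priceCents / 100.0);
    }
}
```

Keeping the dropped fields and unit conversions in one converter makes it the single place to audit what the ES index actually contains.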

Underlying Principles

Elasticsearch outperforms MySQL for multi‑dimensional and full‑text queries due to different underlying data structures:

Elasticsearch inverted index

Full‑text fields are tokenized, creating term‑to‑document inverted indexes; queries are also tokenized, and relevance scoring (e.g., TF‑IDF or BM25) determines matches.

Exact‑value fields skip tokenization; the query term directly matches documents.

All fields generate inverted indexes, enabling multi‑dimensional combination queries.
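A toy sketch of the data structure (not ES internals, just the core idea): each term maps to the set of document IDs containing it, and a multi‑term query intersects those posting sets instead of scanning rows.

```java
import java.util.*;

// Toy inverted index: term -> set of document IDs.
// Real engines add positions, relevance scoring, compression, etc.
class InvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Tokenize on whitespace and record each term's posting set
    void addDocument(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // AND query: intersect the posting sets of all query terms
    Set<Integer> search(String... terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> docs = postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? Collections.emptySet() : result;
    }
}
```

Because every indexed field gets its own posting lists, combining filters on arbitrary dimensions is just more set intersections, which is why new filter fields are cheap for ES but costly for B+ tree composite indexes.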

MySQL B+ tree

Non‑leaf nodes store key ranges; leaf nodes store actual values in an ordered linked list.

Composite indexes must be explicitly created; if a query field is not indexed, the query may degrade to a full table scan.

Key points of ES architecture:

Clusters contain master and data nodes; each data node can hold multiple shards, with primary and replica shards for high availability.

Number of primary shards is fixed at index creation; replicas can be added later to increase QPS.

Routing algorithm: shard = hash(routing) % number of primary shards, where the routing value defaults to the document ID (auto‑generated if not provided).
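The routing rule can be sketched as follows (simplified: real Elasticsearch hashes the _routing value with murmur3, not Java's hashCode, but the modulo structure is the same):

```java
// Simplified shard routing: shard = hash(routing) % number_of_primary_shards.
// This is also why the primary shard count cannot change after index creation:
// changing it would re-route existing documents to different shards.
final class ShardRouter {
    private final int numPrimaryShards;

    ShardRouter(int numPrimaryShards) {
        if (numPrimaryShards <= 0) {
            throw new IllegalArgumentException("need at least one primary shard");
        }
        this.numPrimaryShards = numPrimaryShards;
    }

    int shardFor(String routing) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(routing.hashCode(), numPrimaryShards);
    }
}
```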

Pros and Cons

Advantages

Supports multi‑dimensional combination queries without fearing future field additions (cost lies in re‑indexing new fields).

Fully decoupled from existing systems, suitable for architectural evolution.

Handles far larger data volumes than MySQL, supporting petabyte‑scale storage and queries.

Disadvantages

Read‑write inconsistency window is in seconds due to data sync from MySQL to ES and the ES index refresh phase.

Common trick: delay the front‑end list query by about one second after a write.

Under extreme high concurrency, bottlenecks and stability issues arise: native ES handles roughly 30,000‑50,000 shards before index creation starts taking tens of seconds to minutes, and cluster size caps at around 500 nodes, though most scenarios are still well within these limits.

Elasticsearch Best Practices

Import only searchable data to keep the index size manageable.

Flatten data and avoid nested structures for better performance.

Define field types and mapping in advance rather than relying on automatic mapping.

Use keyword type for exact values and query with term.

Avoid routing‑free queries that hit many shards simultaneously, which can spike CPU usage.

Avoid deep pagination; use search_after or the scroll API for large result sets.

Set appropriate filesystem cache size to improve query performance.

Primary shard count is immutable after creation; replicas can be increased to boost QPS, and shards can be rebalanced if a single shard becomes overloaded.
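To illustrate the deep‑pagination point above: a search_after request (shown with illustrative field names) continues from the sort values of the previous page's last hit, so each page stays cheap, whereas a large from + size forces every shard to materialize and discard all preceding documents.

```json
POST /products/_search
{
  "size": 100,
  "sort": [ { "price": "asc" }, { "id": "asc" } ],
  "search_after": [ 1000, "4820" ]
}
```

The second sort key acts as a tiebreaker so pagination is deterministic when many documents share the same price.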

Further Steps

When ES reaches its concurrency limits and may encounter OOM, specialized C++ search engines are required for ultra‑high‑throughput scenarios, such as:

Baidu's general search engine for text and image queries.

E‑commerce‑specific search engines for product search, advertising, and recommendation based on keywords, brands, and price filters.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Elasticsearch, Java Client, Search Optimization, Database Indexing, Multi-dimensional Query
Written by Java Baker

Java architect and Raspberry Pi enthusiast, dedicated to writing high-quality technical articles; the same name is used across major platforms.
