
Implementation Approach for Query Separation Using Message Queues and Elasticsearch

This article explains the design and implementation of query separation, covering trigger mechanisms, data storage, synchronization via asynchronous threads or message queues, handling of MQ failures, idempotent consumption, ordering issues, and migration of historical data to an Elasticsearch-based query store.

IT Architects Alliance

Jul 20, 2022


◆ Query Separation Implementation Approach

Figure 2‑2 outlines the issues to consider when implementing query separation.

Five key questions are raised:

How to trigger query separation?

How to implement query separation?

How is query data stored?

How is query data used?

How to migrate historical data?

• Figure 2‑2 Issues to consider for query separation

The following sections address the five questions.

◆ How to Trigger Query Separation

This question asks when a copy of data should be saved to the query database, i.e., when to trigger the separation.

Generally, three trigger logics exist:

1) Modify business code to synchronously update query data after writing the primary data. As shown in Figure 2‑3, each time a customer service agent clicks the update button, the request thread updates both the main data and the query data before returning a response.

• Figure 2‑3 Synchronous query data update by modifying business code

2) Modify business code to asynchronously update query data after writing the primary data. As shown in Figure 2‑4, the main thread updates the primary data and then fires an asynchronous thread to update the query database, returning the response immediately.

• Figure 2‑4 Asynchronous query data update by modifying business code

3) Monitor the primary database binlog; when a change is detected, trigger an update of the query data. This design does not affect business code. Figure 2‑5 illustrates the flow.

• Figure 2‑5 Updating query data by monitoring database logs
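The difference between the first two trigger logics can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the in-memory dictionaries `primary_db` and `query_store`, the function names, and the 50 ms delay are all assumptions standing in for a real database, a real query store, and real indexing cost.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the primary database and the query store.
primary_db = {}
query_store = {}

# A bounded pool, so a burst of writes cannot spawn an unlimited number of threads.
executor = ThreadPoolExecutor(max_workers=4)


def update_query_data(ticket_id):
    """Rebuild the query copy of one ticket (simulated as a slow operation)."""
    time.sleep(0.05)  # stands in for expensive work such as index rebuilding
    query_store[ticket_id] = dict(primary_db[ticket_id])


def save_ticket_sync(ticket_id, fields):
    """Method 1: the request thread updates both stores before returning."""
    primary_db[ticket_id] = fields
    update_query_data(ticket_id)
    return "ok"


def save_ticket_async(ticket_id, fields):
    """Method 2: write the primary data, hand the sync to a worker, return at once."""
    primary_db[ticket_id] = fields
    future = executor.submit(update_query_data, ticket_id)
    # The future is returned only so that a caller or test can wait on it.
    return "ok", future
```

With `save_ticket_async`, the response goes back before the query copy exists, which is exactly the window in which a stale read can occur. Method 3 needs no such function in business code, because the trigger lives in a separate binlog listener.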

The advantages and disadvantages of the three trigger logics are summarized in Table 2‑1.

• Table 2‑1 Pros and cons of the three trigger logics

Key concepts are explained:

Business logic flexibility: when the trigger lives in business code, developers can decide exactly when query data should be updated. A log listener cannot enumerate every kind of database change to judge whether an update is needed.

Write‑operation slowdown: when query data is updated in the request thread, heavy work such as index rebuilding can turn a write that took 2 ms into one that takes about 1 s.

Stale reads: if the query data update lags, users may see outdated information.

Table 2‑2 matches each trigger logic to the scenarios it suits. In the project discussed here, the team chose the asynchronous approach (method 2) because customer‑service agents need fast write responses and the team is familiar with the business code.

• Table 2‑2 Applicable scenarios for each trigger logic

◆ How to Implement Query Separation

The team adopts the asynchronous trigger (method 2). The basic implementation spawns a separate thread to create query data, but several concerns arise:

Too many concurrent threads can exhaust JVM resources.

Thread failures require automatic retry and a way to mark failed updates.

Concurrency control is needed for many parallel threads.

A message queue (MQ) can address these issues. Each write request sends a notification to the MQ, and the MQ then wakes a consumer thread to update the query data (Figure 2‑6).

• Figure 2‑6 MQ‑driven query data update flow
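The flow in Figure 2‑6 can be approximated as follows. This is a sketch under stated assumptions: Python's in-process `queue.Queue` stands in for a real broker, the stores are plain dictionaries, and the choice to put only the ticket id in the message (so the consumer always reads the current primary data) is an assumption rather than something the article specifies.

```python
import queue
import threading

# Hypothetical in-memory stand-ins for the primary database and the query store.
primary_db = {}
query_store = {}

# Stand-in for a real broker; a bounded queue caps how much work can pile up.
mq = queue.Queue(maxsize=1000)
STOP = object()  # sentinel used to shut the consumer down in this demo


def write_ticket(ticket_id, fields):
    """Producer side: save the primary data, then publish a notification."""
    primary_db[ticket_id] = fields
    mq.put(ticket_id)  # the message carries only the id, not the data


def consumer_loop():
    """Consumer side: for each notification, read the primary data and sync it."""
    while True:
        ticket_id = mq.get()
        try:
            if ticket_id is STOP:
                return
            query_store[ticket_id] = dict(primary_db[ticket_id])
        finally:
            mq.task_done()


def run_demo():
    worker = threading.Thread(target=consumer_loop, daemon=True)
    worker.start()
    write_ticket(1, {"status": "open"})
    write_ticket(1, {"status": "closed"})
    mq.join()        # wait until every published message has been processed
    mq.put(STOP)
    worker.join()
    return query_store[1]
```

The number of consumer threads, not the number of write requests, now determines how much sync work runs at once.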

Five MQ‑related questions are discussed:

MQ selection – if the company already uses an MQ, reuse it; otherwise, evaluate options (RabbitMQ, RocketMQ, Kafka, ActiveMQ, Redis) based on ease of use and language support.

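MQ downtime – if the MQ goes down, notifications may be lost, and after recovery some may be delivered twice. A simple remedy is to add a flag such as NeedUpdateQueryData to the primary data and set it to true on every write; after recovery, a batch job processes every record whose flag is still set.

Thread failure – if a consumer thread fails, the flag is still set, so the record is picked up again on a later check; optionally, track the retry count.

Idempotent consumption – a message can be delivered more than once, for example when a consumer crashes after updating the query store and clearing the flag but before acknowledging the message. The next consumer checks the flag first, finds it already cleared, and skips the update.

Message ordering – store last_update_time with each record. When a thread finishes its update, it clears the flag only if the record’s last_update_time still matches the value it read at the start; otherwise a newer write has arrived and the flag stays set.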
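The flag and timestamp checks described above can be sketched together in one consumer function. This is an illustration only: the dictionaries, the `handle_message` name, its return values, and the `concurrent_write` hook (which simulates a write arriving mid-sync) are hypothetical. In a real database the final compare-and-clear step would need to be a single atomic conditional update, which this in-memory version does not provide.

```python
primary_db = {}   # hypothetical primary table: id -> row
query_store = {}  # hypothetical query index: id -> document


def write_ticket(ticket_id, status, now):
    """Every write records last_update_time and raises the flag."""
    primary_db[ticket_id] = {
        "status": status,
        "last_update_time": now,
        "NeedUpdateQueryData": True,
    }


def handle_message(ticket_id, concurrent_write=None):
    """Consume one notification; safe to call again for a duplicate delivery."""
    row = primary_db[ticket_id]
    if not row["NeedUpdateQueryData"]:
        return "skipped"  # already synced: duplicate message, nothing to do

    started_with = row["last_update_time"]
    # Upsert keyed by id, so repeating this step does not create duplicates.
    query_store[ticket_id] = {
        "status": row["status"],
        "last_update_time": started_with,
    }

    if concurrent_write is not None:
        concurrent_write()  # demo hook: a newer write lands during the sync

    current = primary_db[ticket_id]
    if current["last_update_time"] == started_with:
        current["NeedUpdateQueryData"] = False
        return "synced"
    return "left_flagged"  # a newer version exists; a later pass will sync it
```

If the function raises partway through, the flag is never cleared, so the record remains eligible for retry.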

Using MQ also decouples services and allows throttling of query‑update workers.

◆ How Query Data Is Stored

For large‑scale search, Elasticsearch is the primary choice, though MongoDB or HBase could be considered. In this project, Elasticsearch was selected because the team is familiar with it and it fits the complex query requirements.

◆ How Query Data Is Used

After data is indexed in Elasticsearch, business code calls Elasticsearch’s API directly for queries. Because synchronization may lag (e.g., 2 seconds), users might see stale data. Two mitigation strategies are suggested:

Block queries until the latest data is indexed (rarely used).

Show a notice that the displayed data may be up to a few seconds old and advise a refresh.
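Both strategies are small enough to sketch. The function names, the polling interval, and the shape of the returned payload are assumptions made for illustration; `is_synced` is a hypothetical callback that reports whether the record of interest has reached the query store.

```python
import time


def wait_until_indexed(is_synced, timeout_s=2.0, poll_interval_s=0.05):
    """Strategy 1: poll until is_synced() returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_synced():
            return True
        time.sleep(poll_interval_s)
    return is_synced()  # one last check at the deadline


def with_staleness_notice(hits, max_lag_s=2):
    """Strategy 2: return results at once, plus a notice for the UI to display."""
    return {
        "hits": hits,
        "notice": (
            f"Results may be up to {max_lag_s} seconds old. "
            "Refresh to see the latest data."
        ),
    }
```

Strategy 1 ties query latency to sync latency, which is one reason it is rarely used; strategy 2 keeps queries fast and leaves the decision to refresh with the user.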

◆ Historical Data Migration

To bring legacy data into the new architecture, set NeedUpdateQueryData=true for all existing records; the system will automatically migrate them.
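One way this could work is a batch job that repeatedly drains flagged records, the same job that catches up after an MQ outage. The sketch below assumes that design; the function names, the batch size, and the in-memory dictionaries are hypothetical, and the article does not say how the project's batch processing is actually scheduled.

```python
def sync_pending_batch(primary_db, query_store, batch_size=100):
    """Sync up to batch_size flagged rows; return how many were processed."""
    pending = [tid for tid, row in primary_db.items() if row["NeedUpdateQueryData"]]
    batch = pending[:batch_size]
    for tid in batch:
        row = primary_db[tid]
        query_store[tid] = {"status": row["status"]}  # upsert keyed by id
        row["NeedUpdateQueryData"] = False
    return len(batch)


def migrate_all(primary_db, query_store, batch_size=100):
    """Flag every existing row, then drain the backlog batch by batch."""
    for row in primary_db.values():
        row["NeedUpdateQueryData"] = True
    total = 0
    while True:
        done = sync_pending_batch(primary_db, query_store, batch_size)
        if done == 0:
            return total
        total += done
```

Working in bounded batches keeps the migration from competing too heavily with live traffic on the primary database.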

◆ Overall MQ + Elasticsearch Solution

The complete solution consists of:

Asynchronous triggering of query data sync after a ticket update.

MQ to decouple services and throttle sync load.

Storing query data in Elasticsearch for scalable, complex searches.

User notifications about possible data staleness.

Automatic migration of historical data by flagging all old records.

• Figure 2‑8 Overall solution diagram

The next article will discuss practical considerations when using Elasticsearch.
