Boost Large Table Queries with Query Separation: When and How to Implement
This article explains query separation, a strategy for speeding up slow queries on large tables by copying data into a dedicated query store. It covers when to adopt it, compares synchronous, asynchronous, and binlog‑based approaches, discusses storage choices such as MongoDB, HBase, and Elasticsearch, and addresses consistency and MQ challenges.
What Is Query Separation?
Query separation means writing data to a primary store and also saving a copy to a separate storage system that serves read operations only.
Key questions include when to trigger query separation, how to implement it, which storage system to use for query data, and how the query data is accessed.
Applicable Scenarios for Query Separation
Large data volumes
Write request latency is acceptable
Read request latency is poor
Data may be modified at any time
Business requires optimized query performance
In a SaaS ticket system with tens of millions of rows and many joined tables, traditional cold‑hot separation, even combined with indexing and SQL tuning, still left some queries taking tens of seconds.
By keeping the frequently updated data in the primary database and routing read‑only queries to a separate query store, response time dropped to under 500 ms.
When to Trigger Query Separation?
Three common approaches are:
Synchronous creation – write to primary and immediately create the query copy.
Asynchronous creation – write to primary, then create the query copy later.
Binlog‑based creation – listen to database binlog events and build the query copy without modifying application code.
Synchronous Creation
Modify business code to create the query copy immediately after writing primary data.
Advantages: Keeps query data consistent with the primary data and fresh in real time.
Disadvantages: Intrusive to business code, and every write becomes slower because it must also update the query copy.
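A minimal sketch of synchronous creation, assuming an in-memory SQLite table as the primary store and a plain dict as a stand-in for the query store (the `ticket` table, `query_store`, and `create_ticket_sync` are hypothetical names). It shows where the extra write latency comes from: the caller waits for both writes.

```python
import sqlite3

# Hypothetical stand-in for the query store (e.g. an Elasticsearch index).
query_store: dict[int, dict] = {}

def create_ticket_sync(conn: sqlite3.Connection, ticket_id: int, title: str, status: str) -> None:
    """Write the primary row, then build the query copy in the same request."""
    with conn:  # commits on success, rolls back if the INSERT raises
        conn.execute(
            "INSERT INTO ticket (id, title, status) VALUES (?, ?, ?)",
            (ticket_id, title, status),
        )
    # The caller also waits for this step, which is the added write latency.
    query_store[ticket_id] = {"id": ticket_id, "title": title, "status": status}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticket (id INTEGER PRIMARY KEY, title TEXT, status TEXT)")
create_ticket_sync(conn, 1, "Printer jammed", "open")
print(query_store[1])  # {'id': 1, 'title': 'Printer jammed', 'status': 'open'}
```

Note that if the second write fails after the commit, the two stores diverge; a real implementation needs a retry or compensation path for that case.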
Asynchronous Creation
Modify business code to enqueue a task after writing primary data; a background worker later creates the query copy.
Advantages: Does not affect the main transaction flow.
Disadvantages: May introduce data consistency gaps.
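A minimal sketch of asynchronous creation, assuming Python's in-process `queue.Queue` as a stand-in for an MQ and a background thread as the worker (all table and function names are hypothetical). The write path only enqueues the record ID; the worker re-reads the current row from the primary store, so the copy reflects the latest committed state rather than a payload captured at enqueue time.

```python
import queue
import sqlite3
import threading

# In-process queue standing in for an MQ; queued tasks are lost if the process exits.
tasks: queue.Queue = queue.Queue()
query_store: dict[int, dict] = {}
db_lock = threading.Lock()  # serialises access to the shared SQLite connection

conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE ticket (id INTEGER PRIMARY KEY, title TEXT, status TEXT)")

def create_ticket_async(ticket_id: int, title: str, status: str) -> None:
    """Write the primary row, enqueue a task, and return without building the copy."""
    with db_lock, conn:
        conn.execute(
            "INSERT INTO ticket (id, title, status) VALUES (?, ?, ?)",
            (ticket_id, title, status),
        )
    tasks.put(ticket_id)

def worker() -> None:
    """Background consumer: re-read each row from the primary store and build its copy."""
    while True:
        ticket_id = tasks.get()
        if ticket_id is None:  # sentinel value: stop the worker
            tasks.task_done()
            return
        with db_lock:
            row = conn.execute(
                "SELECT id, title, status FROM ticket WHERE id = ?", (ticket_id,)
            ).fetchone()
        query_store[row[0]] = {"id": row[0], "title": row[1], "status": row[2]}
        tasks.task_done()

consumer = threading.Thread(target=worker, daemon=True)
consumer.start()
create_ticket_async(1, "VPN down", "open")
tasks.join()  # demo only: wait until the worker has caught up
tasks.put(None)
consumer.join()
print(query_store[1])  # {'id': 1, 'title': 'VPN down', 'status': 'open'}
```

Between `create_ticket_async` returning and the worker finishing, a reader of `query_store` would not see the new ticket; that window is the consistency gap mentioned above.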
Binlog‑Based Creation
Listen to database binlog events to build the query copy without any code changes.
Advantages: Zero code intrusion and no impact on the primary flow.
Disadvantages: Consistency issues may arise and the architecture is more complex.
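A minimal sketch of the consumer side of binlog‑based creation. It assumes some change-data-capture component has already decoded binlog entries into simple row-change events; the event shape used here (`table`, `type`, `row`) is hypothetical and is not the actual format of any specific tool.

```python
from typing import Any

query_store: dict[int, dict[str, Any]] = {}

def apply_row_event(event: dict[str, Any]) -> None:
    """Apply one decoded row-change event to the query store (hypothetical event format)."""
    if event["table"] != "ticket":
        return  # ignore tables that are not part of the query model
    row = event["row"]
    if event["type"] in ("INSERT", "UPDATE"):
        query_store[row["id"]] = dict(row)  # upsert the latest row image
    elif event["type"] == "DELETE":
        query_store.pop(row["id"], None)

events = [
    {"table": "ticket", "type": "INSERT", "row": {"id": 7, "status": "open"}},
    {"table": "audit_log", "type": "INSERT", "row": {"id": 99}},
    {"table": "ticket", "type": "UPDATE", "row": {"id": 7, "status": "closed"}},
]
for e in events:
    apply_row_event(e)
print(query_store)  # {7: {'id': 7, 'status': 'closed'}}
```

The business code never calls this function; it only sees the database, which is what makes the approach non-intrusive. The cost is running and monitoring the extra capture pipeline.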
How to Implement Query Separation?
For asynchronous approaches, tasks could be buffered in memory, but capacity is limited by available memory and pending tasks are lost if the process restarts. Using a message queue (MQ) is therefore recommended.
When selecting an MQ, consider:
If your organization already uses an MQ, continue with it.
If not, evaluate options based on latency, durability, and operational overhead (see related article on MQ selection).
Handling MQ Failures
If the MQ goes down, track migration status with a flag on each primary record (e.g., “pending” vs. “migrated”). Once the MQ recovers, reprocess every record still marked as pending.
“The appropriate solution depends on the actual business situation.”
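A minimal sketch of the flag-based recovery described above, assuming a `migrated` column (0 = pending, 1 = migrated) on a hypothetical `ticket` table. `publish` is a hypothetical MQ call that simulates delivery by invoking the consumer directly, and `mq_available` toggles a simulated outage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticket (id INTEGER PRIMARY KEY, title TEXT, migrated INTEGER NOT NULL DEFAULT 0)"
)
query_store: dict[int, dict] = {}
mq_available = True  # toggled below to simulate an MQ outage

def sync_to_query_store(ticket_id: int) -> None:
    """Consumer side: copy the row, then flip its flag to migrated."""
    title = conn.execute("SELECT title FROM ticket WHERE id = ?", (ticket_id,)).fetchone()[0]
    query_store[ticket_id] = {"id": ticket_id, "title": title}
    with conn:
        conn.execute("UPDATE ticket SET migrated = 1 WHERE id = ?", (ticket_id,))

def publish(ticket_id: int) -> None:
    """Hypothetical MQ publish; delivery is simulated by calling the consumer directly."""
    if not mq_available:
        raise ConnectionError("MQ unavailable")
    sync_to_query_store(ticket_id)

def create_ticket(ticket_id: int, title: str) -> None:
    with conn:  # every new row starts as pending (migrated = 0)
        conn.execute("INSERT INTO ticket (id, title) VALUES (?, ?)", (ticket_id, title))
    try:
        publish(ticket_id)
    except ConnectionError:
        pass  # leave the row pending; replay_pending picks it up later

def replay_pending() -> int:
    """Run once the MQ is back: migrate every row still flagged as pending."""
    ids = [r[0] for r in conn.execute("SELECT id FROM ticket WHERE migrated = 0")]
    for ticket_id in ids:
        sync_to_query_store(ticket_id)
    return len(ids)

create_ticket(1, "Disk full")
mq_available = False
create_ticket(2, "Mail bounce")  # MQ down: row 2 stays pending
mq_available = True
print(replay_pending())  # 1
```

Because the flag lives in the primary database, it survives an MQ outage; the replay job only has to find rows whose flag was never flipped.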
Ensuring Idempotent Consumption
Ensure that each change is applied only once, so that redelivered messages do not create duplicate entries in the query store.
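A minimal sketch of idempotent consumption, assuming each message carries a unique `msg_id` and the query document is keyed by the entity's primary key (both hypothetical fields). Two safeguards are combined: a record of processed message IDs, and an upsert keyed by entity ID so that even a replayed message overwrites rather than duplicates.

```python
# In production this would be durable (e.g. a table with a unique key on msg_id).
processed_ids: set[str] = set()
query_store: dict[int, dict] = {}

def consume(message: dict) -> bool:
    """Apply a message at most once; return False if it was a duplicate delivery."""
    msg_id = message["msg_id"]
    if msg_id in processed_ids:
        return False
    doc = message["doc"]
    query_store[doc["id"]] = doc  # upsert keyed by entity id
    processed_ids.add(msg_id)
    return True

msg = {"msg_id": "m-1", "doc": {"id": 1, "status": "open"}}
print(consume(msg), consume(msg))  # True False
```

The check and the write here are not atomic; with concurrent consumers, the deduplication record and the query-store write would need a unique constraint or a shared transaction.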
Maintaining Message Order
When the same entity is updated several times, ensure that an older update which happens to be processed later does not overwrite a newer one.
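A minimal sketch of one way to keep later updates from being overwritten, assuming every primary row carries a monotonically increasing `version` column that travels with each message (a hypothetical field). The consumer keeps an update only if it is newer than what the query store already holds. Elasticsearch's external versioning (`version_type=external`) offers a built-in form of this check, if that is the chosen store.

```python
query_store: dict[int, dict] = {}

def apply_if_newer(doc: dict) -> bool:
    """Keep the update only if its version is newer than the stored one."""
    current = query_store.get(doc["id"])
    if current is not None and current["version"] >= doc["version"]:
        return False  # stale update that arrived late: drop it
    query_store[doc["id"]] = doc
    return True

# Version 2 is applied first; the older version 1 arrives afterwards and is ignored.
apply_if_newer({"id": 5, "version": 2, "status": "closed"})
apply_if_newer({"id": 5, "version": 1, "status": "open"})
print(query_store[5]["status"])  # closed
```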
Choosing a Storage System for Query Data
Relational databases may not scale for massive read workloads. Options include:
MongoDB
HBase
Elasticsearch
Select the system your team already knows or that fits the existing infrastructure; in the ticket system above, for example, Elasticsearch was chosen for its query extensibility and the team's expertise.
Using the Query Data
Each database provides its own API for read operations. Two common strategies to handle temporary inconsistency between primary and query stores are:
Block reads until the query store is up‑to‑date (rarely used in practice).
Show a user notice that the data may be up to one second stale and suggest a refresh.
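A minimal sketch covering both strategies, assuming the caller knows the version it just wrote (for example, returned by the write call) and that query documents store a `version` field; `read_ticket` and its parameters are hypothetical. With `wait_seconds > 0` it polls until the query store catches up (strategy 1); with the default of 0 it returns at once and reports staleness so the UI can show a notice (strategy 2).

```python
import time
from typing import Optional, Tuple

def read_ticket(
    query_store: dict,
    ticket_id: int,
    expected_version: int,
    wait_seconds: float = 0.0,
    poll_interval: float = 0.05,
) -> Tuple[Optional[dict], bool]:
    """Return (doc, is_stale), optionally polling up to wait_seconds for fresh data."""
    deadline = time.monotonic() + wait_seconds
    while True:
        doc = query_store.get(ticket_id)
        fresh = doc is not None and doc["version"] >= expected_version
        if fresh or time.monotonic() >= deadline:
            return doc, not fresh
        time.sleep(poll_interval)

store = {3: {"id": 3, "version": 1, "status": "open"}}
doc, stale = read_ticket(store, 3, expected_version=2)
if stale:
    print("Data may be up to one second old; refresh to see the latest.")
```

Polling ties up the request while waiting, which is one reason the blocking strategy is rarely used.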
Summary
Query separation can dramatically improve query performance for large tables, but it is not a universal remedy; write latency and historical data migration remain challenges that require further discussion.
This article has been distilled and summarized from source material, then republished for learning and reference.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
