Boost Large Table Queries with Query Separation: When and How to Implement

This article explains query separation, a strategy that speeds up slow large‑table queries by copying data to a dedicated query store. It outlines when to adopt it, compares the synchronous, asynchronous, and binlog‑based approaches, discusses storage choices such as MongoDB, HBase, and Elasticsearch, and addresses consistency and MQ challenges.

Su San Talks Tech

May 18, 2024

What Is Query Separation?

Query separation means writing data to a primary store and also saving a copy to a separate storage system that serves only read operations.

Key questions include when to trigger query separation, how to implement it, which storage system to use for query data, and how the query data is accessed.

Applicable Scenarios for Query Separation

Large data volumes

Write request latency is acceptable

Read request latency is poor

Data may be modified at any time

Business requires optimized query performance

In a SaaS ticketing system with tens of millions of rows and many joined tables, traditional cold‑hot separation still left queries taking tens of seconds despite indexing and SQL tuning.

After the frequently updated data was kept in the primary database and read‑only queries were routed to a separate query store, response time dropped to under 500 ms.

When to Trigger Query Separation?

Three common approaches are:

Synchronous creation – write to primary and immediately create the query copy.

Asynchronous creation – write to primary, then create the query copy later.

Binlog‑based creation – listen to database binlog events and build the query copy without modifying application code.

Synchronous Creation

Modify business code to create the query copy immediately after writing primary data.

Advantages: Keeps query data consistent and fresh in real time.

Disadvantages: Intrudes on business code and can slow write operations.
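A minimal sketch of synchronous creation follows. The in-memory dictionaries primary_db and query_store, and the functions build_query_doc and save_ticket_sync, are hypothetical stand-ins for a real primary database, a real query store, and the business code that writes to them.

```python
# Hypothetical in-memory stand-ins for the primary database and the query store.
primary_db = {}
query_store = {}


def build_query_doc(ticket):
    """Keep only the fields that the read side needs to serve queries."""
    return {
        "id": ticket["id"],
        "title": ticket["title"],
        "status": ticket["status"],
    }


def save_ticket_sync(ticket):
    """Write the primary record, then create the query copy in the same call."""
    primary_db[ticket["id"]] = dict(ticket)
    # The caller waits for this second write too, which is the extra write latency.
    query_store[ticket["id"]] = build_query_doc(ticket)


save_ticket_sync(
    {"id": 1, "title": "Printer offline", "status": "open", "notes": "internal"}
)
```

Because both writes happen before save_ticket_sync returns, a read issued right afterwards already sees the new copy. The cost is that every write path in the business code has to call the copy step.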

Asynchronous Creation

Modify business code to enqueue a task after writing primary data; a background worker later creates the query copy.

Advantages: Does not affect the main transaction flow.

Disadvantages: The query copy can lag behind the primary data, so reads may briefly return stale results.
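The asynchronous flow can be sketched as below. Here queue.Queue from the Python standard library stands in for the task channel, and a single thread plays the background worker; the names save_ticket_async and worker, and the use of None as a stop signal, are assumptions made for illustration.

```python
import queue
import threading

# Hypothetical in-memory stand-ins for the primary database and the query store.
primary_db = {}
query_store = {}
tasks = queue.Queue()  # stands in for the task channel between writer and worker


def save_ticket_async(ticket):
    """Write the primary record and enqueue a task; do not wait for the copy."""
    primary_db[ticket["id"]] = dict(ticket)
    tasks.put(ticket["id"])


def worker():
    """Background worker: build the query copy for each enqueued ticket id."""
    while True:
        ticket_id = tasks.get()
        if ticket_id is None:  # stop signal (an assumption of this sketch)
            tasks.task_done()
            break
        query_store[ticket_id] = dict(primary_db[ticket_id])
        tasks.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

save_ticket_async({"id": 1, "status": "open"})
tasks.join()     # only for the demo: wait until the worker has caught up
tasks.put(None)  # ask the worker to stop
thread.join()
```

Between the return of save_ticket_async and the moment the worker handles the task, the query store does not yet contain the new data. That window is the consistency gap this approach accepts.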

Binlog‑Based Creation

Listen to database binlog events to build the query copy without any code changes.

Advantages: Zero code intrusion and no impact on the primary flow.

Disadvantages: The query copy can lag behind the primary data, and the architecture is more complex.
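The sketch below shows only the consuming side of a binlog-based setup. It assumes some listener has already decoded the binlog into simple row-change events; the event shape used here (a "type" plus a "row" dictionary) is hypothetical and does not match any particular binlog client library.

```python
# Hypothetical in-memory stand-in for the query store.
query_store = {}


def apply_binlog_event(event):
    """Apply one decoded row-change event to the query store."""
    kind = event["type"]
    row = event["row"]
    if kind in ("insert", "update"):
        # Upsert keyed by primary key, so replaying an event is harmless.
        query_store[row["id"]] = dict(row)
    elif kind == "delete":
        query_store.pop(row["id"], None)


# Events in the order the primary database committed them.
events = [
    {"type": "insert", "row": {"id": 1, "status": "open"}},
    {"type": "update", "row": {"id": 1, "status": "closed"}},
    {"type": "insert", "row": {"id": 2, "status": "open"}},
    {"type": "delete", "row": {"id": 2}},
]
for event in events:
    apply_binlog_event(event)
```

The business code that writes to the primary database is not involved at all, which is what makes this approach non-intrusive. The listener and event pipeline are the added architectural complexity.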

How to Implement Query Separation?

For asynchronous approaches, in‑memory buffering is possible but limited by memory size and volatility. Therefore, using a message queue (MQ) is recommended.

When selecting an MQ, consider:

If your organization already uses an MQ, continue with it.

If not, evaluate options based on latency, durability, and operational overhead (see related article on MQ selection).

Handling MQ Failures

To survive an MQ outage, flag each primary record with its sync status (e.g., “pending” until its query copy exists, then “migrated”). Once the MQ recovers, process the records that are still pending.
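One way to picture the flag-based recovery is the sketch below. The sync_status field, the publish callback, and the resync_pending sweep are all hypothetical names; a real system would store the flag in a column and run the sweep as a scheduled job or on MQ recovery.

```python
# Hypothetical in-memory stand-ins for the primary database and the query store.
primary_db = {}
query_store = {}


def save_ticket(ticket, publish):
    """Write the primary record as pending, then try to notify the MQ."""
    row = dict(ticket, sync_status="pending")
    primary_db[row["id"]] = row
    try:
        publish(row["id"])
    except ConnectionError:
        pass  # MQ is down: the row stays pending and the sweep picks it up later


def sync_to_query_store(ticket_id):
    """Create the query copy and flip the flag to migrated."""
    row = primary_db[ticket_id]
    query_store[ticket_id] = {k: v for k, v in row.items() if k != "sync_status"}
    row["sync_status"] = "migrated"


def resync_pending():
    """Process every record that is still flagged as pending."""
    pending = [tid for tid, row in primary_db.items()
               if row["sync_status"] == "pending"]
    for tid in pending:
        sync_to_query_store(tid)
    return pending


def broken_publish(ticket_id):
    """Simulates an MQ outage."""
    raise ConnectionError("MQ unavailable")


save_ticket({"id": 7, "status": "open"}, broken_publish)
recovered = resync_pending()
```

Since the flag lives with the primary record, no update is lost even if the message never reaches the queue. Running the sweep twice is safe because migrated rows are skipped.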

“The appropriate solution depends on the actual business situation.”

Ensuring Idempotent Consumption

Make sure that applying the same change more than once has no additional effect, so that redelivered messages do not create duplicate entries in the query store.
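A small sketch of idempotent consumption, under two assumptions that are not from the original: each message carries a unique msg_id, and the consumer can remember which ids it has already handled (here a plain set; in practice this would need durable storage).

```python
# Hypothetical in-memory stand-ins for the query store and the dedupe record.
query_store = {}
processed_ids = set()


def consume(message):
    """Apply a message once; return False if it was a duplicate delivery."""
    if message["msg_id"] in processed_ids:
        return False
    ticket = message["ticket"]
    # Upsert keyed by ticket id, so even a replay would overwrite, not append.
    query_store[ticket["id"]] = dict(ticket)
    processed_ids.add(message["msg_id"])
    return True


message = {"msg_id": "m-1", "ticket": {"id": 1, "status": "open"}}
first = consume(message)
second = consume(message)  # the same message delivered again
```

The sketch combines two common safeguards: deduplication by message id, and writes that are keyed by the entity id so that repeating them leaves the same result.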

Maintaining Message Order

When multiple updates occur for the same entity, ensure that a later update is not overwritten by an earlier one that happens to be processed after it.
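One possible guard against out-of-order updates is a per-entity version check, sketched below. The version field is an assumption: it presumes the primary side attaches a number that increases with every change to the same entity.

```python
# Hypothetical in-memory stand-in for the query store.
query_store = {}


def apply_update(doc):
    """Apply an update only if it is newer than what the query store holds."""
    current = query_store.get(doc["id"])
    if current is not None and current["version"] >= doc["version"]:
        return False  # stale or duplicate update: ignore it
    query_store[doc["id"]] = dict(doc)
    return True


# The newer update (version 2) is processed first, the older one arrives late.
newer_applied = apply_update({"id": 1, "version": 2, "status": "closed"})
older_applied = apply_update({"id": 1, "version": 1, "status": "open"})
```

With this check the query store ends up with the latest state regardless of processing order. An alternative, not shown here, is to route all messages for one entity through the same ordered partition or queue.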

Choosing a Storage System for Query Data

Relational databases may not scale for massive read workloads. Options include:

MongoDB

HBase

Elasticsearch

Select a system your team already knows or that fits your existing infrastructure; in this case, Elasticsearch was chosen for its query extensibility and the team's familiarity with it.

Using the Query Data

Each database provides its own API for read operations. Two common strategies to handle temporary inconsistency between primary and query stores are:

Block reads until the query store is up‑to‑date (rarely used in practice).

Show a user notice that the data may be up to one second stale and suggest a refresh.

Summary

Query separation can dramatically improve query performance for large tables, but it is not a universal remedy; write latency and historical data migration remain challenges that require further discussion.

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.