Boost Large Table Reads with Query Separation
This article explains the concept of query separation, its suitable scenarios, implementation methods—including synchronous, asynchronous, and binlog approaches—storage options, MQ considerations, and how it can dramatically improve query performance for massive tables.
What Is Query Separation?
Query separation means writing data to the primary store while simultaneously saving a copy to another storage system, from which queries are served, as illustrated in the diagram below.
Key details to explore include when to trigger query separation, how to implement it, storage system selection, and usage.
When to Use Query Separation?
Consider query separation when:
Data volume is large.
Write request performance is acceptable.
Query request performance is poor.
All data may be modified at any time.
The business wants optimized query functionality.
Example: a SaaS customer service system with tens of millions of tickets and dozens of related tables suffered seconds‑long query times despite indexing and SQL tuning. By separating writes and queries, query latency dropped to under 500 ms.
Triggering Query Separation
Triggering involves deciding when to copy data to the query store, essentially data heterogenization. Three common methods are:
Synchronous creation.
Asynchronous creation.
Binlog‑based creation.
1. Synchronous Creation
Modify business code to synchronously create query data after each write.
Advantages: Guarantees consistency and real‑time query data.
Disadvantages: Strong code intrusion and slower write performance.
2. Asynchronous Creation
Modify code to write data first, then asynchronously build query data.
Advantages: Does not affect the main flow.
Disadvantages: Potential data consistency issues.
3. Binlog‑Based Creation
This non‑intrusive approach listens to database binlog events to build query data.
Advantages: No impact on the main flow; zero code intrusion.
Disadvantages: Data consistency challenges and more complex architecture.
How to Implement Query Separation?
For asynchronous implementation, data can be staged in memory, but this has drawbacks such as limited memory and loss on service restart. Therefore, using a message queue (MQ) is recommended.
MQ selection tips:
If your company already uses an MQ, continue with it.
If not, evaluate options; see my previous article on MQ selection.
When MQ fails, mark records with a flag (e.g., "migrated" or "not migrated") and replay after recovery.
“The solution varies based on actual business needs.”
Ensure idempotent consumption to avoid duplicate data, and handle message ordering to prevent stale query data.
Choosing a Storage System for Query Data
Relational databases may not scale for massive query loads. Alternatives include:
MongoDB
HBase
Elasticsearch
Select based on existing infrastructure and team expertise; for example, we chose Elasticsearch for its query extensibility and our familiarity.
Using Query Data
Each storage system provides its own API for querying. Two approaches to handle potential inconsistency before updates are applied:
Block queries until the latest data is available (rarely used).
Show a notice that data may be up to one second old and suggest a refresh.
Summary
This article introduced query separation as a solution for slow queries on large tables. While effective, it is not a silver bullet; challenges such as write performance and historical data migration remain and will be covered in future posts.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
