Query Separation: Optimizing Large Table Reads by Decoupling Query Data
The article explains query separation as a technique to improve performance of massive business tables by storing write‑optimized data and query‑optimized data separately, discusses when to apply it, various implementation methods, storage options, and operational considerations such as MQ handling and consistency.
What Is Query Separation?
Query separation means saving a backup of the data to another storage system when writing, and reading directly from that storage for queries, as illustrated in the diagram.
The simple architecture hides details that need deeper analysis, such as when to trigger query separation, how to implement it, storage system selection, and usage.
Applicable Scenarios for Query Separation
Consider using query separation when:
Data volume is large.
Write request performance is acceptable.
Query request performance is poor.
Data may be modified at any time.
The business wants to optimize query functionality.
Example: a SaaS customer‑service system with tens of millions of tickets and dozens of related tables, where traditional indexing and SQL tuning still resulted in seconds‑long query times.
By separating write‑side and query‑side databases, update speed improved dramatically and query latency dropped to under 500 ms.
When To Trigger Query Separation?
Triggering occurs when a copy of the data should be saved to a query database, i.e., during data heterogeneity. Three common approaches are:
Synchronous creation.
Asynchronous creation.
Binlog‑based creation.
1. Synchronous Creation
Modify business code to create the query record immediately after writing the primary data.
Advantages: Guarantees consistency and real‑time availability of query data.
Disadvantages: Strong code intrusion and slower write performance.
2. Asynchronous Creation
After writing data, an asynchronous process creates the query record.
Advantages: Does not affect the main workflow.
Disadvantages: Potential data consistency issues.
3. Binlog‑Based Creation
This widely used, non‑intrusive method listens to database binlog events to build query data.
Advantages: No impact on the main process; zero code intrusion.
Disadvantages: Consistency challenges and more complex architecture.
How To Implement Query Separation?
The three methods above are common; the synchronous method is straightforward and omitted here. The binlog method is described in a previous article on data heterogeneity.
For asynchronous implementation, in‑memory buffering can be used but has drawbacks such as memory limits and data loss on service restart. Therefore, using a message queue (MQ) is recommended.
MQ considerations include:
If the company already uses an MQ, continue with it.
If not, evaluate options (see the linked article on MQ selection).
MQ Failure Handling
If MQ crashes, query data creation stops. Adding a flag (migrated / not migrated) to each record allows the system to resume creation after MQ recovery.
Idempotent Consumption
Ensure messages are processed idempotently to avoid duplicate query records.
Message Ordering
Out‑of‑order processing can cause stale query data; proper sequencing logic is required.
Choosing a Storage System For Query Data
Relational databases are unsuitable for massive query workloads. Options include:
MongoDB
HBase
Elasticsearch
Selection depends on existing infrastructure and team expertise; the author chose Elasticsearch for its query extensibility and familiarity.
Using Query Data
Each database provides an API for querying. Two strategies address the inconsistency window before query data is updated:
Block user queries until the latest data is available (rarely used).
Show a notice that data may be up to one second old and suggest a refresh.
Summary
This article presents query separation as a solution for slow queries on large tables. While effective, it is not a silver bullet; challenges such as write performance and historical data migration remain for future discussion.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.