Databases 10 min read

Query Separation: Optimizing Large Table Reads by Decoupling Query Data

The article explains query separation as a technique to improve performance of massive business tables by storing write‑optimized data and query‑optimized data separately, discusses when to apply it, various implementation methods, storage options, and operational considerations such as MQ handling and consistency.

Code Ape Tech Column

Sep 26, 2023

Query Separation: Optimizing Large Table Reads by Decoupling Query Data

What Is Query Separation?

Query separation means saving a backup of the data to another storage system when writing, and reading directly from that storage for queries, as illustrated in the diagram.

The simple architecture hides details that need deeper analysis, such as when to trigger query separation, how to implement it, storage system selection, and usage.

Applicable Scenarios for Query Separation

Consider using query separation when:

Data volume is large.

Write request performance is acceptable.

Query request performance is poor.

Data may be modified at any time.

The business wants to optimize query functionality.

Example: a SaaS customer‑service system with tens of millions of tickets and dozens of related tables, where traditional indexing and SQL tuning still resulted in seconds‑long query times.

By separating write‑side and query‑side databases, update speed improved dramatically and query latency dropped to under 500 ms.

When To Trigger Query Separation?

Triggering occurs when a copy of the data should be saved to a query database, i.e., during data heterogeneity. Three common approaches are:

Synchronous creation.

Asynchronous creation.

Binlog‑based creation.

1. Synchronous Creation

Modify business code to create the query record immediately after writing the primary data.

Advantages: Guarantees consistency and real‑time availability of query data.

Disadvantages: Strong code intrusion and slower write performance.

2. Asynchronous Creation

After writing data, an asynchronous process creates the query record.

Advantages: Does not affect the main workflow.

Disadvantages: Potential data consistency issues.

3. Binlog‑Based Creation

This widely used, non‑intrusive method listens to database binlog events to build query data.

Advantages: No impact on the main process; zero code intrusion.

Disadvantages: Consistency challenges and more complex architecture.

How To Implement Query Separation?

The three methods above are common; the synchronous method is straightforward and omitted here. The binlog method is described in a previous article on data heterogeneity.

For asynchronous implementation, in‑memory buffering can be used but has drawbacks such as memory limits and data loss on service restart. Therefore, using a message queue (MQ) is recommended.

MQ considerations include:

If the company already uses an MQ, continue with it.

If not, evaluate options (see the linked article on MQ selection).

MQ Failure Handling

If MQ crashes, query data creation stops. Adding a flag (migrated / not migrated) to each record allows the system to resume creation after MQ recovery.

Idempotent Consumption

Ensure messages are processed idempotently to avoid duplicate query records.

Message Ordering

Out‑of‑order processing can cause stale query data; proper sequencing logic is required.

Choosing a Storage System For Query Data

Relational databases are unsuitable for massive query workloads. Options include:

MongoDB

HBase

Elasticsearch

Selection depends on existing infrastructure and team expertise; the author chose Elasticsearch for its query extensibility and familiarity.

Using Query Data

Each database provides an API for querying. Two strategies address the inconsistency window before query data is updated:

Block user queries until the latest data is available (rarely used).

Show a notice that data may be up to one second old and suggest a refresh.

Summary

This article presents query separation as a solution for slow queries on large tables. While effective, it is not a silver bullet; challenges such as write performance and historical data migration remain for future discussion.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Elasticsearch Asynchronous Replication query separation cold hot data

Written by

Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.