Eight Major Database Optimization Strategies for Backend Engineers
This article outlines eight database optimization solutions grouped under three principles: reducing data volume, using space to improve performance, and choosing the right storage system. The solutions are data archiving, intermediate tables, serialization storage, sharding, distributed caching, read-write splitting, CQRS, and storage replacement. Together they help backend engineers systematically address common performance bottlenecks.
Hello everyone, I am Chen.
Backend engineers across companies often encounter database performance problems as their first major headache. A systematic methodology can help quickly and accurately choose suitable optimization solutions for 80‑90% of performance issues.
Why is the database slow?
Nature of slowness and what drives it:
Search time complexity: search algorithm, storage data structure
Data volume: data partitioning
High load: CPU and disk busy
Regardless of relational or NoSQL databases, three main factors affect query performance: search time complexity, total data volume, and high load.
The search time complexity is determined by two aspects: the search algorithm and the storage data structure.
For relational databases, the index structure is typically B+Tree with O(log n) complexity and row‑store storage, so optimization usually focuses on reducing data volume.
High load stems from high concurrency, complex queries, and insufficient server resources, often mitigated by clustering or data redundancy.
At which layer should we think about optimization?
The stack has four layers, from top to bottom: hardware, storage system, storage structure, and concrete implementation. Optimizing the lower layers (e.g., the concrete implementation) is cheaper and yields immediate gains, while optimizing the higher layers costs more and is less cost-effective.
Therefore, prioritize concrete implementation (e.g., adding indexes). If no room remains, move up to storage structure (e.g., sharding, compression) and then to storage system (e.g., switching to NoSQL).
This article focuses on the middle two layers: storage structure and storage system.
Eight Solution Summary
The core of database optimization can be categorized into three principles: reduce data volume, use space to improve performance, and choose the right storage system. These correspond to the three root causes: data volume, high load, and search time complexity.
Solutions can bring short‑term benefits with low implementation cost but may incur technical debt, or long‑term benefits with higher upfront cost but better scalability.
Static data changes infrequently and is served by simple queries; dynamic data changes frequently and requires more complex filtering.
Reduce Data Volume
Four typical approaches: data serialization storage, data archiving, intermediate/result tables, and sharding (database‑and‑table partitioning).
Data Archiving
Move infrequently accessed data to historical tables. If disk space needs to be reclaimed in MySQL, use OPTIMIZE TABLE, noting that it may lock the table for part or all of the operation, depending on the storage engine and MySQL version.
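A minimal sketch of the archiving step, using Python's built-in sqlite3 module so that it runs anywhere. The table names orders and orders_history, the created_at column, and the cutoff date are assumptions made for illustration; they do not come from the article.

```python
import sqlite3


def archive_old_orders(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move rows created before `cutoff` into the history table; return how many moved."""
    with conn:  # one transaction: the copy and the delete commit or roll back together
        conn.execute(
            "INSERT INTO orders_history SELECT * FROM orders WHERE created_at < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM orders WHERE created_at < ?", (cutoff,))
        return cur.rowcount


# Hypothetical schema: both tables share the same columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, amount REAL)")
conn.execute("CREATE TABLE orders_history (id INTEGER PRIMARY KEY, created_at TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2021-03-01", 10.0), (2, "2022-07-15", 20.0), (3, "2024-01-05", 30.0)],
)

# ISO-8601 date strings compare correctly as plain text.
moved = archive_old_orders(conn, "2023-01-01")
```

On a large production table the same idea would normally run in small batches to keep each transaction short.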
Intermediate (Result) Tables
Run batch jobs to materialize complex query results into a physical table, dramatically compressing data for reporting purposes.
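The batch job can be sketched roughly as follows, again with sqlite3 as a stand-in for the real database. The orders and daily_sales tables and their columns are hypothetical; the point is only that reports read a few pre-aggregated rows instead of scanning the detail table.

```python
import sqlite3


def refresh_daily_sales(conn: sqlite3.Connection) -> None:
    """Rebuild the result table from the detail table in a single transaction."""
    with conn:
        conn.execute("DELETE FROM daily_sales")
        conn.execute(
            "INSERT INTO daily_sales (day, order_count, total_amount) "
            "SELECT day, COUNT(*), SUM(amount) FROM orders GROUP BY day"
        )


# Hypothetical detail table and result table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, day TEXT, amount REAL)")
conn.execute("CREATE TABLE daily_sales (day TEXT PRIMARY KEY, order_count INTEGER, total_amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-01-01", 10.0), (2, "2024-01-01", 15.0), (3, "2024-01-02", 7.5)],
)

refresh_daily_sales(conn)
report = conn.execute("SELECT * FROM daily_sales ORDER BY day").fetchall()
```

A full rebuild is the simplest variant; an incremental refresh that only recomputes recent days is a common alternative when the detail table is large.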
Data Serialization Storage
Store non‑relational data as serialized blobs (e.g., JSON) to keep the row count low; suitable for data that does not require field‑level queries.
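As an illustration, the sketch below stores all of a user's settings as one JSON string instead of one row per setting. The user_settings table and the helper names are assumptions for this example only.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per user, with the whole settings object serialized.
conn.execute(
    "CREATE TABLE user_settings (user_id INTEGER PRIMARY KEY, settings_json TEXT NOT NULL)"
)


def save_settings(conn: sqlite3.Connection, user_id: int, settings: dict) -> None:
    """Serialize the settings dict and upsert it as a single row."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO user_settings VALUES (?, ?)",
            (user_id, json.dumps(settings)),
        )


def load_settings(conn: sqlite3.Connection, user_id: int) -> dict:
    """Read the row back and deserialize it; an unknown user yields an empty dict."""
    row = conn.execute(
        "SELECT settings_json FROM user_settings WHERE user_id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {}


save_settings(conn, 7, {"theme": "dark", "page_size": 50})
save_settings(conn, 7, {"theme": "light", "page_size": 50, "beta": True})
```

The trade-off is the one the text names: individual fields inside the blob cannot be filtered or indexed cheaply, so this fits data that is always read and written as a whole.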
Sharding (Database & Table Partitioning)
Classic but costly approach; recommended only when other methods fail. Prefer NoSQL alternatives when possible.
Treat sharding as a last resort.
Prefer NoSQL for scalability and performance.
Use table sharding for large data volume, database sharding for high concurrency.
Plan for future scaling; avoid over‑engineering.
Splitting Methods
Two main split types: vertical (business‑oriented) and horizontal (technical‑oriented).
Vertical splitting reduces business coupling and can lower row size, improving I/O.
Horizontal splitting creates identical tables distributed by a sharding key, solving data‑volume issues.
Routing Methods
After horizontal splitting, queries must use the sharding key to locate the correct physical table. Common routing strategies are range, hash, and shard‑mapping tables.
Range: split by intervals (e.g., month); easy to manage but may cause data skew.
Hash: uniform distribution via modulo; fast, but queries without the sharding key require full scans.
Shard-mapping table: an auxiliary table maps alternative query fields to the sharding key, enabling indirect lookups.
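The three routing strategies can be sketched as small functions that turn a key into a physical table name. The table naming scheme, the shard count of 4, and the in-memory mapping dict (which stands in for the auxiliary mapping table) are all assumptions for illustration.

```python
import zlib
from datetime import date

SHARD_COUNT = 4  # hypothetical number of physical tables


def route_by_range(order_date: date) -> str:
    """Range routing: one table per month, e.g. orders_202403."""
    return f"orders_{order_date:%Y%m}"


def route_by_hash(user_id: str) -> str:
    """Hash routing: a stable hash of the sharding key, modulo the shard count."""
    # crc32 is used because Python's built-in hash() of str varies between processes.
    return f"orders_{zlib.crc32(user_id.encode('utf-8')) % SHARD_COUNT}"


# Stand-in for the shard-mapping table: alternative query field -> sharding key.
ORDER_NO_TO_USER = {"NO-1001": "user-42"}


def route_by_mapping(order_no: str) -> str:
    """Mapping routing: resolve the sharding key first, then route by hash as usual."""
    user_id = ORDER_NO_TO_USER[order_no]
    return route_by_hash(user_id)
```

With the mapping in place, a lookup by order number reaches a single shard instead of scanning all of them, at the cost of one extra lookup and of keeping the mapping up to date.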
Use Space to Improve Performance
Two typical solutions for high‑load scenarios: distributed caching and read‑write splitting (one‑master‑many‑slaves).
Distributed Caching
Caching layers include client cache, API service local cache, and distributed cache. The latter typically uses NoSQL key‑value stores such as Memcached or Redis.
Common cache strategies: Cache‑Aside, Read/Write‑Through, Write‑Back. Cache‑Aside is most widely used.
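Key cautions: avoid overusing cache, and beware of cache penetration (queries for non-existent data that always fall through to the database) and cache breakdown (a hot key expiring under high concurrency, so many requests hit the database at once). Cache empty results with a short TTL to mitigate penetration; to mitigate breakdown, let only one request rebuild an expired entry, for example by using a lock.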
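A minimal Cache-Aside sketch under stated assumptions: an in-process dict stands in for Redis or Memcached, and the CacheAside class, its loader callback, and the TTL values are hypothetical. It caches empty results briefly so that repeated lookups of missing keys stop reaching the database, and it takes a per-key lock so that only one thread reloads an expired entry.

```python
import threading
import time


class CacheAside:
    _MISSING = object()  # sentinel: "nothing usable in the cache"

    def __init__(self, loader, ttl=60.0, empty_ttl=5.0):
        self._loader = loader        # function that reads from the database
        self._ttl = ttl              # lifetime of normal entries, in seconds
        self._empty_ttl = empty_ttl  # shorter lifetime for "not found" results
        self._store = {}             # key -> (value, expires_at)
        self._locks = {}             # key -> lock guarding the reload of that key
        self._locks_guard = threading.Lock()

    def _lookup(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]
        return self._MISSING

    def get(self, key):
        value = self._lookup(key)
        if value is not self._MISSING:
            return value  # cache hit, including a cached "not found" (None)
        with self._locks_guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            value = self._lookup(key)  # another thread may have reloaded it meanwhile
            if value is not self._MISSING:
                return value
            value = self._loader(key)  # cache miss: read from the database
            ttl = self._ttl if value is not None else self._empty_ttl
            self._store[key] = (value, time.monotonic() + ttl)
            return value
```

In a distributed deployment the per-key lock would itself have to be distributed, and writes would also need to invalidate or update the cached entry; both are left out of this sketch.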
One‑Master‑Many‑Slaves (Read‑Write Splitting)
Deploy multiple read‑only replicas to offload read traffic. Routing can be handled by code or middleware. This is an effective short‑term solution but incurs higher hardware cost due to full data duplication.
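Code-level routing can be as small as the sketch below. The ReadWriteRouter class is hypothetical, and plain strings stand in for real connections; it sends SELECT statements to replicas in round-robin order and everything else to the primary.

```python
import itertools


class ReadWriteRouter:
    def __init__(self, primary, replicas):
        self._primary = primary
        # Round-robin over replicas; with no replicas, every statement goes to the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str):
        """Return the connection that should execute this statement."""
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        # Simplification: only the first keyword is inspected. Locking reads such as
        # SELECT ... FOR UPDATE would need to be sent to the primary as well.
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self._primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
```

Because replicas apply changes after the primary, a read that must see a write just made by the same request is usually routed to the primary too.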
Choose the Appropriate Storage System
NoSQL types include key‑value, document, column, graph, and search engine. Different storage systems dictate search algorithms and data structures, addressing performance, concurrency, and scalability challenges of relational databases.
Examples: Elasticsearch uses inverted indexes for fast full‑text search; Redis hash provides O(1) lookups with in‑memory storage.
Two main patterns: CQRS and storage replacement.
CQRS
CQS (Command-Query Separation) states that a method should either modify state (a command) or return data (a query), not both. CQRS (Command Query Responsibility Segregation) applies the same separation at the architecture level: as used here, a relational database handles writes and a separate system (e.g., NoSQL) serves reads, combining ACID guarantees with high-performance queries.
Implementation difficulty lies in data synchronization between the two stores.
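The split can be sketched in a few lines, assuming an in-process callback as the synchronization channel. The class names are hypothetical, sqlite3 stands in for the relational write store, and a plain dict stands in for the NoSQL read store.

```python
import sqlite3


class OrderReadModel:
    """Query side: a denormalized view shaped for one question (total spent per customer)."""

    def __init__(self):
        self._totals = {}

    def on_order_placed(self, event: dict) -> None:
        customer = event["customer"]
        self._totals[customer] = self._totals.get(customer, 0.0) + event["amount"]

    def total_spent(self, customer: str) -> float:
        return self._totals.get(customer, 0.0)


class OrderCommands:
    """Command side: writes go to the relational store inside a transaction."""

    def __init__(self, conn: sqlite3.Connection, publish):
        self._conn = conn
        self._publish = publish  # callback that delivers events to the read side
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
        )

    def place_order(self, order_id: int, customer: str, amount: float) -> None:
        with self._conn:
            self._conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer, amount)
            )
        # Published only after the commit succeeded.
        self._publish({"order_id": order_id, "customer": customer, "amount": amount})


read_model = OrderReadModel()
commands = OrderCommands(sqlite3.connect(":memory:"), read_model.on_order_placed)
commands.place_order(1, "alice", 20.0)
commands.place_order(2, "alice", 5.5)
```

The gap between the commit and the publish call is where the synchronization difficulty shows up: if the process stops in between, the read side misses the event and the two stores diverge until something reconciles them.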
Data Synchronization Methods
Synchronization can be push‑based (CDC or domain events) or pull‑based (periodic polling). CDC captures changes at the database level; domain events are emitted by the application.
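For the pull-based variant, a rough sketch is a poller that remembers a watermark and copies only rows changed since then. It assumes the source table carries a monotonically increasing version column that every write bumps; that column, the products table, and the dict used as the read store are all hypothetical.

```python
import sqlite3


def pull_changes(conn: sqlite3.Connection, read_store: dict, last_version: int) -> int:
    """Copy rows changed since `last_version` into the read store; return the new watermark."""
    rows = conn.execute(
        "SELECT id, name, version FROM products WHERE version > ? ORDER BY version",
        (last_version,),
    ).fetchall()
    for product_id, name, version in rows:
        read_store[product_id] = name
        last_version = version
    return last_version


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [(1, "pen", 1), (2, "ink", 2)])

read_store = {}
watermark = pull_changes(conn, read_store, 0)          # first poll copies both rows
conn.execute("UPDATE products SET name = 'gel pen', version = 3 WHERE id = 1")
watermark = pull_changes(conn, read_store, watermark)  # second poll copies only the changed row
```

Polling is simple to operate but adds latency equal to the polling interval and, in this form, does not see deleted rows; push-based CDC avoids both at the cost of more infrastructure.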
Replace (Select) Storage System
Understanding NoSQL strengths and weaknesses enables informed decisions when swapping storage. A migration strategy should include an intermediate version with data sync and feature toggles to ensure consistency before fully switching.
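One way to picture the intermediate version is a repository that writes to both stores while a toggle decides which one serves reads. This is a sketch under assumptions, not a prescribed design: the MigratingRepository class and its read_from_new flag are hypothetical, and dicts stand in for the old and new storage systems.

```python
class MigratingRepository:
    def __init__(self, old_store: dict, new_store: dict, read_from_new: bool = False):
        self.old_store = old_store
        self.new_store = new_store
        self.read_from_new = read_from_new  # feature toggle, flipped once the data matches

    def save(self, key: str, value) -> None:
        """Dual write: keep both stores current during the migration window."""
        self.old_store[key] = value
        self.new_store[key] = value

    def load(self, key: str):
        """Read from whichever store the toggle currently selects."""
        store = self.new_store if self.read_from_new else self.old_store
        return store.get(key)

    def mismatched_keys(self) -> list:
        """Consistency check: keys whose value in the new store differs from the old one."""
        return sorted(
            key for key, value in self.old_store.items()
            if self.new_store.get(key) != value
        )


old = {"u1": "Ann"}  # data that existed before the migration started
repo = MigratingRepository(old, {})
repo.save("u2", "Bob")               # new writes land in both stores
pending = repo.mismatched_keys()     # historical rows still to be backfilled
repo.new_store.update(old)           # stand-in for the backfill job
repo.read_from_new = not repo.mismatched_keys()
```

Keeping the toggle reversible means reads can be pointed back at the old store if the new one misbehaves after the switch.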
Conclusion
The eight solutions each fit specific scenarios; there is no one‑size‑fits‑all. Most involve data synchronization, which introduces consistency considerations, making them best suited for read‑heavy workloads or carefully managed write‑read patterns.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn