Eight Database Optimization Strategies for Backend Engineers
This article outlines eight comprehensive database optimization techniques—including reducing data volume, using space‑for‑performance methods, and selecting appropriate storage systems—explaining their underlying principles, trade‑offs, and practical implementation steps for backend engineers facing slow queries and high load.
Why Databases Are Slow?
Performance problems in both relational and NoSQL databases are mainly caused by three factors: the time complexity of search algorithms, the underlying storage data structure, the total amount of data, and high load that leads to CPU and disk contention.
Root Cause
Corresponding Factor
Search time complexity
Search algorithm
Storage data structure
Storage data structure
Total data volume
Data sharding
High load
CPU, disk busy
For relational databases the index structure is usually a B+Tree with O(log n) complexity, so most optimizations focus on reducing data volume. High load often requires clustering or data redundancy solutions.
Which Layer to Consider for Optimization?
The system can be viewed as four layers from top to bottom: hardware, storage system, storage structure, and concrete implementation. Optimizations should start at the concrete implementation layer (e.g., adding indexes) because it has the lowest cost and highest impact. If no improvement is possible, move up to the storage structure layer (e.g., table design, sharding). Higher layers such as storage system and hardware involve higher cost and lower cost‑effectiveness.
Eight Solutions Summary
The core ideas are threefold: reduce data volume, use space for performance, and choose the appropriate storage system. Each idea maps to the three root causes identified earlier.
1. Reduce Data Volume
Data Serialization Storage : Store rarely queried data in a serialized blob to avoid row‑level indexing.
Data Archiving : Periodically move cold data to historical tables or separate databases; use OPTIMIZE TABLE for MySQL when space permits.
Intermediate (Result) Tables : Pre‑compute heavy queries into physical tables via scheduled jobs, dramatically compressing data for reporting.
Sharding (Database/Table Partitioning) : Split large tables or databases horizontally or vertically based on business or technical criteria. Horizontal sharding distributes rows across identical tables; vertical sharding separates columns to reduce row size.
2. Use Space for Performance
Distributed Cache : Deploy client‑side, service‑side, or distributed caches (e.g., Redis, Memcached). The most common pattern is Cache‑Aside. Key considerations: avoid over‑caching, prevent cache penetration and cache breakdown by caching empty results with short TTLs.
Master‑Slave Replication (One‑Master Multiple‑Slaves) : Use read‑only replicas to offload read traffic. Replication adds hardware cost but provides immediate read scalability.
3. Choose the Appropriate Storage System
NoSQL Types : Key‑value, document, column, graph, and search‑engine stores each have distinct search algorithms and data structures. Examples: Elasticsearch uses inverted indexes for fast full‑text search; Redis hash structures give O(1) lookups.
CQRS (Command‑Query Responsibility Segregation) : Write to a relational database for ACID guarantees, read from a high‑performance store (e.g., Redis, Elasticsearch). Data synchronization can be push‑based (CDC, domain events) or pull‑based (periodic polling).
Replace (Select) Storage : Gradually migrate from a relational store to a suitable NoSQL store, using a feature flag and a synchronization layer to ensure data consistency during the transition.
Each solution has short‑term and long‑term trade‑offs. Short‑term fixes are cheap and fast but may introduce technical debt; long‑term solutions require higher upfront effort but provide sustainable performance and scalability.
In practice, most performance issues can be addressed by first reducing data volume (e.g., archiving, sharding), then adding caching or replication, and finally, if necessary, adopting a different storage system with proper synchronization.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.