How to Supercharge Large Database Tables: Proven Optimization Techniques
This article explains why massive tables become slow, identifies common bottlenecks such as disk I/O, missing indexes, deep pagination and lock contention, and provides a step‑by‑step guide covering table design, indexing, SQL tuning, sharding, caching and a real‑world case study to dramatically improve performance.
Preface
Large‑table optimization is a recurring topic: as business data grows, many teams run into slow queries, write stalls, pagination lag, or even crashes.
1 Why Large Tables Are Slow
1.1 Disk I/O Bottleneck
When a table holds tens of millions of rows, queries must read many disk blocks, and the disk’s read/write speed becomes the limiting factor.
Example
Assume an orders table with 50 million rows and you want the latest 10 orders of a user:
SELECT * FROM orders WHERE user_id = 123 ORDER BY order_time DESC LIMIT 10;
Without an index the database scans the whole table and then sorts, which is extremely slow.
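The scan-then-sort behavior can be observed from a query plan. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the production database; the simplified table layout and the helper name query_plan are assumptions made for illustration, and the exact plan wording differs between engines and versions.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the plan detail strings SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the readable detail

conn = sqlite3.connect(":memory:")
# Simplified orders table (assumed layout, only the columns the query needs)
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, order_time TEXT)"
)

latest_orders = (
    "SELECT * FROM orders WHERE user_id = 123 "
    "ORDER BY order_time DESC LIMIT 10"
)

# Plan before any secondary index exists
plan_without_index = query_plan(conn, latest_orders)

# Plan after adding a composite index on the filter and sort columns
conn.execute("CREATE INDEX idx_user_id_order_time ON orders (user_id, order_time)")
plan_with_index = query_plan(conn, latest_orders)

print(plan_without_index)
print(plan_with_index)
```

Without the index, the plan should report a scan of the whole table; with it, the plan should name idx_user_id_order_time. MySQL exposes comparable information through EXPLAIN.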
1.2 Index Missing or Invalid
If a query does not hit an index, the database performs a full table scan, reading every row—costly for tens of millions of records.
Example
Using a function on an indexed column makes the index ineffective:
SELECT * FROM orders WHERE DATE(order_time) = '2023-01-01';
Rewrite the condition to use the raw column so the index can be used:
SELECT * FROM orders WHERE order_time >= '2023-01-01 00:00:00' AND order_time < '2023-01-02 00:00:00';
1.3 Pagination Performance Degradation
Deep pagination (e.g., page 1000) forces the database to scan and discard many rows before returning the desired page.
Example
SELECT * FROM orders ORDER BY order_time DESC LIMIT 9990, 10;
The query first fetches the first 9,990 rows, discards them, then returns the next 10, causing performance to worsen as the page number grows.
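The growth in wasted work follows directly from the offset arithmetic. The sketch below is a simplified cost model, not a measurement: the function names are hypothetical, and the real cost depends on the engine and on available indexes.

```python
def offset_for_page(page, page_size):
    """Rows skipped before the first row of a 1-based page."""
    return (page - 1) * page_size

def rows_read_for_page(page, page_size):
    """Rows produced in sort order: the skipped rows plus the page itself."""
    return offset_for_page(page, page_size) + page_size

# Page 1000 with 10 rows per page corresponds to LIMIT 9990, 10
for page in (1, 100, 1000, 100000):
    print(page, offset_for_page(page, 10), rows_read_for_page(page, 10))
```

The work grows linearly with the page number even though each page returns the same 10 rows.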
1.4 Lock Contention
In high‑concurrency scenarios, multiple threads updating the same table can cause row or table lock contention, further degrading performance.
2 Overall Optimization Strategy
The essence of performance tuning is to reduce unnecessary I/O, computation, and lock contention. The main steps are:
Reasonable table design: avoid unnecessary columns and split data when possible.
Effective indexes: design proper index structures and avoid queries that cannot use them.
SQL optimization: make query conditions precise to minimize full scans.
Sharding (horizontal/vertical): split large tables to reduce per‑table data volume.
Caching and async processing: reduce direct database pressure.
3 Table Structure Optimization
3.1 Simplify Field Types
Field type determines storage size and query speed.
Prefer INT over BIGINT when the range fits.
Prefer VARCHAR(100) over TEXT for short strings.
Use TIMESTAMP or DATETIME for time fields instead of CHAR / VARCHAR.
Example
-- Not recommended
CREATE TABLE orders (
    id BIGINT,
    user_id BIGINT,
    order_status VARCHAR(255),
    remarks TEXT
);
-- Optimized
CREATE TABLE orders (
    id BIGINT PRIMARY KEY,
    user_id INT UNSIGNED,
    order_status TINYINT,  -- small numeric status code instead of a free-form string
    remarks VARCHAR(500)   -- limit length
);
3.2 Table Splitting
Vertical Splitting
Separate rarely used columns into another table.
Example: split orders into orders_basic and orders_details:
-- Basic table
CREATE TABLE orders_basic (
    id BIGINT PRIMARY KEY,
    user_id INT UNSIGNED,
    order_time TIMESTAMP
);
-- Details table
CREATE TABLE orders_details (
    id BIGINT PRIMARY KEY,
    remarks VARCHAR(500),
    shipping_address VARCHAR(255)
);
Horizontal Splitting
Distribute rows across multiple tables based on a rule, e.g., user_id modulo:
orders_0 -- stores rows where user_id % 2 = 0
orders_1 -- stores rows where user_id % 2 = 1
4 Index Optimization
4.1 Create Appropriate Indexes
Build indexes on high‑frequency query columns, such as composite indexes on user_id and order_time:
CREATE INDEX idx_user_id_order_time ON orders (user_id, order_time DESC);
4.2 Avoid Index Loss
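Don’t apply functions or calculations on indexed columns.
-- Bad
SELECT * FROM orders WHERE DATE(order_time) = '2023-01-01';
-- Good
SELECT * FROM orders WHERE order_time >= '2023-01-01 00:00:00' AND order_time < '2023-01-02 00:00:00';
Beware of implicit type conversion. Keep literal types matched to the column type. In MySQL the damaging case is a string column compared with a numeric literal, which forces a per‑row conversion and prevents index use.
-- Bad (literal type differs from the INT column)
SELECT * FROM orders WHERE user_id = '123';
-- Good
SELECT * FROM orders WHERE user_id = 123;
5 SQL Optimization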
5.1 Reduce Queried Fields
Avoid SELECT *; specify only needed columns.
-- Bad
SELECT * FROM orders WHERE user_id = 123;
-- Good
SELECT id, order_time FROM orders WHERE user_id = 123;
5.2 Pagination Optimization
For deep pagination, use a “cursor” approach instead of large offsets.
-- Bad (deep pagination)
SELECT * FROM orders ORDER BY order_time DESC LIMIT 9990, 10;
-- Good (cursor)
SELECT * FROM orders WHERE order_time < '2023-01-01 12:00:00' ORDER BY order_time DESC LIMIT 10;
6 Sharding
6.1 Horizontal Sharding
If a single table still cannot meet performance needs, distribute data across multiple databases.
Common Rules
Modulo by user ID.
Time‑based partitioning.
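Both rules reduce to a routing function that maps a row to its shard. The sketch below is illustrative only: the shard count, the naming scheme orders_N / orders_YYYYMM, and the function names are assumptions, not something the original setup prescribes.

```python
from datetime import date

SHARD_COUNT = 4  # hypothetical number of shards

def shard_by_user(user_id, shard_count=SHARD_COUNT):
    """Modulo rule: the same user always maps to the same shard."""
    return f"orders_{user_id % shard_count}"

def shard_by_month(order_date):
    """Time-based rule: one table (or database) per calendar month."""
    return f"orders_{order_date:%Y%m}"

print(shard_by_user(123))                # orders_3
print(shard_by_month(date(2023, 1, 1)))  # orders_202301
```

Modulo routing spreads load evenly but makes changing the shard count expensive, because most rows map to a different shard afterwards. Time-based routing makes archiving old data simple but concentrates recent writes on the newest shard.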
7 Caching and Asynchronous Processing
7.1 Use Redis to Cache Hot Data
// Cache-aside read; `redis` and `database` stand in for the application's own clients
String key = "orders:user:123";
String result = redis.get(key);
if (result == null) {
    // Cache miss: load from the database, then cache the result for 1 hour
    result = database.query("SELECT * FROM orders WHERE user_id = 123");
    redis.set(key, result, 3600);
}
7.2 Use Message Queue for Async Writes
In high‑concurrency write scenarios, push write operations to a queue (e.g., Kafka) and batch insert asynchronously.
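The batching side of this pattern can be sketched as follows. This is a minimal illustration under stated assumptions: an in-process queue.Queue stands in for Kafka, sqlite3 stands in for the production database, and BATCH_SIZE, drain_batch and flush are hypothetical names.

```python
import queue
import sqlite3

BATCH_SIZE = 100  # hypothetical batch size

def drain_batch(q, max_items):
    """Pull up to max_items pending writes from the queue without blocking."""
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch

def flush(conn, batch):
    """Insert one batch inside a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO orders (user_id, order_time) VALUES (?, ?)", batch
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, order_time TEXT)"
)

# Producers enqueue writes instead of hitting the database directly
pending = queue.Queue()
for i in range(250):
    pending.put((i % 5, f"2023-01-01 00:00:{i % 60:02d}"))

# The consumer drains the queue and writes in batches
batches = 0
while True:
    batch = drain_batch(pending, BATCH_SIZE)
    if not batch:
        break
    flush(conn, batch)
    batches += 1

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(batches, row_count)
```

Grouping rows into one transaction per batch replaces many small commits with a few larger ones. The trade-off is that a write is not visible in the database until its batch has been flushed.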
8 Practical Case Study
Problem
An e‑commerce order table with 50 million rows causes page loads over 10 seconds when users view order details.
Solution
Vertically split the order table to move detail fields into a separate table.
Create a composite index on user_id and order_time.
Cache orders from the last 30 days in Redis.
Replace deep LIMIT offsets with cursor‑based (search_after‑style) pagination.
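The last step can be sketched as keyset pagination, the SQL counterpart of a search_after cursor. The example below uses sqlite3 as a stand-in database; the sample data, the function name fetch_page, and the use of id as a tie-breaker for equal timestamps are assumptions added for illustration and go beyond the single order_time condition shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, order_time TEXT)"
)
conn.execute("CREATE INDEX idx_order_time_id ON orders (order_time, id)")
# 50 sample rows; neighbouring rows share a timestamp to exercise the tie-breaker
conn.executemany(
    "INSERT INTO orders (id, user_id, order_time) VALUES (?, ?, ?)",
    [(i, 123, f"2023-01-01 12:{i // 2:02d}:00") for i in range(1, 51)],
)

def fetch_page(conn, page_size, cursor=None):
    """Return one page, newest first, plus the cursor for the next page."""
    if cursor is None:
        rows = conn.execute(
            "SELECT id, order_time FROM orders "
            "ORDER BY order_time DESC, id DESC LIMIT ?",
            (page_size,),
        ).fetchall()
    else:
        last_time, last_id = cursor
        # Continue strictly after the last row of the previous page
        rows = conn.execute(
            "SELECT id, order_time FROM orders "
            "WHERE order_time < ? OR (order_time = ? AND id < ?) "
            "ORDER BY order_time DESC, id DESC LIMIT ?",
            (last_time, last_time, last_id, page_size),
        ).fetchall()
    next_cursor = (rows[-1][1], rows[-1][0]) if rows else None
    return rows, next_cursor

# Walk every page and collect the ids in the order they were returned
seen = []
cursor = None
while True:
    rows, cursor = fetch_page(conn, 10, cursor)
    if not rows:
        break
    seen.extend(row[0] for row in rows)

print(seen[:10])
```

Each page starts from the last row of the previous one, so the cost per page stays roughly constant regardless of depth. The limitation is that clients can only move to the next page, not jump to an arbitrary page number.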
Conclusion
Optimizing large tables is a systematic effort that must consider table design, indexes, SQL, and overall architecture. Even with tens of millions of rows, proper splitting, index design, and caching can keep the database responsive.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
