
How to Supercharge Large Database Tables: Proven Optimization Techniques

This article explains why massive tables become slow, identifies common bottlenecks such as disk I/O, missing indexes, deep pagination and lock contention, and provides a step‑by‑step guide covering table design, indexing, SQL tuning, sharding, caching and a real‑world case study to dramatically improve performance.

Su San Talks Tech

Preface

Large‑table optimization is a recurring topic: as business data grows, many teams run into slow queries, stalled writes, sluggish pagination, or even crashes.

1 Why Large Tables Are Slow

1.1 Disk I/O Bottleneck

When a table holds tens of millions of rows, queries must read many disk blocks, and the disk’s read/write speed becomes the limiting factor.

Example

Suppose an orders table holds 50 million rows and you want a user's 10 most recent orders:

SELECT * FROM orders WHERE user_id = 123 ORDER BY order_time DESC LIMIT 10;

Without a suitable index, the database must scan the whole table, keep user 123's rows, and then sort them by order_time, which is extremely slow at this size.
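To put the disk I/O cost in numbers, here is a back-of-envelope estimate in Java. It assumes a hypothetical average row size of 200 bytes and InnoDB's default 16 KB page, and it ignores page headers and fill factor, so treat the result as a rough lower bound rather than a measurement.

```java
// Back-of-envelope: how many 16 KB pages a full scan of the orders table reads.
// ASSUMPTIONS: hypothetical 200-byte average row; InnoDB's default 16 KB page size.
// Page headers and partially filled pages are ignored, so this is a lower bound.
final class FullScanEstimate {
    static final long PAGE_SIZE_BYTES = 16 * 1024;

    /** Pages needed to hold `rows` rows of `avgRowBytes` bytes each, rounded up. */
    static long pagesToScan(long rows, long avgRowBytes) {
        long totalBytes = rows * avgRowBytes;
        return (totalBytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES; // ceiling division
    }

    public static void main(String[] args) {
        long pages = pagesToScan(50_000_000L, 200);
        System.out.println("Full scan reads at least " + pages + " pages");
    }
}
```

Under these assumptions the scan touches roughly 610,000 pages (about 10 GB of data), whereas an index lookup descends a B+ tree a few levels deep and reads only the leaf pages that hold that user's newest entries.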

1.2 Index Missing or Invalid

If a query does not hit an index, the database performs a full table scan, reading every row—costly for tens of millions of records.

Example

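Wrapping an indexed column in a function, here DATE(order_time), prevents the index on order_time from being used, because the database has to evaluate the function for every row: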

SELECT * FROM orders WHERE DATE(order_time) = '2023-01-01';

Rewrite the condition to use the raw column so the index can be used:

SELECT * FROM orders WHERE order_time >= '2023-01-01 00:00:00' AND order_time < '2023-01-02 00:00:00';
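The range rewrite is easy to generate in application code. A minimal Java sketch, assuming the query is sent as a parameterized statement (for example through JDBC, where PreparedStatement.setObject accepts java.time.LocalDateTime with JDBC 4.2 drivers); the class name DayRange is hypothetical:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

// Turns "orders placed on a given day" into a half-open range on the raw order_time
// column, keeping the predicate index-friendly. DayRange is a hypothetical helper name.
final class DayRange {
    // Bind start and end as parameters, e.g. PreparedStatement.setObject(1, range.start)
    static final String SQL =
        "SELECT id, order_time FROM orders WHERE order_time >= ? AND order_time < ?";

    final LocalDateTime start; // inclusive: midnight at the start of the day
    final LocalDateTime end;   // exclusive: midnight at the start of the next day

    private DayRange(LocalDateTime start, LocalDateTime end) {
        this.start = start;
        this.end = end;
    }

    static DayRange of(LocalDate day) {
        return new DayRange(day.atStartOfDay(), day.plusDays(1).atStartOfDay());
    }

    public static void main(String[] args) {
        DayRange r = DayRange.of(LocalDate.of(2023, 1, 1));
        System.out.println(r.start + " .. " + r.end); // 2023-01-01T00:00 .. 2023-01-02T00:00
    }
}
```

The half-open form (>= start, < next midnight) also avoids the edge cases, such as fractional seconds after 23:59:59, that a BETWEEN with an end-of-day literal can miss.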

1.3 Pagination Performance Degradation

Deep pagination (e.g., page 1000) forces the database to scan and discard many rows before returning the desired page.

Example

SELECT * FROM orders ORDER BY order_time DESC LIMIT 9990, 10;

The database reads the first 10,000 rows in order_time order, discards 9,990 of them, and returns the last 10, so the cost grows linearly as the page number increases.
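A small Java calculation makes the linear growth concrete. It uses a simplified model in which serving a page means reading offset + page-size rows in sort order; real engines add per-row overhead on top of this.

```java
// Cost model for LIMIT offset, size: the database reads offset + size rows in sort
// order and discards the first `offset` of them. Pages are 1-based.
final class OffsetCost {
    static long offset(long page, long pageSize) {
        return (page - 1) * pageSize;
    }

    static long rowsRead(long page, long pageSize) {
        return offset(page, pageSize) + pageSize;
    }

    public static void main(String[] args) {
        for (long page : new long[] {1, 100, 1_000, 100_000}) {
            System.out.printf("page %d -> LIMIT %d, 10 reads %d rows%n",
                page, offset(page, 10), rowsRead(page, 10));
        }
    }
}
```

With 10 rows per page, page 1000 yields LIMIT 9990, 10 and 10,000 rows read, while page 100,000 already reads a million rows to return ten.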

1.4 Lock Contention

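In high‑concurrency scenarios, transactions that update the same rows must wait for each other's locks (InnoDB locks individual rows; engines such as MyISAM lock the whole table). Long transactions hold locks longer and make these waits worse, further degrading performance.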

2 Overall Optimization Strategy

The essence of performance tuning is to reduce unnecessary I/O, computation, and lock contention. The main steps are:

Reasonable table design: avoid unnecessary columns and split data when possible.

Effective indexes: design indexes that match your queries and avoid patterns that stop them from being used.

SQL optimization: make query conditions precise to minimize full scans.

Sharding (horizontal/vertical): split large tables to reduce the data volume per table.

Caching and async processing: reduce direct pressure on the database.

3 Table Structure Optimization

3.1 Simplify Field Types

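Column types determine storage size, and smaller rows mean more rows per page and fewer pages to read.

Prefer INT (4 bytes) over BIGINT (8 bytes) when the value range fits.

Prefer VARCHAR with a realistic length, such as VARCHAR(100), over TEXT for short strings.

Use TIMESTAMP or DATETIME for time fields instead of CHAR / VARCHAR, so comparisons and range queries work on real time values.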

Example

-- Not recommended
CREATE TABLE orders (
  id BIGINT,
  user_id BIGINT,
  order_status VARCHAR(255),
  remarks TEXT
);

-- Optimized
CREATE TABLE orders (
  id BIGINT PRIMARY KEY,
  user_id INT UNSIGNED,
  order_status TINYINT, -- small integer code, mapped to an enum in application code
  remarks VARCHAR(500) -- cap the length instead of using TEXT
);
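On the application side, a TINYINT status column is usually paired with an enum. A Java sketch of that mapping; the status names and numeric codes below are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Maps the TINYINT order_status column to a Java enum. The status names and codes
// are hypothetical; use whatever your business actually defines.
enum OrderStatus {
    PENDING_PAYMENT(0), PAID(1), SHIPPED(2), COMPLETED(3), CANCELLED(4);

    private static final Map<Byte, OrderStatus> BY_CODE = new HashMap<>();
    static {
        for (OrderStatus s : values()) {
            BY_CODE.put(s.code, s);
        }
    }

    final byte code; // the value stored in the TINYINT column

    OrderStatus(int code) {
        this.code = (byte) code;
    }

    /** Converts a value read from the database back into the enum. */
    static OrderStatus fromCode(byte code) {
        OrderStatus s = BY_CODE.get(code);
        if (s == null) {
            throw new IllegalArgumentException("unknown order_status code: " + code);
        }
        return s;
    }
}
```

Storing an explicit code instead of ordinal() means that reordering or inserting enum constants later cannot silently change what existing rows mean.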

3.2 Table Splitting

Vertical Splitting

Move large or rarely used columns into a separate table that shares the same id, so the frequently queried table stays narrow.

Example: split orders into orders_basic and orders_details:

-- Basic table
CREATE TABLE orders_basic (
  id BIGINT PRIMARY KEY,
  user_id INT UNSIGNED,
  order_time TIMESTAMP
);

-- Details table (id is the same order id as in orders_basic)
CREATE TABLE orders_details (
  id BIGINT PRIMARY KEY,
  remarks VARCHAR(500),
  shipping_address VARCHAR(255)
);

Horizontal Splitting

Distribute rows across multiple tables according to a rule, for example user_id modulo the number of tables:

orders_0  -- stores rows where user_id % 2 = 0
orders_1  -- stores rows where user_id % 2 = 1
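Every read and write then has to pick the right table. A minimal Java router for the two-table rule above; the class name OrderTableRouter is hypothetical:

```java
// Routes an order row to its horizontal split table: user_id % tableCount picks the
// suffix, so with 2 tables even user ids go to orders_0 and odd ones to orders_1.
final class OrderTableRouter {
    private final int tableCount;

    OrderTableRouter(int tableCount) {
        if (tableCount <= 0) throw new IllegalArgumentException("tableCount must be positive");
        this.tableCount = tableCount;
    }

    String tableFor(long userId) {
        // floorMod keeps the suffix non-negative even if an id were ever negative
        return "orders_" + Math.floorMod(userId, tableCount);
    }

    public static void main(String[] args) {
        OrderTableRouter router = new OrderTableRouter(2);
        System.out.println(router.tableFor(123)); // orders_1
        System.out.println(router.tableFor(456)); // orders_0
    }
}
```

Because the table is derived from user_id, queries that do not include user_id have to check every table, and changing the table count later means moving existing rows.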

4 Index Optimization

4.1 Create Appropriate Indexes

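Index the columns that frequent queries filter and sort on. For the query in section 1.1, a composite index on user_id and order_time lets the database jump straight to one user's rows and read them already in time order, with no separate sort:

CREATE INDEX idx_user_id_order_time ON orders (user_id, order_time DESC);

MySQL honors DESC in an index definition from version 8.0; earlier versions ignore it, but an ascending index can still be read backward to satisfy ORDER BY order_time DESC.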
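To see why the column order (user_id, order_time) matters, here is an in-memory Java model: a sorted set ordered by user_id, then order_time, then the row's primary key (InnoDB secondary index entries also carry the primary key). It illustrates the access pattern only and is not how a database physically stores an index; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// In-memory model of a composite index on (user_id, order_time). The rowId plays the
// role of the primary key stored in each secondary index entry and keeps entries unique.
final class CompositeIndexDemo {
    record Entry(long userId, long orderTime, long rowId) {}

    private static final Comparator<Entry> ORDER = Comparator
        .comparingLong(Entry::userId)
        .thenComparingLong(Entry::orderTime)
        .thenComparingLong(Entry::rowId);

    private final TreeSet<Entry> index = new TreeSet<>(ORDER);

    void add(long userId, long orderTime, long rowId) {
        index.add(new Entry(userId, orderTime, rowId));
    }

    /**
     * Mirrors WHERE user_id = ? ORDER BY order_time DESC LIMIT n: seek to the user's
     * contiguous slice of the index and read it backward, with no separate sort step.
     */
    List<Long> latestForUser(long userId, int limit) {
        Entry low = new Entry(userId, Long.MIN_VALUE, Long.MIN_VALUE);
        Entry high = new Entry(userId, Long.MAX_VALUE, Long.MAX_VALUE);
        List<Long> rowIds = new ArrayList<>();
        for (Entry e : index.subSet(low, true, high, true).descendingSet()) {
            if (rowIds.size() == limit) break;
            rowIds.add(e.rowId());
        }
        return rowIds;
    }
}
```

If the index were declared as (order_time, user_id) instead, one user's entries would be scattered across the whole time range, and this lookup would degrade toward a scan.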

4.2 Avoid Invalidating Indexes

Don't apply functions or calculations to indexed columns.

-- Bad
SELECT * FROM orders WHERE DATE(order_time) = '2023-01-01';

-- Good
SELECT * FROM orders WHERE order_time >= '2023-01-01 00:00:00' AND order_time < '2023-01-02 00:00:00';

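Beware of implicit type conversion: compare each column with a literal of the same type. In MySQL the harmful case is a string column compared with a number, because every row's value must be converted before the comparison and the index on that column cannot be used. Comparing an integer column such as user_id with a quoted string usually still uses the index, but matching the types avoids relying on that.

-- Avoid (type mismatch)
SELECT * FROM orders WHERE user_id = '123';

-- Good
SELECT * FROM orders WHERE user_id = 123;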

5 SQL Optimization

5.1 Reduce Queried Fields

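Avoid SELECT *; list only the columns you need. This reduces I/O and network transfer, and when every selected column is in an index, the query can be answered from the index alone (a covering index). With the (user_id, order_time) index, the query below qualifies in InnoDB, because secondary index entries also store the primary key id.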

-- Bad
SELECT * FROM orders WHERE user_id = 123;

-- Good
SELECT id, order_time FROM orders WHERE user_id = 123;

5.2 Pagination Optimization

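For deep pagination, use a cursor (keyset) instead of a large offset: remember the order_time of the last row on the previous page and fetch the rows that come after it, so the database seeks directly to the right position in the index. If several rows can share the same order_time, add a unique tie-breaker such as id to both the ORDER BY and the cursor condition; otherwise rows at a page boundary can be skipped.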

-- Bad (deep pagination)
SELECT * FROM orders ORDER BY order_time DESC LIMIT 9990, 10;

-- Good (cursor): '2023-01-01 12:00:00' is the order_time of the last row on the previous page
SELECT * FROM orders WHERE order_time < '2023-01-01 12:00:00' ORDER BY order_time DESC LIMIT 10;
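A Java model of keyset pagination that extends the cursor with the unique id as a tie-breaker, so orders sharing an order_time are not skipped at page boundaries. A sorted set stands in for the index, and all names are hypothetical. In SQL this corresponds to WHERE (order_time, id) < (?, ?) ORDER BY order_time DESC, id DESC LIMIT ?; row-value comparison works in MySQL and PostgreSQL, but check with EXPLAIN that your version uses the index, or expand it to order_time < ? OR (order_time = ? AND id < ?).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Keyset pagination model: a sorted set ordered newest first stands in for an index on
// (order_time DESC, id DESC). The cursor is the (order_time, id) of the last row returned.
final class KeysetPager {
    record Order(long id, long orderTime) {}
    record Cursor(long orderTime, long id) {}

    // order_time descending, then id descending as a unique tie-breaker
    static final Comparator<Order> NEWEST_FIRST = Comparator
        .comparingLong(Order::orderTime).reversed()
        .thenComparing(Comparator.comparingLong(Order::id).reversed());

    static Cursor cursorAfter(Order lastRowOfPage) {
        return new Cursor(lastRowOfPage.orderTime(), lastRowOfPage.id());
    }

    /** First page when cursor is null; otherwise the `size` rows strictly after it. */
    static List<Order> nextPage(NavigableSet<Order> index, Cursor cursor, int size) {
        // Seek past the cursor instead of skipping an offset; the cursor row itself is excluded
        NavigableSet<Order> rest = cursor == null
            ? index
            : index.tailSet(new Order(cursor.id(), cursor.orderTime()), false);
        List<Order> page = new ArrayList<>(size);
        for (Order o : rest) {
            if (page.size() == size) break;
            page.add(o);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableSet<Order> index = new TreeSet<>(NEWEST_FIRST);
        index.add(new Order(1, 100));
        index.add(new Order(2, 200));
        index.add(new Order(3, 200)); // same order_time as id 2
        index.add(new Order(4, 50));
        List<Order> first = nextPage(index, null, 2);                       // ids 3, 2
        List<Order> second = nextPage(index, cursorAfter(first.get(1)), 2); // ids 1, 4
        System.out.println(first + " then " + second);
    }
}
```

Each page costs roughly the same regardless of how deep it is, because the seek replaces the discarded offset rows; the trade-off is that clients can only move to the next page, not jump to an arbitrary page number.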

6 Sharding

6.1 Horizontal Sharding

If splitting tables inside one database is still not enough, distribute the data across multiple database instances so that storage, I/O, and connections are spread out.

Common Rules

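Modulo by user ID: all of a user's orders land on the same shard, so per-user queries hit a single database.

Time‑based partitioning: for example one table per month, which suits workloads that mostly query recent data.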
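Both routing rules reduce to a function from a row's key to a physical location. A Java sketch; the layout (4 databases with 8 tables each) and the names order_db_N, orders_N and orders_yyyyMM are hypothetical:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Sketches of the two routing rules. The layout (4 databases with 8 tables each) and the
// names order_db_N, orders_N and orders_yyyyMM are hypothetical.
final class ShardRouter {
    static final int DB_COUNT = 4;
    static final int TABLES_PER_DB = 8;
    private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("yyyyMM");

    /** Modulo by user ID: every order of one user maps to the same database and table. */
    static String byUserId(long userId) {
        long slot = Math.floorMod(userId, (long) DB_COUNT * TABLES_PER_DB); // 0..31
        long db = slot / TABLES_PER_DB;     // which database instance
        long table = slot % TABLES_PER_DB;  // which table inside that database
        return "order_db_" + db + ".orders_" + table;
    }

    /** Time-based: one table per calendar month of the order date. */
    static String byMonth(LocalDate orderDate) {
        return "orders_" + orderDate.format(MONTH);
    }

    public static void main(String[] args) {
        System.out.println(byUserId(123));                      // order_db_3.orders_3
        System.out.println(byMonth(LocalDate.of(2023, 1, 15))); // orders_202301
    }
}
```

The two rules favor different queries: user-based routing keeps "my orders" on one shard but spreads a time-range report across all of them, while monthly tables do the opposite.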

7 Caching and Asynchronous Processing

7.1 Use Redis to Cache Hot Data

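// Cache-aside read: try Redis first, fall back to the database on a miss.
// `redis` is assumed to be a Jedis client; `database.query(...)` is a placeholder DAO call.
String key = "orders:user:123";
String result = redis.get(key);
if (result == null) {
    result = database.query("SELECT id, order_time FROM orders WHERE user_id = 123");
    redis.setex(key, 3600, result); // cache for 1 hour (Jedis setex: key, seconds, value)
}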

7.2 Use Message Queue for Async Writes

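In high‑concurrency write scenarios, push write operations to a message queue (e.g., Kafka) and let a consumer batch-insert them asynchronously. This smooths out write spikes at the cost of a short delay before the data appears in the database.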
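The batching half of this pattern can be sketched in plain Java, with a BlockingQueue standing in for Kafka. This is an in-process illustration only: unlike a real message queue, it loses pending writes if the process dies. The class name and the batchWriter callback are hypothetical; in practice the callback would run a multi-row INSERT or JDBC addBatch/executeBatch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// In-process stand-in for a message queue: request threads enqueue writes and return
// immediately; a consumer drains them in batches and hands each batch to batchWriter.
final class BatchingWriter<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final int maxBatch;
    private final Consumer<List<T>> batchWriter;

    BatchingWriter(int maxBatch, Consumer<List<T>> batchWriter) {
        this.maxBatch = maxBatch;
        this.batchWriter = batchWriter;
    }

    /** Called on the request path; never touches the database. */
    void submit(T item) {
        queue.add(item);
    }

    /**
     * One consumer step: waits up to waitMillis for the first item, then drains up to
     * maxBatch items in total and writes them together. Returns the batch size written.
     */
    int drainOnce(long waitMillis) throws InterruptedException {
        T first = queue.poll(waitMillis, TimeUnit.MILLISECONDS);
        if (first == null) return 0;
        List<T> batch = new ArrayList<>(maxBatch);
        batch.add(first);
        queue.drainTo(batch, maxBatch - 1);
        batchWriter.accept(batch);
        return batch.size();
    }
}
```

A background thread would call drainOnce in a loop. Larger batches mean fewer round trips and commits, at the cost of a longer delay before writes become visible.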

8 Practical Case Study

Problem

An e‑commerce order table with 50 million rows causes page loads over 10 seconds when users view order details.

Solution

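Split the order table vertically to move detail fields into a separate table.

Create a composite index on user_id and order_time.

Cache each user's orders from the last 30 days in Redis.

Replace deep LIMIT offsets with cursor-based pagination (search_after style: pass the last row's sort key to fetch the next page).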

Conclusion

Optimizing large tables is a systematic effort that must consider table design, indexes, SQL, and overall architecture. Even with tens of millions of rows, proper splitting, index design, and caching can keep the database responsive.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
