How to Speed Up Massive MySQL Tables: Practical Optimization Strategies

This article examines why a MySQL 5.6 RDS table with tens of millions of rows becomes unbearably slow, then presents three concrete approaches—optimizing the existing database, migrating to a MySQL‑compatible service, and adopting a big‑data engine—detailing design, indexing, partitioning, sharding, and cloud options to restore performance.

Programmer DD

Problem Overview

On Alibaba Cloud RDS for MySQL 5.6, a user‑access log table accumulates about 20 million rows in six months and about 40 million rows in a year, causing extremely slow queries and daily freezes that severely impact the business.

The legacy system was poorly designed; the original developers have left, leaving a maintenance nightmare.

Solution Overview

Option 1: Optimize the existing MySQL database (no code changes, lowest cost, but limited scalability).

Option 2: Upgrade to a 100 % MySQL‑compatible database (minimal code changes, higher cost).

Option 3: Replace MySQL with a big‑data solution (high scalability, but requires code changes).

All three options were tested and practical solutions were produced.

Option 1 Details – Optimizing MySQL

Key points gathered from Alibaba Cloud DB experts and community:

Design tables with performance in mind.

Write optimized SQL.

Use partitioning, sharding, and separate databases when needed.

Table design recommendations

Avoid nullable columns: declare columns NOT NULL where possible and give numeric columns a default of 0.

Prefer INT over BIGINT; use UNSIGNED for non‑negative values; smaller types (TINYINT, SMALLINT, MEDIUMINT) are better.

Replace strings with ENUM or integer codes.

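Prefer TIMESTAMP to DATETIME: it is more compact (4 bytes before fractional seconds), but note that its range ends in January 2038.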

Keep column count under 20.

Store IP as integer.
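
As a rough illustration of the last recommendation, an IPv4 address can be packed into a 32‑bit unsigned integer, which fits an INT UNSIGNED column instead of a string column. The sketch below uses Python's standard ipaddress module; the helper names ip_to_int and int_to_ip are illustrative, not from the original system. MySQL's built‑in INET_ATON() and INET_NTOA() functions perform the same conversion inside the database.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    """Convert a dotted-quad IPv4 string to its 32-bit unsigned integer form."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(value: int) -> str:
    """Convert a 32-bit unsigned integer back to a dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    packed = ip_to_int("192.168.1.10")
    print(packed)             # 3232235786
    print(int_to_ip(packed))  # 192.168.1.10
```

Storing the integer form also lets range conditions on IP blocks use an ordinary numeric index.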

Indexing guidelines

Create indexes only on columns used in WHERE or ORDER BY; verify with EXPLAIN.

Avoid NULL checks in WHERE clauses.

Do not index low‑cardinality columns (e.g., gender).

Use prefix indexes for character columns.

Avoid using character columns as primary keys.

Do not rely on foreign keys; enforce constraints in application code.

Prefer not to use UNIQUE indexes; enforce uniqueness in code.

When using composite indexes, keep column order consistent with query conditions and drop unnecessary single‑column indexes.

In short, choose appropriate data types and indexes.
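
One common rule of thumb behind the low‑cardinality and prefix‑index guidelines is selectivity: the number of distinct values divided by the number of rows. The sketch below computes it on in‑memory sample data; the function names and sample values are assumptions made for illustration. In MySQL the same figures can be obtained with COUNT(DISTINCT col) / COUNT(*) and COUNT(DISTINCT LEFT(col, n)) / COUNT(*).

```python
def selectivity(values):
    """Fraction of distinct values; close to 1.0 means an index narrows rows well."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def prefix_selectivity(values, prefix_len):
    """Selectivity if only the first prefix_len characters of each value were indexed."""
    return selectivity(v[:prefix_len] for v in values)


if __name__ == "__main__":
    genders = ["M", "F"] * 500  # hypothetical low-cardinality column
    urls = ["/article/1", "/article/2", "/about", "/article/3"]  # hypothetical

    print(selectivity(genders))  # very low: an index here rarely helps
    for n in (3, 6, 10):
        # Pick the shortest prefix whose selectivity approaches the full column's.
        print(n, prefix_selectivity(urls, n))
```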

SQL best practices

Limit result sets with LIMIT.

Avoid SELECT *; list needed columns.

Prefer JOIN over sub‑queries.

Split large DELETE/INSERT statements.

Enable slow‑query log to identify bottlenecks.

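Avoid applying calculations or functions to indexed columns in WHERE clauses, because that prevents index use; move the expression to the right‑hand side of the comparison.

Keep each SQL statement simple: MySQL executes a single statement on one CPU core, so splitting a large statement into smaller ones shortens lock time.

Replace OR with IN: a chain of OR conditions is evaluated in O(n), whereas IN over a list of constants is O(log n) because MySQL sorts the list and uses binary search.

Avoid stored functions and triggers; implement that logic in the application.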

Avoid leading‑wildcard LIKE patterns.

Minimize JOIN usage.

Compare values of the same type.

Use BETWEEN for continuous numeric ranges instead of IN.

Paginate with LIMIT and keep page size reasonable.
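
The advice about keeping calculations off indexed columns can be checked with the query planner. The sketch below is only a stand‑in: it uses Python's built‑in sqlite3 module and SQLite's EXPLAIN QUERY PLAN rather than MySQL, and the table, index, and date values are hypothetical. The principle carries over, and on MySQL the equivalent check is to run EXPLAIN on both forms of the query.

```python
import sqlite3

# Function wrapped around the indexed column: the index cannot narrow the rows.
SLOW_SQL = (
    "SELECT id, url FROM access_log "
    "WHERE strftime('%m', accesstime) = '11' LIMIT 10"
)

# Range on the bare column (for one assumed year): the index applies.
FAST_SQL = (
    "SELECT id, url FROM access_log "
    "WHERE accesstime >= '2018-11-01' AND accesstime < '2018-12-01' LIMIT 10"
)


def build_demo_db():
    """Create an in-memory table with an index on accesstime (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE access_log ("
        "id INTEGER PRIMARY KEY, accesstime TEXT NOT NULL, url TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_accesstime ON access_log (accesstime)")
    return conn


def query_plan(conn, sql):
    """Return SQLite's description of how it would execute the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


if __name__ == "__main__":
    conn = build_demo_db()
    print(query_plan(conn, SLOW_SQL))  # a full scan
    print(query_plan(conn, FAST_SQL))  # an index search on idx_accesstime
```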

Partitioning

MySQL 5.1 introduced horizontal partitioning, which is transparent to applications. Partitioning splits one logical table into multiple physical sub‑tables; indexes are also defined per partition, and there is no global index.

Effective partitioning requires queries to include the partition key, otherwise all partitions are scanned. Use EXPLAIN PARTITIONS to verify.

Benefits

Supports larger tables.

Easier maintenance (bulk delete, add partitions).

Potentially faster queries when only a few partitions are accessed.

Data can be spread across multiple devices.

Can avoid certain bottlenecks (e.g., InnoDB index mutex).

Allows backup/restore of individual partitions.

Limitations

Maximum 1024 partitions per table before MySQL 5.6.7; 8192 from 5.6.7 onward.

Every column used in the partitioning expression must be part of every PRIMARY KEY and UNIQUE key on the table.

No foreign key support on partitioned tables.

NULL values break partition pruning.

All partitions must use the same storage engine.

Partition types

RANGE – based on a continuous interval.

LIST – based on discrete values.

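HASH – based on a user‑defined expression that returns an integer; the row's partition is that value modulo the number of partitions.

KEY – similar to HASH, but you supply only the column list and MySQL applies its own internal hash function, so the columns do not have to be integers.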

In practice, the author first tried RANGE partitioning by month (12 partitions) with ~6× speedup, then switched to HASH partitioning on the id column (64 partitions), achieving a significant performance gain and solving the problem.

Result:

PARTITION BY HASH (id) PARTITIONS 64

select count(*) from readroom_website; -- 11,901,336 rows
Duration: 5.734 sec.

select * from readroom_website where month(accesstime)=11 limit 10;
Duration: 0.719 sec.
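
To make the pruning behaviour concrete, the sketch below models how plain HASH partitioning places rows: MySQL stores a row in partition MOD(expr, N), and unnamed partitions are called p0, p1, and so on. The function names are illustrative. It assumes non‑negative ids and does not model LINEAR HASH, which uses a different algorithm.

```python
NUM_PARTITIONS = 64  # matches PARTITION BY HASH (id) PARTITIONS 64


def hash_partition(row_id: int, num_partitions: int = NUM_PARTITIONS) -> str:
    """Name of the partition that holds row_id under plain HASH partitioning."""
    if row_id < 0:
        raise ValueError("this sketch only models non-negative ids")
    return f"p{row_id % num_partitions}"


def partitions_to_scan(row_id=None, num_partitions: int = NUM_PARTITIONS):
    """Partitions a query must consider, depending on whether it filters on id."""
    if row_id is None:
        # No condition on the partition key: every partition is a candidate.
        return [f"p{i}" for i in range(num_partitions)]
    return [hash_partition(row_id, num_partitions)]


if __name__ == "__main__":
    print(partitions_to_scan(11901336))  # a lookup by id touches one partition
    print(len(partitions_to_scan()))     # a filter on accesstime alone prunes nothing
```

Under this model, the month(accesstime)=11 query above cannot be pruned, since it carries no condition on id; EXPLAIN PARTITIONS on the real table is the way to confirm what MySQL actually scans.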

Sharding (Table Splitting)

If optimization and partitioning still cannot meet performance, split the large table into multiple tables (vertical or horizontal). Example: split by id into 100 tables named tableName_id%100. This requires code changes and high development cost, thus not recommended for already‑deployed systems.
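
The code change this implies is mostly a routing step in the data‑access layer. A minimal sketch of the tableName_id%100 scheme follows; the base name access_log and the function name are hypothetical.

```python
def shard_table(base_name: str, row_id: int, shard_count: int = 100) -> str:
    """Return the physical table for row_id under the <base>_<id % count> scheme."""
    if row_id < 0:
        raise ValueError("row_id must be non-negative")
    return f"{base_name}_{row_id % shard_count}"


if __name__ == "__main__":
    table = shard_table("access_log", 12345)
    # Table names cannot be bound as query parameters, so base_name must come
    # from trusted code, never from user input; the id itself stays a parameter.
    sql = f"SELECT * FROM {table} WHERE id = %s"
    print(sql)
```

Queries that do not carry the id, such as reports across all users, would have to visit every one of the 100 tables, which is a large part of the development cost mentioned above.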

Database Sharding (Multiple Databases)

Separating read and write traffic across multiple databases can help, but full sharding introduces significant development overhead and is generally not advised.
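
Read/write separation can be pictured as a small router in front of the connection pool. The sketch below is a simplified illustration, not a production design: the endpoint strings are hypothetical, it inspects only the first keyword of a statement, and it ignores replication lag, so a read issued right after a write may see stale data on a replica.

```python
import itertools


class ReadWriteRouter:
    """Send plain SELECTs to replicas in round-robin order, everything else to the primary."""

    def __init__(self, primary, replicas):
        if not replicas:
            raise ValueError("at least one replica is required")
        self.primary = primary
        self._replicas = itertools.cycle(list(replicas))

    def endpoint_for(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        # Locking reads must run where the write will happen, so keep them on the primary.
        locking_read = "FOR UPDATE" in sql.upper()
        if verb == "SELECT" and not locking_read:
            return next(self._replicas)
        return self.primary


if __name__ == "__main__":
    router = ReadWriteRouter("primary:3306", ["replica-1:3306", "replica-2:3306"])
    print(router.endpoint_for("SELECT * FROM access_log WHERE id = 1"))
    print(router.endpoint_for("INSERT INTO access_log (url) VALUES ('/')"))
```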

Option 2 Details – Switching to a Compatible Database

When MySQL performance is insufficient, migrate to a 100 % MySQL‑compatible database to keep the application unchanged.

Open‑source candidates: TiDB, CUBRID.

Cloud options: Alibaba Cloud POLARDB (up to 100 TB, up to 6× MySQL performance, 1/10 cost of commercial DB), OceanBase, HybridDB for MySQL (HTAP, high cost), Tencent Cloud DCDB (horizontal sharding, lower price).

POLARDB was tested and showed ~10× performance improvement at comparable cost to RDS.

Option 3 Details – Moving to a Big‑Data Engine

For data volumes exceeding hundreds of millions of rows, consider big‑data solutions.

Open‑source: Hadoop ecosystem (HBase, Hive) – high operational cost.

Cloud: Alibaba MaxCompute + DataWorks (pay‑as‑you‑go, low cost). MaxCompute offers SQL, MapReduce, AI, Python, and shell scripts, with a graphical workflow manager. Approximately 300 lines of SQL solved the problem for under ¥100.

Other cloud big‑data services (e.g., HBase) are also available.

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"
