How to Speed Up Massive MySQL User‑Log Tables: Partitioning, Indexing, and Migration Strategies
This article examines performance problems with a 20‑million‑row MySQL user‑log table on Alibaba Cloud RDS, outlines three solution paths—optimizing the existing database, migrating to a MySQL‑compatible high‑performance service, and adopting a big‑data engine—and provides detailed guidance on schema design, indexing, partitioning, and practical SQL tweaks.
Problem Overview
A user‑access log table on Alibaba Cloud RDS for MySQL 5.6 holds about 20 million rows after six months and about 40 million rows after a year. Queries against it become extremely slow and hang the system daily. The original schema and SQL were poorly designed.
Solution Options
Optimize the existing MySQL database (no code changes, low cost, limited scalability).
Migrate to a MySQL‑compatible high‑performance service (minimal code changes, higher cost).
Adopt a big‑data platform (high scalability, requires code changes).
Approach 1 – Optimizing the Existing MySQL Database
Table Design Recommendations
Declare columns NOT NULL where possible and give them a default value (for example, 0 for numeric columns).
Prefer INT over BIGINT; use UNSIGNED for non‑negative values.
Replace string columns with ENUM or integer codes.
Prefer TIMESTAMP to DATETIME for its smaller storage size, keeping in mind that its range ends in January 2038.
Keep column count below 20.
Store IP addresses as integers.
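As a minimal sketch of the integer‑IP recommendation, the Python helpers below convert between dotted‑quad text and an unsigned 32‑bit value, which is the same mapping MySQL exposes as INET_ATON() and INET_NTOA(). The table name access_log and every column other than accesstime are hypothetical, chosen only to illustrate the design rules listed above.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    """Convert dotted-quad IPv4 text to an unsigned 32-bit integer."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(value: int) -> str:
    """Convert an unsigned 32-bit integer back to dotted-quad text."""
    return str(ipaddress.IPv4Address(value))


# Hypothetical MySQL table definition following the recommendations above:
# NOT NULL columns with defaults, INT UNSIGNED, TIMESTAMP, integer IP.
ACCESS_LOG_DDL = """
CREATE TABLE access_log (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id    INT UNSIGNED NOT NULL DEFAULT 0,
    ip         INT UNSIGNED NOT NULL DEFAULT 0,
    accesstime TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id)
) ENGINE=InnoDB
"""
```

Converting in the application keeps the column a plain INT UNSIGNED (4 bytes) instead of a VARCHAR(15), and range filters on an address block become ordinary integer comparisons.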
Indexing Guidelines
Create indexes only on columns used in WHERE or ORDER BY clauses; verify with EXPLAIN.
Do not index columns that are frequently NULL.
Avoid indexing low‑cardinality columns (e.g., gender).
Use prefix indexes for long VARCHAR columns.
Avoid primary keys on large VARCHAR columns.
Enforce foreign‑key logic in application code.
Minimize UNIQUE constraints unless required.
When using composite indexes, match the column order to query predicates.
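The column‑order rule for composite indexes can be checked with a small experiment. The sketch below uses Python's built‑in SQLite purely as a runnable stand‑in; it is not MySQL, and MySQL's EXPLAIN output looks different, but the leftmost‑prefix behaviour it illustrates is the same idea. The table access_log, the index idx_user_time, and the columns user_id and url are hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan detail strings SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access_log ("
    "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, "
    "accesstime INTEGER NOT NULL, url TEXT NOT NULL)"
)
# Composite index: user_id is the leading column, accesstime the second.
conn.execute("CREATE INDEX idx_user_time ON access_log (user_id, accesstime)")

# Predicate includes the leading column: the index can be searched.
plan_leading = query_plan(
    conn,
    "SELECT url FROM access_log WHERE user_id = ? AND accesstime >= ?",
    (42, 0),
)

# Predicate only on the second column: the index is not usable for a search.
plan_trailing = query_plan(
    conn,
    "SELECT url FROM access_log WHERE accesstime >= ?",
    (0,),
)

print(plan_leading)
print(plan_trailing)
```

On MySQL the equivalent check is to run EXPLAIN on both statements and compare the key column of the output.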
SQL Optimization Tips
Limit result sets with LIMIT.
Avoid SELECT *; list needed columns.
Prefer JOIN over sub‑queries.
Break large DELETE / INSERT statements into smaller batches.
Enable slow‑query logging to identify bottlenecks.
Keep indexed columns bare in predicates: move calculations and function calls to the constant side of the comparison so the index can still be used.
Keep each statement simple to reduce lock time.
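Replace chains of OR on the same column with IN: MySQL sorts a constant IN list and searches it with a binary search (logarithmic cost), whereas OR conditions are checked one by one (linear cost).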
Handle complex logic in application code, not in triggers or functions.
Avoid leading wildcards in LIKE patterns.
Minimize the number of joins.
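Compare values of the same type to avoid implicit conversions that can disable index use.
Avoid != or <> in WHERE clauses, since they usually prevent efficient index use.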
Prefer BETWEEN for continuous ranges.
Paginate large result sets with reasonable page sizes.
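The tip about breaking large DELETE statements into smaller batches might look like the following sketch. It walks the primary key in fixed‑size id ranges and commits after each chunk so that no single statement holds locks for long. The table name access_log is hypothetical, and the ? placeholders are those of Python's sqlite3 module, used here only so the sketch runs without a server; common MySQL drivers use %s instead.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows with accesstime < cutoff in id-range chunks.

    Commits after every chunk and returns the total number of rows deleted.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT MIN(id), MAX(id) FROM access_log WHERE accesstime < ?",
        (cutoff,),
    )
    low, high = cur.fetchone()
    if low is None:  # nothing matches the cutoff
        return 0

    deleted = 0
    start = low
    while start <= high:
        # Each statement touches at most batch_size consecutive ids.
        cur.execute(
            "DELETE FROM access_log "
            "WHERE id >= ? AND id < ? AND accesstime < ?",
            (start, start + batch_size, cutoff),
        )
        deleted += cur.rowcount
        conn.commit()  # release locks before the next chunk
        start += batch_size
    return deleted
```

Chunking by id range rather than by LIMIT keeps every statement on the primary key; a short sleep between chunks is a common addition when the table is under live load.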
Partitioning
MySQL 5.1+ supports horizontal partitioning. Initial RANGE partitioning by month (12 partitions) gave a speedup of about 6×. Switching to HASH partitioning on id with 64 partitions yielded dramatic performance gains:
PARTITION BY HASH (id) PARTITIONS 64;
Example query after partitioning:
SELECT * FROM readroom_website WHERE MONTH(accesstime)=11 LIMIT 10;
Execution time dropped from several seconds to under one second.
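The example query wraps accesstime in MONTH(), which generally keeps MySQL from using an index on that column or pruning date‑based partitions. A hedged alternative, in line with the advice on keeping calculations on the constant side of a predicate, is to compute the month boundaries in the application and filter with a half‑open range. The sketch below assumes a specific year (2018 is an arbitrary choice, since MONTH(accesstime)=11 matches November of every year) and uses the %s placeholder style of common MySQL drivers.

```python
from datetime import datetime


def month_range(year: int, month: int):
    """Return (start, end) datetimes covering one calendar month, end exclusive."""
    start = datetime(year, month, 1)
    if month == 12:
        end = datetime(year + 1, 1, 1)
    else:
        end = datetime(year, month + 1, 1)
    return start, end


# Range predicate on the bare column; lists columns instead of SELECT *.
RANGE_QUERY = (
    "SELECT id, accesstime FROM readroom_website "
    "WHERE accesstime >= %s AND accesstime < %s "
    "LIMIT 10"
)

# Assumed year; pass these as parameters to cursor.execute(RANGE_QUERY, params).
params = month_range(2018, 11)
```

If November of several years is needed, one such range per year can be combined, which still leaves the column unwrapped.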
Partition Limits
Maximum 1024 partitions per table before MySQL 5.6.7; 8192 from 5.6.7 onward.
Every primary key and unique key on the table must include all columns used in the partitioning expression.
Partitioned tables cannot have foreign keys.
NULL values in the partitioning column can make partition pruning ineffective, so declare that column NOT NULL.
All partitions must use the same storage engine.
Supported types: RANGE, LIST, HASH, KEY.
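To make the RANGE type and the partition‑count limit concrete, the sketch below generates a monthly RANGE partitioning statement of the kind described earlier. It is an illustration under stated assumptions, not the author's actual DDL: it assumes accesstime is a DATETIME column (a TIMESTAMP column would need UNIX_TIMESTAMP() instead of TO_DAYS()), and that the primary key has been extended to include accesstime, as the unique‑key rule requires. The function name and partition naming scheme are hypothetical.

```python
from datetime import date

# Conservative limit; MySQL 5.6.7 and later allow 8192 partitions.
MAX_PARTITIONS = 1024


def monthly_range_partitions(table, column, year, months=12):
    """Build an ALTER TABLE statement with one RANGE partition per month.

    Partitions start at January of `year`; a final catch-all partition
    (pmax) receives rows beyond the last monthly boundary.
    """
    if months + 1 > MAX_PARTITIONS:  # +1 for the catch-all partition
        raise ValueError("too many partitions for one table")

    clauses = []
    for i in range(months):
        # Partition name is the month it holds, e.g. p201801.
        name_year, name_month = divmod(i, 12)
        name = f"p{year + name_year}{name_month + 1:02d}"
        # Upper bound is the first day of the following month (exclusive).
        bound_year, bound_month = divmod(i + 1, 12)
        upper = date(year + bound_year, bound_month + 1, 1)
        clauses.append(
            f"PARTITION {name} VALUES LESS THAN (TO_DAYS('{upper.isoformat()}'))"
        )
    clauses.append("PARTITION pmax VALUES LESS THAN MAXVALUE")

    body = ",\n  ".join(clauses)
    return (
        f"ALTER TABLE {table}\n"
        f"PARTITION BY RANGE (TO_DAYS({column})) (\n  {body}\n)"
    )


ddl = monthly_range_partitions("readroom_website", "accesstime", 2018)
print(ddl)
```

Generating the statement from code makes it easy to add next year's partitions ahead of time instead of letting rows pile up in the catch‑all partition.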
Sharding and Database Splitting
Horizontal sharding splits a large table into many smaller tables, for example by routing each row to a table whose name ends in id % 100 (tableName_0 through tableName_99), and requires code changes. Vertical sharding separates columns into different tables, which also requires development effort. Database‑level read/write separation adds operational complexity and is not recommended for this case.
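The routing logic that horizontal sharding pushes into application code can be as small as the sketch below. The function name and the user_log base name are hypothetical; the modulo‑100 scheme is the one mentioned above.

```python
def shard_table(base_name: str, row_id: int, shard_count: int = 100) -> str:
    """Return the name of the shard table that holds the given id."""
    if row_id < 0:
        raise ValueError("row_id must be non-negative")
    return f"{base_name}_{row_id % shard_count}"


# Every query must be sent to the table chosen by the router.
table = shard_table("user_log", 12345)
query = f"SELECT id, accesstime FROM {table} WHERE id = %s"
```

Queries that filter on anything other than the sharding key have to visit every shard, which is part of the development cost noted above.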
Approach 2 – Migrating to a MySQL‑Compatible High‑Performance Service
Open‑source options: TiDB (https://github.com/pingcap/tidb) and CUBRID. Cloud services evaluated:
Alibaba Cloud POLARDB – 100% MySQL compatible, up to 100 TB storage, up to 6× MySQL performance, cost‑effective.
Alibaba Cloud OceanBase – MySQL‑compatible HTAP engine, higher cost, suited for mixed OLTP/OLAP workloads.
Tencent Cloud DCDB – MySQL‑compatible distributed database with automatic sharding.
Testing POLARDB showed ~10× performance improvement with minimal migration effort.
Approach 3 – Switching to a Big‑Data Engine
When data exceeds hundreds of millions of rows, consider:
Open‑source Hadoop ecosystem (HBase, Hive) – high operational cost.
Alibaba Cloud MaxCompute + DataWorks – serverless, pay‑as‑you‑go, suitable for batch processing. The author implemented the job in about 300 lines of SQL and solved the problem for under ¥100.
MaxCompute provides SQL, MapReduce, Python, and shell interfaces; DataWorks offers workflow orchestration.
Conclusion
For workloads below the hundred‑million‑row threshold, start with MySQL schema and query optimization, then evaluate POLARDB if further performance is needed. Only migrate to a big‑data solution when relational databases can no longer handle the data volume.
ITPUB
