Optimizing Large MySQL Tables on Alibaba Cloud: Design, Indexing, PolarDB Migration, X‑Engine Partitioning, and Parallel Query
This article details how to improve performance of massive MySQL tables on Alibaba Cloud by redesigning schemas and indexes, migrating to PolarDB with read‑write separation, partitioning historical data using the X‑Engine storage engine, and leveraging PolarDB's parallel query capabilities for faster analytics.
Background
Alibaba Cloud RDS for MySQL (version 5.7) stores over ten million new rows per month; as data grows, large tables cause slow queries that can take dozens of seconds during peak periods, severely impacting business.
Solution Overview
1. Database Design and Index Optimization
MySQL’s flexibility places performance heavily on developers' schema and index design. The following recommendations are suggested:
Store time values as integer timestamps to improve index efficiency.
Define columns as NOT NULL; NULL values hinder optimization and waste index space.
Replace ENUM with TINYINT.
Use DECIMAL instead of FLOAT/DOUBLE for precise numbers.
Set column lengths according to actual business needs; avoid oversized definitions.
Avoid TEXT type; if necessary, split large rarely‑used fields into separate tables.
InnoDB limits index column length to 767 bytes (3072 bytes total); MySQL 8.0 allows up to 1024 characters per index.
For DDL operations on large tables, involve a DBA.
Left‑most Index Matching Rule
The leftmost column of a composite index is prioritized. When creating a composite index, place the most frequently used column in the WHERE clause at the leftmost position. For example, if columns c1 and c2 appear in WHERE, the order (c1,c2) is preferred when c1 has fewer duplicate values.
Composite index index(a,b,c)
WHERE a=3 uses only a
WHERE a=3 AND b=5 uses a,b
WHERE a=3 AND b=5 AND c=4 uses a,b,c
WHERE b=3 OR WHERE c=4 does not use the index
WHERE a=3 AND c=4 uses only a
WHERE a=3 AND b>10 AND c=7 uses a,b
WHERE a=3 AND b LIKE 'xx%' AND c=7 uses a,b
Effectively creates multiple indexes: key(a), key(a,b), key(a,b,c)
2. Switch to PolarDB Read‑Write Separation
PolarDB is Alibaba Cloud’s next‑generation relational cloud database, 100% MySQL‑compatible, supporting up to 100 TB storage and up to 16 nodes per instance. It separates storage and compute, allowing all compute nodes to share a single data copy, providing minute‑level scaling, second‑level failover, global consistency, and free backup.
Cluster architecture with one Writer node and multiple Reader nodes sharing storage via PolarFileSystem.
Read‑write separation through an internal Proxy that routes writes to the Writer and distributes reads across Readers, appearing as a single endpoint to applications.
In mixed offline scenarios, different business services can use distinct connection addresses and data nodes to avoid interference.
Sysbench performance test results:
PolarDB 4‑core 16 GB (2 nodes)
PolarDB 8‑core 32 GB (2 nodes)
3. Partition Historical Data to MySQL 8.0 X‑Engine
Active tables retain three months of data; older data is partitioned monthly into historical databases using the X‑Engine storage engine. X‑Engine offers cost savings (about half of InnoDB) and a layered storage architecture that separates hot and cold data, compressing cold layers automatically.
Cost reduction: X‑Engine storage cost is roughly 50% of InnoDB.
Layered storage improves QPS by placing hot data in memory and cold data in compressed layers.
X‑Engine is a self‑developed OLTP storage engine that seamlessly integrates with MySQL via the plugin architecture. It uses an LSM‑Tree foundation, memory‑resident hot data, pipeline transaction processing, and multi‑tier storage (NVM/SSD/HDD). Optimizations include efficient compaction, fine‑grained LSM shape control, and enhanced read caching.
LSM‑Tree based layered storage with redesigned architecture.
Hot data stored in memory using lock‑free index structures.
Pipeline transaction processing for high throughput.
Cold data gradually merged into persistent layers using appropriate devices.
Compaction process heavily optimized to reduce I/O and CPU cost.
Fine‑grained LSM shape control to limit space growth.
Improved read performance via refined caching and access control.
4. PolarDB MySQL 8.0 Parallel Query
After partitioning, the data volume remains large, and some slow queries persist. PolarDB’s parallel query framework automatically activates when query data exceeds a threshold, distributing data shards across multiple threads, aggregating results, and dramatically reducing latency.
Parallel query leverages multi‑core CPUs; for example, an 8‑core 32 GB configuration shows the architecture below:
Parallel query is suitable for most SELECT statements, such as large‑table scans, multi‑table joins, and compute‑intensive queries. Short queries see limited benefit.
Usage: Hint syntax can enable parallelism for specific statements when the global setting is disabled.
SELECT /*+PARALLEL(x)*/ … FROM …; -- x > 0
SELECT /*+ SET_VAR(max_parallel_degree=n) */ * FROM …; -- n > 0
Test: 16‑core 32 GB instance, single table >30 M rows.
Without parallelism: 4326 ms; with parallelism: 525 ms – an 8.24× speedup.
5. Interactive Analytics with Hologre
Even with parallel query, real‑time dashboards and reports may still require big‑data processing. Alibaba Cloud’s interactive analytics service Hologre is recommended for such scenarios.
6. Afterword
Optimizing tables with tens of millions of rows is a cost‑driven effort based on business scenarios; it is not always necessary to horizontally shard the database from the start, as that can introduce operational complexity. Proper schema design, indexing, and partitioning should be evaluated before adopting more advanced technologies.
Fulu Network R&D Team
Providing technical literature sharing for Fulu Holdings' tech elite, promoting its technologies through experience summaries, technology consolidation, and innovation sharing.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.