Optimizing MySQL RDS for Large‑Scale User Activity Logs: Design, Indexing, Partitioning, and Migration Strategies
This article analyzes the performance problems of a MySQL 5.6 RDS table storing tens of millions of user‑activity records and presents three practical solutions: optimizing the existing schema and queries, migrating to a MySQL‑compatible high‑performance database, or adopting a big‑data platform. It details design, indexing, partitioning, sharding, and cloud‑native options.
The author encountered severe query latency on an Alibaba Cloud RDS for MySQL 5.6 instance whose user‑activity log table held about 20 million rows after six months and was on track for roughly 40 million rows after a full year; queries froze daily and impacted the business.
Three remediation paths are proposed: (1) optimize the current MySQL deployment with minimal cost, (2) upgrade to a 100 % MySQL‑compatible database that requires no code changes, and (3) replace MySQL with a big‑data solution for unlimited scalability.
Option 1 – Optimizing the existing MySQL database
Key recommendations include:
1. Design tables for performance: avoid NULL columns, prefer INT over BIGINT (use UNSIGNED when possible), use ENUM or integer codes instead of strings, store timestamps as TIMESTAMP rather than DATETIME, keep the column count under 20, store IPs as integers, and define columns as NOT NULL whenever feasible.
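These rules can be sketched in a single DDL statement. The table and column names below are illustrative, not taken from the original article:

```sql
-- Hypothetical activity-log table applying the rules above: every column
-- NOT NULL, compact unsigned integer types, TIMESTAMP instead of DATETIME,
-- a TINYINT code instead of a string, and the client IP stored as an integer.
CREATE TABLE user_activity_log (
    id          INT UNSIGNED     NOT NULL AUTO_INCREMENT,
    user_id     INT UNSIGNED     NOT NULL,
    action_type TINYINT UNSIGNED NOT NULL COMMENT '1=login, 2=click, 3=purchase, ...',
    client_ip   INT UNSIGNED     NOT NULL COMMENT 'write with INET_ATON(), read with INET_NTOA()',
    created_at  TIMESTAMP        NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Storing an IP address as an integer:
INSERT INTO user_activity_log (user_id, action_type, client_ip)
VALUES (42, 1, INET_ATON('203.0.113.7'));
```

INET_ATON/INET_NTOA are built into MySQL; an IPv4 address fits exactly in an INT UNSIGNED, which is both smaller and faster to compare than a VARCHAR(15).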
2. Index wisely: create indexes only on columns used in WHERE or ORDER BY clauses, avoid indexing low‑cardinality fields (e.g., gender), use prefix indexes for long character columns, avoid indexing on columns that are frequently checked for NULL, and prefer composite indexes whose column order matches query predicates.
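Two of these indexing rules in concrete form, assuming a hypothetical log table with user_id, created_at, and a long user_agent column:

```sql
-- Composite index whose column order matches the query predicates:
-- serves WHERE user_id = ? AND created_at >= ? via the leftmost-prefix rule.
ALTER TABLE user_activity_log
    ADD INDEX idx_user_time (user_id, created_at);

-- Prefix index on a long character column: index only the first 16
-- characters instead of the full value to keep the index small.
ALTER TABLE user_activity_log
    ADD INDEX idx_ua_prefix (user_agent(16));
```

Note that the composite index above would not help a query filtering on created_at alone; the leftmost column must appear in the predicate.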
3. Write efficient SQL: use LIMIT, select only the required columns instead of *, replace sub‑queries with joins, split large DELETE/INSERT statements into batches, enable slow‑query logging, avoid arithmetic on columns in WHERE clauses, replace OR with IN (keeping the list under 200 items), minimize use of functions, wildcards, and unnecessary joins, and compare values of the same data type.
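A before/after sketch of several of these rules together (table and column names are hypothetical):

```sql
-- Before: SELECT * with a sub-query, no LIMIT.
SELECT * FROM user_activity_log
WHERE user_id IN (SELECT id FROM users WHERE region = 'cn');

-- After: explicit columns, join instead of sub-query, bounded result set.
SELECT l.user_id, l.action_type, l.created_at
FROM user_activity_log l
JOIN users u ON u.id = l.user_id
WHERE u.region = 'cn'
ORDER BY l.created_at DESC
LIMIT 100;

-- Splitting a huge DELETE into bounded batches to avoid long lock times:
DELETE FROM user_activity_log
WHERE created_at < '2018-01-01'
LIMIT 10000;   -- re-run until 0 rows are affected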
4. Partition the table (RANGE, LIST, HASH, KEY) to reduce scan scope; the author tried monthly RANGE partitions (12 partitions) with modest gain, then switched to HASH partitioning on id with 64 partitions, achieving a significant speedup (query time reduced from several seconds to under one second).
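Both partitioning attempts described above can be expressed as DDL; the statements below are a sketch against a hypothetical table, not the author's exact schema. Note that MySQL requires the partitioning column to appear in every unique key, so RANGE partitioning on a timestamp forces a composite primary key such as (id, created_at):

```sql
-- First attempt: monthly RANGE partitions (modest gain for the author).
ALTER TABLE user_activity_log
PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p201801 VALUES LESS THAN (TO_DAYS('2018-02-01')),
    PARTITION p201802 VALUES LESS THAN (TO_DAYS('2018-03-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- Second attempt: HASH partitioning on the primary key, 64 partitions
-- (the variant that brought queries under one second for the author).
ALTER TABLE user_activity_log
PARTITION BY HASH (id) PARTITIONS 64;
```

HASH on the primary key spreads rows evenly across partitions, which helps point lookups by id but, unlike RANGE on a date, does not let the optimizer prune partitions for time-range scans.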
5. Shard the table (vertical or horizontal) when partitioning alone is insufficient, though this requires code changes and higher development cost.
6. Consider separate read/write databases (read‑write splitting) before moving to full sharding.
Option 2 – Migrating to a MySQL‑compatible high‑performance database
Open‑source candidates such as TiDB and CUBRID are mentioned, but they may incur higher operational overhead. Cloud alternatives include Alibaba Cloud POLARDB (MySQL‑compatible, up to 100 TB, up to 6× performance), Alibaba OceanBase, and Alibaba HybridDB for MySQL (HTAP). The author tested POLARDB and observed ~10× performance improvement with comparable cost to RDS.
Other cloud options like Tencent Cloud DCDB (auto‑sharding, MySQL‑compatible) are also listed, though the author prefers Alibaba solutions for reliability.
Option 3 – Switching to a big‑data platform
When data reaches the hundred‑billion‑row scale, the author recommends moving to Hadoop‑based ecosystems (HBase/Hive) or cloud‑native services. Alibaba MaxCompute combined with DataWorks is chosen for offline processing, offering SQL, MapReduce, Python, and shell interfaces at low cost (≈300 SQL lines, < ¥100 total).
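MaxCompute exposes a SQL dialect close to MySQL's, so an offline job can look like an ordinary aggregation. The query below is a hypothetical sketch (table and column names are illustrative; ds is the conventional MaxCompute partition column):

```sql
-- Hypothetical MaxCompute offline job: daily active users per action type,
-- reading a log table synced into MaxCompute via DataWorks.
SELECT  ds,
        action_type,
        COUNT(DISTINCT user_id) AS daily_active_users
FROM    user_activity_log
WHERE   ds BETWEEN '20180101' AND '20180131'
GROUP BY ds, action_type;
```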
Overall, the article emphasizes a step‑wise approach: start with schema and query tuning, evaluate compatible high‑performance MySQL services, and finally adopt big‑data solutions if the data volume exceeds the practical limits of relational databases.