How to Optimize Ten‑Million‑Row MySQL Tables: Practical Guidelines
Optimizing MySQL tables with tens of millions of rows calls for a systematic approach. The right measures depend on data volume, table type, and performance goals, and they span design standards, business-layer tactics, architecture, database-level tuning (indexing, transactions, SQL), and day-to-day management.
The author tackles the challenge of optimizing MySQL tables that contain tens of millions of rows, emphasizing that simple solutions like sharding or partitioning are only part of a broader strategy.
1. Data Volume
"Ten million" is a perceptual figure; actual data may grow to hundreds of millions, remain relatively stable, or consist largely of obsolete records. Different scenarios demand distinct handling strategies.
2. Object: Data Table
Tables can be classified into three types:
Transactional (log-style) data: stateless; each record is independent (e.g., transaction logs).
Stateful data: each record depends on previous state (e.g., account balances).
Configuration data: small, static, and rarely changed.
3. Goal: Optimization
Optimization is approached from five perspectives: design standards, business‑layer tactics, architecture, database layer, and management.
Optimization Design 1: Design Standards
Key conventions cover configuration, table creation, naming, indexing, and application practices.
Configuration: Use InnoDB; standardize on a single character set (UTF8 or UTF8MB4) across the instance; set transaction isolation to READ COMMITTED; keep each table under roughly 20 million rows; keep each instance under 50 databases and 500 tables.
Table creation: Avoid foreign keys; use DECIMAL for exact numeric values; omit display width on integer types; avoid ENUM; minimize TEXT and BLOB columns; store years in a four-digit YEAR column; define columns as NOT NULL; and run DDL through SQL audit tools.
Naming: Use lowercase with underscores, at most 12 characters, and names meaningful enough to need no extra comments.
Indexing: Name indexes idx_col1_col2 or uniq_col1_col2; use at most 5 columns per index and at most 5 indexes per table; always define a primary key; place high-selectivity columns first in composite indexes; index columns used in WHERE clauses; avoid LIKE patterns with a leading %; use covering indexes; avoid applying functions to indexed columns; and coordinate index changes with the DBA.
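Application: Avoid stored procedures, triggers, and user-defined functions; prefer UNION ALL over UNION when duplicates need not be removed; use LIMIT sparingly, since large offsets are expensive; use COUNT(*) with care on large tables; re-specify column comments when modifying columns so they are not lost; use prepared statements; keep IN lists short; always specify a WHERE clause; match data types in comparisons to avoid implicit conversion; avoid SELECT *; and batch inserts.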
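As one way to make the naming and indexing conventions above checkable, the following Python sketch lints names and index definitions before DDL reaches review. It is an illustration, not an existing audit tool: the function names and the input shape are hypothetical, and it assumes the 12-character limit applies to table and column names rather than to index names.

```python
import re

# Limits taken from the conventions listed above.
MAX_NAME_LEN = 12
MAX_INDEX_COLUMNS = 5
MAX_INDEXES_PER_TABLE = 5

# Lowercase letters, digits, and underscores, starting with a letter.
NAME_RE = re.compile(r"^[a-z][a-z0-9_]*$")


def check_name(name):
    """Return a list of problems with a table or column name."""
    problems = []
    if not NAME_RE.match(name):
        problems.append(f"{name}: use lowercase letters, digits, and underscores")
    if len(name) > MAX_NAME_LEN:
        problems.append(f"{name}: longer than {MAX_NAME_LEN} characters")
    return problems


def expected_index_name(columns, unique=False):
    """Build the conventional name: idx_col1_col2 or uniq_col1_col2."""
    prefix = "uniq" if unique else "idx"
    return prefix + "_" + "_".join(columns)


def check_indexes(indexes):
    """Check a table's secondary indexes.

    `indexes` maps index name -> (list of columns, is_unique).
    """
    problems = []
    if len(indexes) > MAX_INDEXES_PER_TABLE:
        problems.append(f"{len(indexes)} indexes; limit is {MAX_INDEXES_PER_TABLE}")
    for name, (columns, unique) in indexes.items():
        if len(columns) > MAX_INDEX_COLUMNS:
            problems.append(f"{name}: {len(columns)} columns; limit is {MAX_INDEX_COLUMNS}")
        wanted = expected_index_name(columns, unique)
        if name != wanted:
            problems.append(f"{name}: expected name {wanted}")
    return problems
```

A check such as check_indexes({"ix_user": (["user_id"], False)}) reports that the index should be named idx_user_id. A real audit tool would parse the DDL itself; here the parsed structure is assumed to be supplied by the caller.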
Optimization Design 2: Business‑Layer Optimization
Focuses on business splitting, data splitting, and read/write patterns.
Business splitting: Separate mixed services into independent domains, and isolate current state from historical data (e.g., split Account into account and account_hist tables).
Data splitting: Split by date (daily, monthly, or yearly) or by another dimension; note that MySQL's native partitioning has scalability limits.
Read-heavy, write-light workloads: Cache hot data in Redis to reduce MySQL load.
Write-heavy, read-light workloads: Use asynchronous commits and queueing, and reduce write frequency (e.g., apply points updates every ten minutes instead of every minute).
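The write-frequency reduction described above can be sketched as an in-memory buffer that accumulates changes and emits one batched write per interval. This is a minimal illustration under stated assumptions: the PointsBuffer class, the account table, and its points and user_id columns are hypothetical, and the %s placeholders follow the style used by common Python MySQL drivers.

```python
from collections import defaultdict


class PointsBuffer:
    """Accumulate per-user point deltas in memory and flush them in one batch."""

    def __init__(self, flush_interval_seconds=600):
        self.flush_interval = flush_interval_seconds
        self.pending = defaultdict(int)
        self.last_flush = 0.0

    def add(self, user_id, delta):
        """Record a change without touching the database."""
        self.pending[user_id] += delta

    def due(self, now):
        """True once the flush interval has elapsed and there is work to do."""
        return bool(self.pending) and now - self.last_flush >= self.flush_interval

    def flush(self, now):
        """Return (sql, params) pairs for one batched write and reset the buffer."""
        statements = [
            # Hypothetical table and columns; one UPDATE per user, not per event.
            ("UPDATE account SET points = points + %s WHERE user_id = %s", (delta, user_id))
            for user_id, delta in sorted(self.pending.items())
            if delta != 0
        ]
        self.pending.clear()
        self.last_flush = now
        return statements
```

Many add() calls for the same user collapse into a single UPDATE at flush time. The buffer lives only in process memory, so a crash loses unflushed changes; a durable queue would be the safer choice where that matters.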
Optimization Design 3: Architecture‑Layer Optimization
Introduce higher‑level techniques:
Horizontal scaling via middleware (MyCAT, ShardingSphere, ProxySQL).
Read‑write splitting with replicas or middleware.
Load balancing (LVS, Consul).
HTAP/NewSQL solutions (TiDB) for combined OLTP + OLAP workloads.
Offline analytics using columnar or NoSQL stores (Infobright, ColumnStore, HBase) or MPP warehouses (Greenplum).
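To make the horizontal-scaling idea concrete, the sketch below shows the kind of routing decision a sharding layer has to make: mapping a key to one of N physical tables. It is not the algorithm of MyCAT, ShardingSphere, or any other product named above; the function names and the table_NN naming scheme are assumptions chosen for illustration.

```python
import zlib


def shard_index(key, shard_count):
    """Map a sharding key to a shard number in [0, shard_count)."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    if isinstance(key, int):
        return key % shard_count
    # Use CRC32 rather than hash(): Python's built-in string hash
    # is randomized per process, so it would not route consistently.
    return zlib.crc32(str(key).encode("utf-8")) % shard_count


def shard_table(base_table, key, shard_count):
    """Return the physical table name, e.g. orders_03 for shard 3."""
    return f"{base_table}_{shard_index(key, shard_count):02d}"
```

For example, shard_table("orders", 12345, 4) returns "orders_01". Plain modulo routing is simple but forces data movement whenever the shard count changes, which is one reason to plan the shard count generously up front.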
Optimization Design 4: Database‑Layer Optimization
Transaction and SQL refinements:
Transaction simplification: Choose an appropriate transaction model; replace stored procedures with plain SQL; replace frequent DDL (such as adding columns) with dynamic configuration maintained through DML.
Delete optimization: Use partitioned tables or RENAME TABLE to archive old data instead of running costly DELETE statements.
SQL simplification: Avoid complex multi-table joins, anti-joins and semi-joins, and range scans on large tables.
Index tuning: Ensure every table has a primary key, prefer unique or covering indexes, and limit range scans.
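The RENAME-based alternative to DELETE mentioned above can be sketched as a small statement builder. The function name and the _new and _arch_YYYYMMDD suffixes are hypothetical; the statements themselves rely on standard MySQL behavior, namely that CREATE TABLE ... LIKE copies column and index definitions and that a multi-table RENAME TABLE is atomic.

```python
from datetime import date


def archive_swap_statements(table, archive_date):
    """Build the statements that swap a full table for an empty one."""
    suffix = archive_date.strftime("%Y%m%d")
    new_table = f"{table}_new"
    archived = f"{table}_arch_{suffix}"
    return [
        # Empty copy with the same columns and indexes.
        f"CREATE TABLE {new_table} LIKE {table}",
        # Both renames happen in one atomic step, so the original
        # table name is never missing from the application's view.
        f"RENAME TABLE {table} TO {archived}, {new_table} TO {table}",
    ]
```

Because the replacement table starts empty, this approach suits log-style tables whose entire current contents can be archived at once; it is not a fit for stateful tables that must keep serving existing rows.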
Optimization Design 5: Management Optimization
Operational best practices:
Schedule schema changes during low-traffic windows, and use tools such as pt-online-schema-change (pt-osc) for online alterations.
Make destructive DROP operations reversible by converting them to a rename: move the table, and with it the underlying .ibd file, into an archive database with RENAME TABLE, and defer the physical deletion.
Adopt systematic cleanup strategies for hot/cold data separation.
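One possible shape for a systematic cleanup job is a retention rule over date-based partitions. The sketch below assumes monthly partitions named with the hypothetical pattern pYYYYMM and a retention window counted in months; the function names are illustrative only.

```python
from datetime import date


def month_index(year, month):
    """Convert a year and month to a single comparable integer."""
    return year * 12 + (month - 1)


def expired_partitions(partitions, today, keep_months):
    """Return partition names older than the retention window.

    Keeps the current month plus `keep_months` earlier months.
    Assumes names follow the hypothetical pattern pYYYYMM.
    """
    cutoff = month_index(today.year, today.month) - keep_months
    expired = []
    for name in sorted(partitions):
        year, month = int(name[1:5]), int(name[5:7])
        if month_index(year, month) < cutoff:
            expired.append(name)
    return expired


def drop_partition_statement(table, names):
    """Build one ALTER TABLE that drops all expired partitions."""
    if not names:
        return None
    return f"ALTER TABLE {table} DROP PARTITION {', '.join(names)}"
```

Run on a schedule, a job like this removes cold data as a metadata operation rather than a row-by-row DELETE. Archiving the partition contents before dropping them is left out of the sketch.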
Conclusion: Optimizing ten-million-row tables is a multi-dimensional effort that must align with business scenarios, cost considerations, and technical risk, rather than relying on a single isolated technique.
dbaplus Community