Databases 24 min read

Master MySQL Performance: Practical Optimizations for Large Tables

This guide explains why MySQL performance drops when a single table grows huge and provides a comprehensive set of optimization steps—including field type choices, index design, query tuning, engine selection, system parameters, hardware scaling, read/write splitting, caching, partitioning, vertical and horizontal sharding, and architecture recommendations—to keep MySQL fast and reliable at scale.

dbaplus Community

Sep 24, 2018

Master MySQL Performance: Practical Optimizations for Large Tables

1. Single Table Optimization

When a MySQL table contains millions of rows, CRUD performance degrades sharply. Optimize the table structure before considering sharding.

Prefer TINYINT, SMALLINT, MEDIUMINT (add UNSIGNED if non‑negative) over INT.

Allocate only the necessary length for VARCHAR columns.

Replace string columns with ENUM or integer codes.

Use TIMESTAMP instead of DATETIME when possible.

Keep the total number of columns under 20.

Avoid NULL columns; they hinder index usage and waste space.

Store IP addresses as integers.

2. Index Optimization

Create indexes only on columns used in WHERE or ORDER BY; verify usage with EXPLAIN.

Never test for NULL in the WHERE clause; it forces a full table scan.

Skip indexes on low‑cardinality columns (e.g., gender with only a few distinct values).

Use prefix indexes for long character columns.

Avoid making character columns primary keys.

Prefer application‑enforced constraints over foreign keys and UNIQUE indexes.

When using multi‑column indexes, keep column order consistent with query predicates and drop unnecessary single‑column indexes.

3. Query Tuning

Enable the slow‑query log to locate expensive statements.

Avoid column operations (e.g., SELECT id WHERE age+1=10) because they prevent index usage.

Keep statements simple; one CPU core per statement, split large statements into smaller ones.

Never use SELECT *; specify needed columns.

Rewrite OR as IN (prefer IN for up to ~200 items).

Implement functions and triggers in application code, not in the database.

Avoid leading wildcards ( %xxx) in LIKE patterns.

Minimize JOIN usage.

Compare values of the same type (e.g., numeric vs numeric, string vs string).

Do not use != or <> in WHERE clauses; they disable index usage.

Prefer BETWEEN over IN for continuous numeric ranges.

Paginate large result sets with LIMIT and keep page size reasonable.

4. Engine Choice

MySQL mainly provides two storage engines:

MyISAM : No row‑level locking, no transactions, no foreign keys, suitable for read‑intensive workloads.

InnoDB : Row‑level locking with MVCC, supports transactions and foreign keys, ideal for write‑intensive workloads. Since MySQL 5.6.4 it also supports full‑text indexes.

Generally, use MyISAM for SELECT‑heavy tables and InnoDB for INSERT/UPDATE‑heavy tables.

5. System Tuning Parameters

Benchmark tools: sysbench (https://github.com/akopytov/sysbench), iibench-mysql (https://github.com/tmcallaghan/iibench-mysql), tpcc-mysql (https://github.com/Percona-Lab/tpcc-mysql).

back_log : Increase from default 50 to ~500 to allow more pending connections.

wait_timeout : Reduce idle connection timeout from 8 hours to ~30 minutes.

max_user_connection : Set a reasonable upper bound for per‑user connections.

thread_concurrency : Set to roughly 2 × CPU core count.

skip_name_resolve : Disable DNS lookups for client connections (use IP addresses).

key_buffer_size : For MyISAM, increase to 256 MiB–384 MiB on a 4 GiB server; keep key_reads/key_read_requests < 0.1 %.

innodb_buffer_pool_size : Allocate the majority of memory (e.g., 70‑80 %) for InnoDB data and index caching.

innodb_additional_mem_pool_size : Adjust when many tables/objects exist.

innodb_log_buffer_size : Keep ≤ 32 MiB.

query_cache_size : Usually disable; if used, size ≤ 256 MiB and monitor hit rate.

read_buffer_size , sort_buffer_size , read_rnd_buffer_size : Tune based on workload (larger values improve sequential scans and ORDER BY performance).

thread_cache_size : Cache idle threads to reduce connection overhead.

table_cache : Mostly benefits MyISAM; set appropriately.

6. Hardware Scaling

Scale‑up by adding CPU cores, more RAM, and SSD storage. Identify whether MySQL is CPU‑bound or I/O‑bound and upgrade the bottleneck accordingly.

7. Read/Write Splitting

Deploy a master‑slave topology: writes go to the master, reads are distributed among slaves. Avoid dual‑master setups unless you can handle the added complexity.

8. Caching Layers

Caching can be applied at multiple levels:

MySQL internal : Tune buffer sizes as described above.

Data access layer : MyBatis cache, Hibernate second‑level cache, etc.

Application service layer : Cache Data Transfer Objects (DTOs).

Web layer : HTTP response caching.

Client/browser : Browser cache.

Two common write‑through strategies:

Write‑Through : Update cache immediately after writing to the DB (simple, consistent).

Write‑Back : Write only to cache and flush asynchronously to the DB (higher throughput, more complex).

9. Table Partitioning

MySQL 5.1 introduced native partitioning, which transparently splits a logical table into multiple physical sub‑tables.

Benefits : Store more data, easier bulk delete, faster queries that target few partitions, ability to place partitions on different disks, avoid certain bottlenecks, and enable per‑partition backup/restore.

Limitations : Max 1024 partitions, all primary/unique keys must include the partition column, no foreign keys, NULL values break partition pruning, and all partitions must use the same engine.

Partition types : RANGE, LIST, HASH, KEY.

Typical use case: time‑series data partitioned by year or month.

CREATE TABLE members (
  firstname VARCHAR(25) NOT NULL,
  lastname  VARCHAR(25) NOT NULL,
  username  VARCHAR(16) NOT NULL,
  email     VARCHAR(35),
  joined    DATE NOT NULL
) PARTITION BY RANGE ( YEAR(joined) ) (
  PARTITION p0 VALUES LESS THAN (1960),
  PARTITION p1 VALUES LESS THAN (1970),
  PARTITION p2 VALUES LESS THAN (1980),
  PARTITION p3 VALUES LESS THAN (1990),
  PARTITION p4 VALUES LESS THAN MAXVALUE
);

When querying a partitioned table, include the partition column in the WHERE clause to ensure the optimizer scans only relevant partitions. Example:

mysql> EXPLAIN PARTITIONS SELECT COUNT(1) FROM user_partition WHERE id IN (1,2,3,4,5);
+----+-------------+----------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table          | partitions | type  | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+-------------+----------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
|  1 | SIMPLE      | user_partition | p1,p4      | range | PRIMARY       | PRIMARY | 8       | NULL |   5  | Using where; Using index |
+----+-------------+----------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

10. Vertical Sharding

Separate logically unrelated groups of tables into different databases (e.g., user data vs. order data). Vertical table splitting moves rarely‑changed columns to a separate table, reducing row size and improving cache utilization.

Pros: Smaller rows, better cache usage, simpler maintenance.

Cons: Redundant primary keys, additional JOIN overhead, still limited by single‑table size, more complex transaction handling.

11. Horizontal Sharding

Distribute rows across multiple databases or tables based on a sharding key (range, list, hash, etc.). This achieves true distribution of load and storage.

Advantages: Eliminates single‑node bottlenecks, minimal application changes, improves stability and capacity.

Disadvantages: Distributed transaction consistency is hard, cross‑node JOIN performance suffers, operational complexity increases.

12. Sharding Principles

Only shard when necessary; start with single‑table optimizations.

Keep shard count low and distribute evenly across nodes.

Choose sharding rules based on data growth, access patterns, and future scalability (range, list, consistent‑hash).

Avoid cross‑shard transactions; distributed transactions are difficult.

Optimize queries to avoid SELECT * and large result sets.

Use data redundancy and partitioning to reduce cross‑shard joins.

Time‑based data (e.g., logs, orders) often benefits from range sharding on a timestamp column.

13. Sharding Solutions

Two main architectural approaches:

Client‑side architecture : Application modifies its data‑access layer (JDBC, DataSource, MyBatis) to manage multiple data sources directly.

Pros: No extra middleware, lower external failure risk, low integration cost.

Cons: Limited to DB‑access layer, less scalable for complex scenarios, adds load to application servers.

Proxy (middleware) architecture : Deploy an independent proxy that routes queries to the appropriate shards, transparent to the application.

Pros: Handles complex routing, high extensibility, no changes to application code.

Cons: Requires additional deployment and operational overhead, adds a network hop and potential latency.

When choosing a solution, consider scale, feature requirements (e.g., cross‑node ORDER BY), maintenance activity, vendor reputation, open‑source availability, and community support.

Author: 请叫我头头哥 (source: www.cnblogs.com/toutou/p/9183795.html). For further reading, see the linked articles at the end of the original post.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

performance Indexing Sharding Caching mysql Database Tuning System Parameters

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.