Boost MySQL Performance on Large Tables: Proven Strategies and Tools
This article examines why MySQL can still handle massive tables and presents practical techniques—including proper primary keys, eliminating redundant indexes, choosing optimal data types, compression, archiving, partitioning, sharding, query optimization, and safe schema changes—to keep large‑table performance high.
Primary Key
Creating an explicit primary key for every InnoDB table is essential because InnoDB stores rows in primary‑key order and uses the primary key for the clustered index. If no primary key is defined, MySQL picks another unique index or creates an internal hidden clustered index, which is sub‑optimal. Use auto_increment columns when a natural key is unavailable, and from MySQL 8.0.30 you can enable invisible primary keys automatically. Avoid string primary keys because they enlarge secondary indexes.
Redundant Indexes
Indexes that share the same left‑most column can become redundant. In the employees sample database, idx_last_name is a left‑prefix of idx_last_name_hire_date, making the former unnecessary.
db1 employees> show create table employees\G
*************************** 1. row ***************************
Table: employees
Create Table: CREATE TABLE `employees` (
`emp_no` int NOT NULL,
`birth_date` date NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` enum('M','F') NOT NULL,
`hire_date` date NOT NULL,
PRIMARY KEY (`emp_no`),
KEY `idx_last_name` (`last_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ALTER TABLE employees ADD INDEX idx_last_name_hire_date (last_name,hire_date); [user1] percona@db1: ~ $ pt-duplicate-key-checker -d employees
# idx_last_name is a left‑prefix of idx_last_name_hire_date
# To remove this duplicate index, execute:
ALTER TABLE `employees`.`employees` DROP INDEX `idx_last_name`;Data Types
Choosing the right column type can dramatically affect storage. A CHAR(200) column always occupies 200 bytes per row, while VARCHAR(200), TINYTEXT, or TEXT store only the actual length of the string.
CREATE TABLE tb1 (id int auto_increment primary key, test_text char(200));
CREATE TABLE tb2 (id int auto_increment primary key, test_text varchar(200));
CREATE TABLE tb3 (id int auto_increment primary key, test_text tinytext);
CREATE TABLE tb4 (id int auto_increment primary key, test_text text); [user1] percona@db1: ~ $ sudo ls -lh /var/lib/mysql/test/|grep tb
-rw-r-----. 1 mysql mysql 592K Dec 30 02:48 tb1.ibd
-rw-r-----. 1 mysql mysql 464K Dec 30 02:48 tb2.ibd
-rw-r-----. 1 mysql mysql 464K Dec 30 02:48 tb3.ibd
-rw-r-----. 1 mysql mysql 464K Dec 30 02:48 tb4.ibdCompression
MySQL supports row‑format compression (Zlib/LZ77) for InnoDB tables, reducing disk usage at the cost of CPU for (de)compression. When CPU is not the bottleneck, compression can improve I/O performance.
CREATE TABLE employees_compressed LIKE employees;
ALTER TABLE employees_compressed ROW_FORMAT=COMPRESSED;
INSERT INTO employees_compressed SELECT * FROM employees; ls -lh /var/lib/mysql/employees/|grep employees
-rw-r-----. 1 mysql mysql 704M Dec 30 02:28 employees.ibd
-rw-r-----. 1 mysql mysql 392M Dec 30 17:19 employees_compressed.ibdArchival Data
Retain only actively used data in MySQL; archive older rows with pt‑archiver to save storage and backup costs. This approach is recommended for compliance‑driven long‑term retention.
Partitioning
Partitioning splits a large table into smaller sub‑tables based on a partition key (commonly a date). Queries that filter on the partition key read only the relevant partition, dramatically reducing I/O. Choose a partition key that matches the most common query patterns; otherwise, many partitions may need to be scanned, hurting performance.
Sharding
Sharding distributes rows across multiple servers. MySQL does not natively support sharding, but tools such as MySQL NDB Cluster, ProxySQL, and Vitess provide horizontal scaling. Sharding adds architectural complexity and requires a routing layer to direct queries to the correct shard.
Query Optimization
As data grows, poorly written queries become a major bottleneck. Use slow_query_log and pt‑query‑digest to identify and rewrite inefficient statements.
Management
Schema changes on large tables can be disruptive. pt‑online‑schema‑change creates a new table, copies data in the background, and uses triggers to keep the original and new tables in sync, minimizing lock time. For backups, Percona XtraBackup provides hot physical backups with optional compression and encryption. Continuous monitoring with Percona Monitoring and Management (PMM) and its Query Analyzer helps spot performance regressions early.
Conclusion
MySQL is fully capable of handling terabyte‑scale tables when you apply proper schema design, choose appropriate data types, eliminate redundant indexes, use compression or partitioning where beneficial, and leverage open‑source tools for archiving, sharding, query analysis, and online schema changes.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
