Databases 16 min read

Boost MySQL Performance on Large Tables: Proven Strategies and Tools

This article examines why MySQL can still handle massive tables and presents practical techniques—including proper primary keys, eliminating redundant indexes, choosing optimal data types, compression, archiving, partitioning, sharding, query optimization, and safe schema changes—to keep large‑table performance high.

ITPUB

Oct 9, 2023

Boost MySQL Performance on Large Tables: Proven Strategies and Tools

Primary Key

Creating an explicit primary key for every InnoDB table is essential because InnoDB stores rows in primary‑key order and uses the primary key for the clustered index. If no primary key is defined, MySQL picks another unique index or creates an internal hidden clustered index, which is sub‑optimal. Use auto_increment columns when a natural key is unavailable, and from MySQL 8.0.30 you can enable invisible primary keys automatically. Avoid string primary keys because they enlarge secondary indexes.

Redundant Indexes

Indexes that share the same left‑most column can become redundant. In the employees sample database, idx_last_name is a left‑prefix of idx_last_name_hire_date, making the former unnecessary.

db1 employees> show create table employees\G
*************************** 1. row ***************************
Table: employees
Create Table: CREATE TABLE `employees` (
  `emp_no` int NOT NULL,
  `birth_date` date NOT NULL,
  `first_name` varchar(14) NOT NULL,
  `last_name` varchar(16) NOT NULL,
  `gender` enum('M','F') NOT NULL,
  `hire_date` date NOT NULL,
  PRIMARY KEY (`emp_no`),
  KEY `idx_last_name` (`last_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

ALTER TABLE employees ADD INDEX idx_last_name_hire_date (last_name,hire_date);

[user1] percona@db1: ~ $ pt-duplicate-key-checker -d employees
# idx_last_name is a left‑prefix of idx_last_name_hire_date
# To remove this duplicate index, execute:
ALTER TABLE `employees`.`employees` DROP INDEX `idx_last_name`;

Data Types

Choosing the right column type can dramatically affect storage. A CHAR(200) column always occupies 200 bytes per row, while VARCHAR(200), TINYTEXT, or TEXT store only the actual length of the string.

CREATE TABLE tb1 (id int auto_increment primary key, test_text char(200));
CREATE TABLE tb2 (id int auto_increment primary key, test_text varchar(200));
CREATE TABLE tb3 (id int auto_increment primary key, test_text tinytext);
CREATE TABLE tb4 (id int auto_increment primary key, test_text text);

[user1] percona@db1: ~ $ sudo ls -lh /var/lib/mysql/test/|grep tb
-rw-r-----. 1 mysql mysql 592K Dec 30 02:48 tb1.ibd
-rw-r-----. 1 mysql mysql 464K Dec 30 02:48 tb2.ibd
-rw-r-----. 1 mysql mysql 464K Dec 30 02:48 tb3.ibd
-rw-r-----. 1 mysql mysql 464K Dec 30 02:48 tb4.ibd

Compression

MySQL supports row‑format compression (Zlib/LZ77) for InnoDB tables, reducing disk usage at the cost of CPU for (de)compression. When CPU is not the bottleneck, compression can improve I/O performance.

CREATE TABLE employees_compressed LIKE employees;
ALTER TABLE employees_compressed ROW_FORMAT=COMPRESSED;
INSERT INTO employees_compressed SELECT * FROM employees;

ls -lh /var/lib/mysql/employees/|grep employees
-rw-r-----. 1 mysql mysql 704M Dec 30 02:28 employees.ibd
-rw-r-----. 1 mysql mysql 392M Dec 30 17:19 employees_compressed.ibd

Archival Data

Retain only actively used data in MySQL; archive older rows with pt‑archiver to save storage and backup costs. This approach is recommended for compliance‑driven long‑term retention.

Partitioning

Partitioning splits a large table into smaller sub‑tables based on a partition key (commonly a date). Queries that filter on the partition key read only the relevant partition, dramatically reducing I/O. Choose a partition key that matches the most common query patterns; otherwise, many partitions may need to be scanned, hurting performance.

Sharding

Sharding distributes rows across multiple servers. MySQL does not natively support sharding, but tools such as MySQL NDB Cluster, ProxySQL, and Vitess provide horizontal scaling. Sharding adds architectural complexity and requires a routing layer to direct queries to the correct shard.

Query Optimization

As data grows, poorly written queries become a major bottleneck. Use slow_query_log and pt‑query‑digest to identify and rewrite inefficient statements.

Management

Schema changes on large tables can be disruptive. pt‑online‑schema‑change creates a new table, copies data in the background, and uses triggers to keep the original and new tables in sync, minimizing lock time. For backups, Percona XtraBackup provides hot physical backups with optional compression and encryption. Continuous monitoring with Percona Monitoring and Management (PMM) and its Query Analyzer helps spot performance regressions early.

Conclusion

MySQL is fully capable of handling terabyte‑scale tables when you apply proper schema design, choose appropriate data types, eliminate redundant indexes, use compression or partitioning where beneficial, and leverage open‑source tools for archiving, sharding, query analysis, and online schema changes.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Indexing MySQL compression partitioning

Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.