Why MySQL count(*) Slows Down on Large Tables and How to Optimize It
This article explains how MySQL's count() works across different storage engines, compares the performance of various count() forms, and offers practical strategies—such as using EXPLAIN rows, auxiliary count tables, and batch processing—to obtain accurate or approximate row counts efficiently even on massive tables.
Creating an SMS Table
Assume a table sms that stores messages to be sent, with a state column where
0means unsent.
An asynchronous thread repeatedly fetches rows with
state = 0, sends the SMS, and updates
stateto
1, gradually reducing the number of unsent messages.
Why count() Becomes Slow on Large Tables
To monitor how many unsent messages remain, we typically run:
<code>select count(*) from sms where state = 0;</code>When the table is small this works fine, but with millions of rows the query may time out.
How count() Works
MySQL consists of a server layer and a storage‑engine layer. The storage engine (e.g., InnoDB or MyISAM) determines how the row count is obtained.
MyISAM keeps a hidden counter that can be read directly, making
count(*)extremely fast. InnoDB, however, must traverse the smallest index tree and count leaf nodes, which requires scanning many rows.
Why InnoDB Cannot Use a Simple Counter
InnoDB supports transactions and MVCC with the default REPEATABLE READ isolation level. Because each transaction sees a consistent snapshot, the visible row count can differ between concurrent transactions, so a single stored counter would break isolation guarantees.
Performance of Different count() Forms
The server layer always receives rows from InnoDB and then decides whether to count them based on the expression inside
count():
count(*) and count(1) simply increment for every row because the value can never be NULL.
count(primary_key) also skips NULL checks because primary keys are NOT NULL.
count(indexed_column) may need to read the column value and check for NULL unless the column is defined NOT NULL.
count(non_indexed_column) usually triggers a full‑table scan and NULL checks.
<code>count(*) ≈ count(1) > count(primary_key) > count(indexed_column) > count(non_indexed_column)</code>Thus
count(*)is already the fastest option, but it still requires scanning rows in InnoDB.
When Approximate Counts Are Sufficient
If you only need to know the order of magnitude of unsent messages, you can use the
rowsestimate from
EXPLAIN, which samples data and provides a quick approximation.
When Exact Counts Are Required
For precise monitoring you can maintain a separate table that stores pre‑computed counts:
<code>CREATE TABLE `count_table` (
`id` int NOT NULL AUTO_INCREMENT COMMENT 'Primary key',
`cnt_what` char(20) NOT NULL DEFAULT '' COMMENT 'Metric name',
`cnt` tinyint NOT NULL COMMENT 'Count value',
PRIMARY KEY (`id`),
KEY `idx_cnt_what` (`cnt_what`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;</code>Insert or update the count whenever the number of unsent SMS changes, then read it instantly:
<code>select cnt from count_table where cnt_what = "未发送的短信数量";</code>High‑Real‑Time Scenario
Include the count‑updating SQL in the same transaction that inserts or deletes SMS rows. This keeps the count consistent with transaction isolation, but may cause contention under heavy write load.
Low‑Real‑Time Scenario
If updates can be delayed (e.g., once a day), you can periodically scan the SMS table in batches, count rows with
state = 0, and store the result in
count_table. Alternatively, stream binlog changes to Hive and compute the count there.
Summary
MySQL count() reads full‑table data; MyISAM can read a stored row‑count field, while InnoDB must scan rows.
Performance order:
count(*) ≈ count(1) > count(primary_key) > count(indexed_column) > count(non_indexed_column), but even the fastest still scans rows in InnoDB.
For most monitoring use
EXPLAIN's
rowsestimate to get a quick, approximate count.
When exact numbers are needed, maintain a dedicated count table and update it either within the same transaction (high real‑time) or via periodic batch jobs (low real‑time).
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.