Why MySQL COUNT(*) Is Slow on InnoDB but Instant on MyISAM – A Deep Dive
This article explains the mechanisms behind MySQL's SELECT COUNT(*) operation, comparing InnoDB's row‑by‑row scan with MyISAM's constant‑time meta count, and explores execution steps, visibility rules, data structures, and performance implications.
InnoDB Full‑Table COUNT(*)
"SELECT COUNT(*) FROM TABLE" is a ubiquitous SQL query. When using the InnoDB storage engine, which is the default for most business tables, the COUNT(*) operation has a time complexity of O(N), where N is the number of rows, because it must scan the entire table.
In contrast, MyISAM can retrieve the row count quickly. This article investigates the underlying mechanisms and reasons for this difference.
Main Questions
How does the execution process work?
How is the count calculated and what factors affect the result?
Where is the count value stored and what data structures are involved?
Why can InnoDB only implement COUNT(*) by scanning the table?
What risks does a full‑table COUNT(*) pose, given that it is a table scan?
Does COUNT(*) read overflow pages like a SELECT * query?
1. Execution Framework – Loop: Read + Count?
1.1 Basic Conclusion
The full‑table scan runs as a single loop.
Each iteration reads one row, then decides whether that row contributes to the count.
In this respect, executing a simple SELECT resembles INSERT INTO … SELECT: rows are read from the source one at a time and handed to a consumer, which for COUNT(*) is a counter instead of a target table.
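A minimal Python sketch of that loop, assuming an in-memory table. The names read_rows, scan and CountSink are hypothetical stand-ins for MySQL's C++ internals. The same loop feeds either a counter (COUNT(*)) or a target list (INSERT INTO … SELECT).

```python
from typing import Callable, Iterator, List, Optional, Tuple

Row = Tuple[Optional[int], ...]


def read_rows(table: List[Row]) -> Iterator[Row]:
    """Hypothetical reader: yields one row per step, like a full-table scan."""
    for row in table:
        yield row


def scan(table: List[Row], consume: Callable[[Row], None]) -> None:
    """The single loop: read one row, then hand it to the consumer."""
    for row in read_rows(table):
        consume(row)


class CountSink:
    """Consumer for COUNT(*): adds one for every row it receives."""

    def __init__(self) -> None:
        self.count = 0

    def __call__(self, row: Row) -> None:
        self.count += 1


table: List[Row] = [(1, 10), (2, None), (3, 30)]

counter = CountSink()
scan(table, counter)          # like SELECT COUNT(*) FROM t

target: List[Row] = []
scan(table, target.append)    # like INSERT INTO target SELECT * FROM t
```

Only the consumer differs. The read side is identical in both cases.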
2. Execution Process
The process consists of four parts:
(1) Pre‑processing: the client sends the SQL to the MySQL server before SELECT execution.
(2) COUNT(*) flow: the code‑level framework and its two core steps, with key call‑stack details.
(3) Read a row: visibility handling in the row_search_mvcc function, which determines how MVCC affects the COUNT(*) result.
(4) Count a row: evaluate_join_record decides whether the row increments the counter.
Readers who only want the COUNT(*) details can skip part (1) and start directly at part (2).
2.1 COUNT(*) Pre‑processing – From Client to sub_select
The client packages the SQL according to the MySQL protocol and sends it to the server.
The server parses the packet, identifies the command type (QUERY) and extracts the SQL string.
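To illustrate the first two steps, here is a Python sketch of the packet a client sends for a text query. It assumes a single packet (payload under 16 MB) and a UTF‑8 connection character set. The framing (3‑byte little‑endian payload length plus a 1‑byte sequence id) and the COM_QUERY command byte 0x03 follow the MySQL client/server protocol. The function names are hypothetical.

```python
import struct
from typing import Tuple

COM_QUERY = 0x03  # command byte for a text-protocol query


def build_com_query(sql: str, sequence_id: int = 0) -> bytes:
    """Client side: wrap the SQL text in a single COM_QUERY packet."""
    payload = bytes([COM_QUERY]) + sql.encode("utf-8")
    if len(payload) >= 1 << 24:
        raise ValueError("payload too large for a single packet")
    # Header: 3-byte little-endian payload length, then 1-byte sequence id.
    header = struct.pack("<I", len(payload))[:3] + bytes([sequence_id])
    return header + payload


def parse_query_packet(packet: bytes) -> Tuple[int, str]:
    """Server side: read the header, identify the command, extract the SQL."""
    length = int.from_bytes(packet[:3], "little")
    payload = packet[4:4 + length]
    command = payload[0]
    if command != COM_QUERY:
        raise ValueError(f"unexpected command byte {command:#04x}")
    return command, payload[1:].decode("utf-8")


packet = build_com_query("SELECT COUNT(*) FROM t")
command, sql = parse_query_packet(packet)
```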
The statement is parsed and turned into a JOIN object that represents the query structure, including the table list, target list, WHERE clause, and subqueries.
In a full‑table COUNT(*) case, table_list = [t] and target_list = [COUNT(*)], with no WHERE or subqueries.
The JOIN object provides two important methods: JOIN::optimize() (optimization phase) and JOIN::exec() (execution phase). The execution phase ultimately calls sub_select to perform the simple SELECT, including COUNT(*).
2.2 COUNT(*) Flow Inside sub_select
Reading a row: all code paths eventually invoke row_search_mvcc, which reads a row from the InnoDB B+‑tree into a buffer, handling row locks, MVCC, and visibility. For a snapshot read like SELECT COUNT(*), only MVCC and visibility matter.
Counting a row: evaluate_join_record evaluates each fetched row to decide if it should be counted. COUNT(arg) increments the counter if the argument is not NULL; otherwise the row is ignored.
A row is therefore counted only if it passes both filtering stages: the visibility check in row_search_mvcc, then the NULL check in evaluate_join_record.
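The two stages can be sketched in Python as follows. The sketch assumes that each record already carries the outcome of its MVCC check (visible) and that None models SQL NULL. Record, count_rows and star are hypothetical names, not InnoDB or server functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Record:
    pk: int
    col: Optional[int]   # None models SQL NULL
    visible: bool        # result of the MVCC visibility check for this reader


def count_rows(records: List[Record], arg: Callable[[Record], object]) -> int:
    n = 0
    for rec in records:
        # Stage 1 (read a row, row_search_mvcc): skip rows this snapshot cannot see.
        if not rec.visible:
            continue
        # Stage 2 (count a row, evaluate_join_record): COUNT(arg) ignores NULL.
        if arg(rec) is None:
            continue
        n += 1
    return n


def star(rec: Record) -> object:
    return 0             # COUNT(*): the argument is never NULL


rows = [Record(1, 10, True), Record(2, None, True), Record(3, 30, False)]
```

With these rows, COUNT(*) and COUNT(pk) both give 2, because the third row is invisible. COUNT(col) gives 1 because the second row's col is NULL.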
2.3 Row Visibility and row_search_mvcc
Visibility determines which rows are seen by a transaction. Even a MIN(id) query may not read the physically smallest row if it is invisible under the current MVCC snapshot.
In Read‑Uncommitted isolation, a concurrent insert can become visible during the scan, so the COUNT(*) may include newly inserted rows.
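Below is a simplified Python model of the snapshot check. It assumes each row has exactly one version, written by the transaction that inserted it, so there are no updates, deletes, or undo-log version chains. The fields are loosely modeled on InnoDB's ReadView (creator transaction, up/low limit ids, and the list of active ids), but the class itself is a hypothetical sketch. Under REPEATABLE READ, InnoDB's default, the count reflects the snapshot. Under READ UNCOMMITTED, the latest rows are counted whether or not they are committed.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class ReadView:
    creator_trx_id: int         # transaction that took the snapshot
    up_limit_id: int            # ids below this had committed when the snapshot was taken
    low_limit_id: int           # ids at or above this had not started yet
    active_ids: FrozenSet[int]  # ids still uncommitted when the snapshot was taken

    def changes_visible(self, trx_id: int) -> bool:
        if trx_id < self.up_limit_id or trx_id == self.creator_trx_id:
            return True         # committed earlier, or our own change
        if trx_id >= self.low_limit_id:
            return False        # began after the snapshot
        return trx_id not in self.active_ids


def count_star(rows: List[Tuple[int, int]], view: Optional[ReadView]) -> int:
    """rows are (pk, inserting trx id); view=None models READ UNCOMMITTED."""
    if view is None:
        return len(rows)        # read the latest data, committed or not
    return sum(1 for _pk, trx in rows if view.changes_visible(trx))


# Snapshot taken by trx 20 while trx 15 was still active; next id to assign is 21.
view = ReadView(creator_trx_id=20, up_limit_id=15, low_limit_id=21,
                active_ids=frozenset({15}))
rows = [(1, 5), (2, 15), (3, 20), (4, 25), (5, 18)]
```

Under the snapshot, rows 2 and 4 are skipped: trx 15 was still uncommitted, and trx 25 started after the snapshot. Without a snapshot, all five rows are counted.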
2.4 evaluate_join_record and NULL Checks
A row contributes to COUNT when:
If the argument is a column, the column's value in that row must not be NULL. A column declared NOT NULL always satisfies this.
If the argument is *, the row is always counted, because a regular row as a whole is never NULL.
Thus COUNT(id) on a primary‑key column is equivalent to COUNT(*), because primary‑key columns are implicitly NOT NULL.
Data Structure
The count value is stored in the expression object representing COUNT(*): ((Item_sum_count*)item_sum)->count. After parsing, the server builds a JOIN object containing a result_field_list. For COUNT(*), this list holds a single Item_sum_count object whose count member holds the result.
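Here is a toy Python model of where the result lives, using simplified classes. ItemSumCount, Join and their methods are hypothetical names chosen to echo Item_sum_count and result_field_list. They are not the server's real C++ API.

```python
from typing import List, Optional


class ItemSumCount:
    """Toy aggregate item: the running result is kept in self.count."""

    def __init__(self, arg_is_star: bool) -> None:
        self.arg_is_star = arg_is_star
        self.count = 0

    def clear(self) -> None:
        self.count = 0                  # reset before a new aggregation

    def add(self, arg_value: Optional[object]) -> None:
        if not self.arg_is_star and arg_value is None:
            return                      # COUNT(col) skips NULL values
        self.count += 1

    def val_int(self) -> int:
        return self.count


class Join:
    """Toy JOIN: result_field_list holds the select-list items."""

    def __init__(self, result_field_list: List[ItemSumCount]) -> None:
        self.result_field_list = result_field_list


join = Join([ItemSumCount(arg_is_star=True)])   # SELECT COUNT(*) FROM t
item = join.result_field_list[0]
item.clear()
for row in [(1,), (2,), (3,)]:
    item.add(row[0])
```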
MyISAM Full‑Table COUNT(*)
MyISAM is rarely used in production, but a COUNT(*) without a WHERE clause runs in O(1) time on it, because each MyISAM table keeps its row count as metadata, both in memory and on disk.
The server reads the in‑memory count variable, which is initialized from the file’s count value.
Updates to the count are protected by a table‑level lock, ensuring consistency.
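A minimal Python sketch of the MyISAM approach for a single in-memory table object. It does not persist the count to disk. A single exclusive threading.Lock stands in for MyISAM's table-level lock, which in reality also allows shared read locks. The class and method names are hypothetical.

```python
import threading
from typing import List, Tuple


class MyIsamLikeTable:
    """Keeps an exact row count next to the data, updated under a table lock."""

    def __init__(self) -> None:
        self._rows: List[Tuple[int, ...]] = []
        self._row_count = 0
        self._table_lock = threading.Lock()    # one lock guards the whole table

    def insert(self, row: Tuple[int, ...]) -> None:
        with self._table_lock:
            self._rows.append(row)
            self._row_count += 1               # the count changes with the data

    def delete_last(self) -> None:
        with self._table_lock:
            self._rows.pop()
            self._row_count -= 1

    def count_star(self) -> int:
        with self._table_lock:
            return self._row_count             # O(1): no scan needed


table = MyIsamLikeTable()


def writer(start: int) -> None:
    for i in range(start, start + 1000):
        table.insert((i,))


threads = [threading.Thread(target=writer, args=(k * 1000,)) for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
table.delete_last()
```

Because every write updates the data and the count under the same lock, the count never drifts from the number of stored rows.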
Key Differences Between InnoDB and MyISAM
Both engines share the same SQL‑layer data structures; the count variable resides in the same Item_sum_count object.
InnoDB computes the count during execution by scanning rows, while MyISAM retrieves a pre‑maintained row count during optimization, avoiding a scan.
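A minimal sketch of that difference using a hypothetical engine interface. exact_row_count() and scan() are illustrative names, not MySQL's handler API. An engine that keeps an exact count is answered in the optimization phase without reading any rows. Otherwise, the execution phase scans.

```python
from typing import Iterator, List, Optional, Tuple


class ToyEngine:
    """Hypothetical engine: may or may not keep an exact stored row count."""

    def __init__(self, rows: List[int], keeps_exact_count: bool) -> None:
        self._rows = rows
        self._keeps_exact_count = keeps_exact_count
        self.rows_read = 0

    def exact_row_count(self) -> Optional[int]:
        return len(self._rows) if self._keeps_exact_count else None

    def scan(self) -> Iterator[int]:
        for row in self._rows:
            self.rows_read += 1
            yield row


def count_star(engine: ToyEngine) -> Tuple[int, str]:
    # Optimization phase: answer from stored metadata if the engine has it.
    stored = engine.exact_row_count()
    if stored is not None:
        return stored, "optimize"
    # Execution phase: read every row and count it.
    return sum(1 for _ in engine.scan()), "exec"


myisam_like = ToyEngine(list(range(5)), keeps_exact_count=True)
innodb_like = ToyEngine(list(range(5)), keeps_exact_count=False)
```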
Why InnoDB Cannot Keep a Global Row‑Count Variable
Because MVCC allows each transaction to see a different snapshot of the data, a single global row‑count would be inaccurate. The server cannot provide a unified view for all concurrent sessions.
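A small Python illustration. It assumes a simplified snapshot that sees rows inserted by transactions committed before it began, plus its own inserts. Snapshot and the transaction ids are hypothetical. Two concurrent transactions get different, equally correct counts, and neither matches the number of rows physically stored.

```python
from typing import Dict, FrozenSet


class Snapshot:
    """Sees rows from transactions committed before it started, plus its own."""

    def __init__(self, own_trx: int, committed: FrozenSet[int]) -> None:
        self.own_trx = own_trx
        self.committed = committed

    def sees(self, trx_id: int) -> bool:
        return trx_id == self.own_trx or trx_id in self.committed


def count_star(rows: Dict[int, int], snap: Snapshot) -> int:
    """rows maps pk -> id of the inserting transaction."""
    return sum(1 for trx in rows.values() if snap.sees(trx))


rows: Dict[int, int] = {1: 1, 2: 1}          # trx 1 inserted two rows and committed
a = Snapshot(own_trx=2, committed=frozenset({1}))
rows[3] = 2                                   # trx 2 inserts, not yet committed
b = Snapshot(own_trx=3, committed=frozenset({1}))
rows[4] = 3                                   # trx 3 inserts two rows, not yet committed
rows[5] = 3

global_counter = len(rows)                    # what a single shared counter would hold
```

Transaction 2 should see 3 rows and transaction 3 should see 4, while a single shared counter would report 5.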
Impact on Buffer Pool
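InnoDB splits its buffer pool LRU list into a young (hot) sublist and an old sublist. Pages read by a table scan are inserted at the midpoint, which is the head of the old sublist. By default this is about 3/8 of the list from the tail, as set by innodb_old_blocks_pct=37. Such pages are promoted to the young sublist only if accessed again after innodb_old_blocks_time (1000 ms by default). A scan therefore mostly churns the old sublist and evicts pages from its tail, leaving hot pages in the young sublist and limiting interference with other workloads.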
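Below is a toy Python model of midpoint insertion. It measures time in access ticks rather than milliseconds and measures the pool in pages. MidpointLRU and its parameters are hypothetical, with old_pct and old_time standing in for innodb_old_blocks_pct and innodb_old_blocks_time. The example runs a 20-page scan, touching each page several times in quick succession as a scan reading many rows per page would, through an 8-page pool that holds four hot pages.

```python
from collections import OrderedDict


class MidpointLRU:
    """Toy LRU split into a young (hot) sublist and an old sublist.

    New pages enter at the head of the old sublist (the midpoint). An old
    page moves to the young head only if it is accessed again at least
    `old_time` ticks after its first read. Eviction takes the old tail first.
    """

    def __init__(self, capacity: int, old_pct: int = 37, old_time: int = 1) -> None:
        self.capacity = capacity
        self.young_cap = capacity - max(1, capacity * old_pct // 100)
        self.old_time = old_time
        self.young: "OrderedDict[str, None]" = OrderedDict()  # first key = head
        self.old: "OrderedDict[str, int]" = OrderedDict()     # page -> first-read tick
        self.tick = 0

    def __contains__(self, page: str) -> bool:
        return page in self.young or page in self.old

    def access(self, page: str) -> None:
        self.tick += 1
        if page in self.young:
            self.young.move_to_end(page, last=False)     # refresh a hot page
        elif page in self.old:
            if self.tick - self.old[page] >= self.old_time:
                del self.old[page]
                self._push_young(page)                   # re-read later: promote
        else:
            if len(self.young) + len(self.old) >= self.capacity:
                self._evict()
            self.old[page] = self.tick                   # insert at the midpoint
            self.old.move_to_end(page, last=False)

    def _push_young(self, page: str) -> None:
        if len(self.young) >= self.young_cap:
            demoted, _ = self.young.popitem(last=True)   # young tail becomes old head
            self.old[demoted] = self.tick
            self.old.move_to_end(demoted, last=False)
        self.young[page] = None
        self.young.move_to_end(page, last=False)

    def _evict(self) -> None:
        victims = self.old if self.old else self.young
        victims.popitem(last=True)                       # drop the coldest page


pool = MidpointLRU(capacity=8, old_time=5)
hot = ["H1", "H2", "H3", "H4"]
for _ in range(3):                  # hot pages are re-read over time and get promoted
    for page in hot:
        pool.access(page)
for i in range(20):                 # a scan touches each page 3 times in quick succession
    for _ in range(3):
        pool.access(f"P{i}")
```

The scan pages never stay long enough to be promoted, so they only replace one another in the old sublist, and the hot pages survive.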
Does COUNT(*) Read Overflow Pages?
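No. COUNT(*) only needs to know that a row exists, and primary‑key values are never NULL, so InnoDB only uses the key fields of each record on the index pages it scans. Large column values stored off‑page in overflow pages are referenced from the record by a pointer and are never fetched. A SELECT *, by contrast, must read them to return the full column values.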
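A toy Python model of the difference. It assumes leaf records that keep only a short inline part of a large column plus a pointer to an overflow page; in InnoDB, how much stays inline depends on the row format. Tablespace, LeafRecord and the page counters are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LeafRecord:
    pk: int
    inline_part: bytes               # portion of a large column kept in the record
    overflow_page_no: Optional[int]  # pointer to off-page data, if any


@dataclass
class Tablespace:
    leaf_pages: List[List[LeafRecord]]
    overflow_pages: Dict[int, bytes]
    pages_read: int = 0


def count_star(ts: Tablespace) -> int:
    n = 0
    for page in ts.leaf_pages:
        ts.pages_read += 1               # one read per leaf page
        n += len(page)                   # records exist; no overflow pointer is followed
    return n


def select_star_sizes(ts: Tablespace) -> List[int]:
    sizes = []
    for page in ts.leaf_pages:
        ts.pages_read += 1
        for rec in page:
            value = rec.inline_part
            if rec.overflow_page_no is not None:
                ts.pages_read += 1       # SELECT * must fetch the overflow page
                value += ts.overflow_pages[rec.overflow_page_no]
            sizes.append(len(value))
    return sizes


def make_tablespace() -> Tablespace:
    return Tablespace(
        leaf_pages=[[LeafRecord(1, b"ab", 100), LeafRecord(2, b"cd", None)],
                    [LeafRecord(3, b"ef", 101)]],
        overflow_pages={100: b"x" * 10, 101: b"y" * 20},
    )


ts_count = make_tablespace()
ts_select = make_tablespace()
rows = count_star(ts_count)
sizes = select_star_sizes(ts_select)
```

Counting reads only the two leaf pages. Returning full values also reads both overflow pages.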
Original source: https://blog.didiyun.com/index.php/2019/01/08/mysql-count
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
