What Batch Size Gives MySQL the Best Insert Performance?
This article explains how MySQL writes data to cache before flushing to disk, compares single‑row versus batch inserts, discusses how hardware limits, transaction size, and lock contention affect performance, and shows how to estimate an optimal batch size with concrete calculations and MyBatis examples.
Introduction
The article starts with an interview scenario that highlights a common misconception: developers often claim they can insert millions of rows per batch without understanding the underlying trade‑offs.
👨 Interviewer: How do you handle large data loads?
👦 Candidate: We use batch inserts, sometimes over 20 million rows.
👨 Interviewer: Why 20 million? Have you considered hardware, I/O, or transaction impact?
Fundamentals of Database Insertion
1.1 How Insertion Works
When a row is inserted, MySQL first writes it to the InnoDB buffer pool (RAM), which marks the affected pages as dirty, and later flushes those dirty pages to disk. This reduces the number of expensive disk I/O operations.
Speed difference: RAM access is orders of magnitude faster than disk.
Disk I/O cost: Each small write incurs overhead; batching reduces this cost.
Write merging: Accumulating many writes in memory allows a single large disk write.
👨: What if the server crashes before the dirty pages are flushed?
👦: InnoDB uses Write‑Ahead Logging (the redo log), so committed changes are recoverable.
1.2 Transaction Log and Persistence
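Changes are first recorded in the redo log. With the default durability setting, the log is flushed to disk when the transaction commits, while the modified pages in the buffer pool are written to the table files later by background flushing. After a crash, InnoDB replays the redo log to restore committed changes that had not yet reached the table pages.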
1.3 Single‑Row vs. Batch Inserts
With autocommit enabled, inserting 1 000 rows one by one creates 1 000 separate transactions, each with its own commit and network round-trip overhead. A batch insert can wrap all rows in a single transaction, dramatically reducing that overhead but adding complexity in validation and error handling: if one row fails, the whole batch fails with it.
While batches improve throughput, overly large batches can block other operations and increase latency.
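To make the difference concrete, the sketch below builds the text of a single multi-row INSERT, the kind of statement that replaces many single-row statements. The class name `MultiRowInsertSql` and its `build` method are hypothetical helpers introduced for illustration; executing the statement through JDBC or MyBatis is left out.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper (not part of MySQL or MyBatis): builds one
// parameterized multi-row INSERT statement.
final class MultiRowInsertSql {
    private MultiRowInsertSql() {}

    // table and columns are concatenated into the SQL text, so they must come
    // from trusted code, never from user input. Row values are bound later
    // through the "?" placeholders.
    static String build(String table, List<String> columns, int rowCount) {
        if (columns.isEmpty() || rowCount < 1) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        // One "(?, ?, ...)" group holds the placeholders for a single row.
        StringJoiner placeholders = new StringJoiner(", ", "(", ")");
        for (int i = 0; i < columns.size(); i++) {
            placeholders.add("?");
        }
        String rowGroup = placeholders.toString();

        // Repeat the group once per row, separated by commas.
        StringJoiner rows = new StringJoiner(", ");
        for (int i = 0; i < rowCount; i++) {
            rows.add(rowGroup);
        }
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES " + rows;
    }
}
```

The statement grows linearly with `rowCount`, which is one reason a batch cannot be arbitrarily large: the server limits the size of a single statement packet.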
How to Determine an Appropriate Batch Size
Choosing the right batch size requires balancing hardware limits, transaction size, and lock contention.
2.1 Hardware and System Resources
Disk I/O: Excessive inserts can saturate disk bandwidth, hurting response time. Monitor I/O and keep insert load below peak capacity.
Memory usage: Large batches consume RAM; if memory is exhausted, performance degrades or the process may crash. Regularly check memory headroom.
2.2 Database Internal Mechanisms
Transaction size: Bigger transactions hold locks longer, affecting concurrent queries. Find a balance where the transaction is large enough to reduce overhead but small enough to avoid long lock periods.
Lock strategy: High lock contention can throttle performance; tuning lock granularity and isolation levels helps.
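One common way to keep each transaction bounded is to split the full data set into fixed-size chunks and insert each chunk in its own transaction. The following is a minimal sketch; `BatchPartitioner` and `partition` are illustrative names, and the chunk size itself still has to come from measurement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a large list of rows into chunks so that each
// chunk can be inserted in its own, shorter transaction.
final class BatchPartitioner {
    private BatchPartitioner() {}

    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<List<T>> chunks = new ArrayList<>();
        int start = 0;
        while (start < rows.size()) {
            // The last chunk may be smaller than batchSize.
            int end = start + Math.min(batchSize, rows.size() - start);
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(rows.subList(start, end)));
            start = end;
        }
        return chunks;
    }
}
```

Smaller chunks release locks sooner at the cost of more commits; larger chunks do the opposite.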
2.3 Estimating the Batch Size
Assume a record structure:
int field: 4 bytes
varchar field: 50 bytes on average, 255 bytes maximum
date field: 3 bytes
float field: 4 bytes
Average record size ≈ 4 + 50 + 3 + 4 = 61 bytes. Using the maximum varchar length instead gives 4 + 255 + 3 + 4 = 266 bytes. Both figures count column data only and ignore per-row storage overhead and indexes.
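The record-size arithmetic can be written down as a small helper. This is only a sketch of the estimate used here: the class and constant names are hypothetical, and the result covers column data only, not InnoDB's per-row overhead.

```java
// Hypothetical helper reproducing the record-size estimate for the example
// record: one int, one varchar, one date, and one float column.
final class RecordSizeEstimate {
    static final int INT_BYTES = 4;
    static final int DATE_BYTES = 3;
    static final int FLOAT_BYTES = 4;

    private RecordSizeEstimate() {}

    // varcharBytes is the assumed payload of the varchar column:
    // 50 for the average case, 255 for the worst case.
    static int rowBytes(int varcharBytes) {
        return INT_BYTES + varcharBytes + DATE_BYTES + FLOAT_BYTES;
    }
}
```

Calling `RecordSizeEstimate.rowBytes(50)` gives the 61-byte average, and `rowBytes(255)` gives the 266-byte worst case.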
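Given an 8 GB memory pool with 20 % reserved for the OS, usable memory ≈ 8 GB × 0.8 = 6.4 GB. Dividing usable memory by the average record size (6.4 GB / 61 bytes) gives an upper bound on the number of rows that can be held in memory before flushing: on the order of 100 million rows.
With a 512 GB SSD, the same division (512 GB / 61 bytes) gives a disk‑capacity‑based maximum on the order of 8 to 9 billion rows, depending on whether a GB is counted as 10^9 or 2^30 bytes. These are theoretical ceilings, not recommended batch sizes.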
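The memory and disk bounds can be computed the same way. The sketch below assumes 1 GB = 2^30 bytes, which is an assumption made here for illustration; with 10^9 bytes per GB the results are somewhat smaller. `CapacityBounds` and its methods are hypothetical names.

```java
// Hypothetical helper for the capacity-based upper bounds on row count.
final class CapacityBounds {
    // Assumption: 1 GB is treated as 2^30 bytes.
    static final long GIB = 1024L * 1024L * 1024L;

    private CapacityBounds() {}

    // Bytes left after reserving reservedPercent of the total (e.g. 20 for the OS).
    static long usableBytes(long totalBytes, int reservedPercent) {
        if (reservedPercent < 0 || reservedPercent > 100) {
            throw new IllegalArgumentException("reservedPercent must be in 0..100");
        }
        return Math.multiplyExact(totalBytes, (long) (100 - reservedPercent)) / 100;
    }

    // Whole rows that fit into the given number of bytes.
    static long maxRows(long bytes, int rowBytes) {
        if (rowBytes < 1) {
            throw new IllegalArgumentException("rowBytes must be at least 1");
        }
        return bytes / rowBytes;
    }
}
```

For example, `CapacityBounds.maxRows(CapacityBounds.usableBytes(8 * CapacityBounds.GIB, 20), 61)` yields a little over 112 million rows, and `CapacityBounds.maxRows(512 * CapacityBounds.GIB, 61)` yields roughly 9 billion.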
Practical Strategies with MyBatis
To apply batch inserts in MyBatis, the <foreach> tag can generate a bulk INSERT statement:
<insert id="insertMultiple" parameterType="list">
INSERT INTO tableName (column1, column2, ...)
VALUES
<foreach collection="list" item="record" separator=",">
(#{record.column1}, #{record.column2}, ...)
</foreach>
</insert>
MyBatis also supports ExecutorType.BATCH, which accumulates statements and sends them to the database as a batch:
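SqlSession session = sqlSessionFactory.openSession(ExecutorType.BATCH);
Setting an appropriate batchSize (here, the application‑chosen number of rows accumulated before the pending statements are flushed) prevents Out‑Of‑Memory errors. It is recommended to commit the session only after all rows have been inserted, since frequent commits degrade performance; for very large loads, weigh this against the lock‑duration concerns described in section 2.2.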
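The flush-every-N pattern can be sketched without any MyBatis dependency. In the code below, `insertOne` and `flush` are stand-ins introduced for illustration: in a real MyBatis session they would typically be a mapper insert call and `session.flushStatements()`, followed by `session.commit()` once the load is done. `ChunkedBatchWriter` is a hypothetical name.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: queues rows one by one and flushes every batchSize rows,
// so the number of pending statements held in memory stays bounded.
final class ChunkedBatchWriter {
    private ChunkedBatchWriter() {}

    // insertOne stands in for a mapper call that queues one INSERT;
    // flush stands in for sending the queued statements to the database.
    // Returns the number of flushes performed.
    static <T> int write(List<T> rows, int batchSize, Consumer<T> insertOne, Runnable flush) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        int flushes = 0;
        int pending = 0;
        for (T row : rows) {
            insertOne.accept(row);
            pending++;
            if (pending == batchSize) {
                flush.run();
                flushes++;
                pending = 0;
            }
        }
        // Flush the final, possibly smaller, group.
        if (pending > 0) {
            flush.run();
            flushes++;
        }
        return flushes;
    }
}
```

With five rows and a batch size of two, the helper performs three flushes: two full groups and one group holding the remaining row.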
Conclusion
The article provides a step‑by‑step framework for estimating a safe batch size based on record size, available memory, and disk capacity, and demonstrates how to implement efficient batch inserts with MyBatis. Understanding the trade‑offs between transaction size, lock contention, and hardware limits enables developers to choose a batch size that maximizes throughput without compromising stability.
Programmer XiaoFu
xiaofucode.com – a programmer learning guide driven by the pursuit of profit
