How to Efficiently Import 100 Million Excel Rows into MySQL
This article explains how to import a hundred‑million‑row Excel dataset into MySQL by using CSV format, streaming parsers like EasyExcel, batch inserts, asynchronous processing, and partial‑transaction strategies to ensure feasibility, data integrity, and high performance.
Interviewers often ask how to handle the massive task of moving one hundred million rows from Excel into a MySQL database. The XLSX format caps at 1,048,576 rows per sheet, so a dataset of this size would in practice arrive as a CSV file, which has no practical row limit and is therefore a viable source for such large data sets.
Feasibility
Loading all rows with a conventional Excel parser such as Apache POI would require reading the entire file into memory, which at this scale inevitably triggers an Out‑Of‑Memory error in the JVM. The solution is to use a streaming‑capable library like EasyExcel, which reads and processes the file row by row and releases each row as soon as it has been handled, dramatically reducing JVM memory consumption.
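A minimal sketch of this streaming pattern, assuming EasyExcel 3.x is on the classpath (recent versions parse CSV as well as XLSX). The UserRow model, its column layout, and the 5,000‑row buffer are illustrative assumptions, not anything prescribed here:

```java
import com.alibaba.excel.EasyExcel;
import com.alibaba.excel.annotation.ExcelProperty;
import com.alibaba.excel.context.AnalysisContext;
import com.alibaba.excel.event.AnalysisEventListener;

import java.util.ArrayList;
import java.util.List;

// Hypothetical row model; the column indices match the source file's layout.
class UserRow {
    @ExcelProperty(index = 0)
    private String name;
    @ExcelProperty(index = 1)
    private String email;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }
}

class UserRowListener extends AnalysisEventListener<UserRow> {
    private static final int BATCH_SIZE = 5_000;             // illustrative buffer size
    private final List<UserRow> buffer = new ArrayList<>(BATCH_SIZE);

    @Override
    public void invoke(UserRow row, AnalysisContext context) {
        buffer.add(row);                  // EasyExcel hands over one row at a time
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    @Override
    public void doAfterAllAnalysed(AnalysisContext context) {
        flush();                          // write out the final partial batch
    }

    private void flush() {
        if (buffer.isEmpty()) return;
        // hand the buffer to the batch writer (see the performance section below)
        buffer.clear();                   // drop references so processed rows can be GC'd
    }
}

// Usage: EasyExcel.read("users.csv", UserRow.class, new UserRowListener()).sheet().doRead();
```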
Data Integrity
When importing at this volume, it is essential to capture every row that fails to insert and record the specific error reason. This traceability allows developers to analyse the failures, correct the problematic records, and re‑attempt insertion, thereby preserving overall data completeness.
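One way to keep that traceability is a small collector that pairs each rejected row with the reason it was rejected. The FailedRow shape and its fields are hypothetical (record syntax assumes Java 16+), and in practice the list would be flushed periodically to an error table or file for later analysis and re‑import:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder: which source line failed, its raw data, and why.
record FailedRow(long sourceLine, String rawData, String reason) {}

class ImportErrorCollector {
    private final List<FailedRow> failures =
            Collections.synchronizedList(new ArrayList<>());  // batches may fail on worker threads

    void record(long sourceLine, String rawData, Exception cause) {
        failures.add(new FailedRow(sourceLine, rawData, cause.getMessage()));
    }

    List<FailedRow> failures() { return failures; }
}
```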
Performance Optimisation
Switching from single‑row inserts to batch inserts consolidates network round‑trips, SQL parsing, execution‑plan generation, and disk I/O into one operation per batch rather than one per row, yielding significant speed gains. Further performance can be extracted by the following (a combined sketch follows this list):
Employing a thread pool to execute batch writes asynchronously, fully utilising server CPU cores.
Using CompletableFuture to validate rows in parallel before the write phase, replacing a serial validation bottleneck.
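A sketch combining all three ideas: rows are batched into one PreparedStatement, validated in parallel with the reader, and written on a fixed thread pool. It reuses the hypothetical UserRow and ImportErrorCollector from above and assumes a pooled javax.sql.DataSource and a users(name, email) table:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.sql.DataSource;

class AsyncBatchWriter {
    // A fixed pool sized to the cores keeps CPUs busy without unbounded DB connections.
    private final ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    private final DataSource dataSource;      // assumed pooled, e.g. HikariCP
    private final ImportErrorCollector errors;

    AsyncBatchWriter(DataSource dataSource, ImportErrorCollector errors) {
        this.dataSource = dataSource;
        this.errors = errors;
    }

    // Validate the batch off the reader thread, then write it asynchronously.
    CompletableFuture<Void> write(List<UserRow> batch) {
        return CompletableFuture
                .supplyAsync(() -> batch.stream().filter(this::isValid).toList(), pool)
                .thenAcceptAsync(this::insertBatch, pool);
    }

    private boolean isValid(UserRow row) {    // illustrative rule; a full implementation
        return row.getEmail() != null         // would also route invalid rows to the
                && !row.getEmail().isBlank(); // error collector
    }

    private void insertBatch(List<UserRow> rows) {
        String sql = "INSERT INTO users (name, email) VALUES (?, ?)";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            for (UserRow row : rows) {
                ps.setString(1, row.getName());
                ps.setString(2, row.getEmail());
                ps.addBatch();                // queue locally instead of a round-trip per row
            }
            ps.executeBatch();                // one flush for the whole batch
        } catch (Exception e) {
            rows.forEach(r -> errors.record(-1, r.toString(), e));  // -1: line no. not tracked here
        }
    }
}
```

With MySQL Connector/J, adding rewriteBatchedStatements=true to the JDBC URL lets the driver rewrite such batches into multi‑row INSERT statements, which is typically where the largest share of the speed‑up comes from.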
Transaction Considerations
Although transactions guarantee atomicity, rolling back an entire batch because a single row is erroneous would discard thousands of valid records. In this scenario, it is preferable to commit the successful rows and isolate the failures, rather than enforce an all‑or‑nothing approach.
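A sketch of that partial‑commit pattern, reusing the hypothetical helpers from above: the batch is first attempted inside one transaction for speed, and only if it fails are the rows replayed individually, so good rows still commit and only the bad ones land in the error collector:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class PartialCommitWriter {
    void insertWithFallback(Connection conn, List<UserRow> rows,
                            ImportErrorCollector errors) throws SQLException {
        String sql = "INSERT INTO users (name, email) VALUES (?, ?)";
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (UserRow row : rows) {
                ps.setString(1, row.getName());
                ps.setString(2, row.getEmail());
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit();                    // fast path: the whole batch was clean
        } catch (SQLException batchError) {
            conn.rollback();                  // discard the partial batch, not the import
            for (UserRow row : rows) {        // replay row by row to isolate the bad ones
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    ps.setString(1, row.getName());
                    ps.setString(2, row.getEmail());
                    ps.executeUpdate();
                    conn.commit();            // each good row commits on its own
                } catch (SQLException rowError) {
                    conn.rollback();
                    errors.record(-1, row.toString(), rowError);
                }
            }
        } finally {
            conn.setAutoCommit(true);
        }
    }
}
```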
By combining streaming reads, batch writes, asynchronous processing, and selective transaction handling, developers can reliably import massive Excel‑derived datasets into MySQL with acceptable memory usage, high throughput, and robust error reporting.
Senior Tony
Former senior tech manager at Meituan, ex‑tech director at New Oriental, with experience at JD.com and Qunar; specializes in Java interview coaching and regularly shares hardcore technical content. Runs a video channel of the same name.