Efficiently Export Millions of Rows to Excel Using Async and EasyExcel
This article details a robust solution for exporting massive MySQL datasets—potentially millions of rows—to Excel by leveraging asynchronous processing with jobs or MQ, the memory‑efficient EasyExcel library, pagination, multi‑sheet handling, OSS storage, and WebSocket notifications, ensuring performance and usability.
Preface
Recently I implemented a MySQL million‑level data excel export feature that is now live. The requirement is simple: users click a Export All button on the UI to download all product data, but the record count can reach one or two million, raising several challenges.
If data is exported synchronously, the API may timeout.
Loading all data into memory can cause OOM.
Large SQL queries become slow.
Rows with the same product code must be grouped together.
When using async processing, how to notify users of the export result?
If the generated Excel file is too large, the target user may not be able to open it.
How can we solve these problems to achieve fast million‑level Excel export?
1. Asynchronous Processing
Exporting millions of rows via a synchronous API will inevitably timeout. Therefore, the first design choice should be an asynchronous interface.
Asynchronous processing can be implemented with a thread, a thread pool, a job, or a message queue (MQ).
To avoid data loss on service restart, we usually use a job or mq for async execution.
1.1 Using Job
When using a job, add an execution task table to record each export task.
When the user clicks the export button, a backend API writes a record with status Pending to the table.
A job runs periodically (e.g., every 5 minutes), scans the table for Pending records, and processes them one by one.
To prevent duplicate execution, the task status is updated to Running when the job starts and to Completed or Failed when it finishes.
Task is initially Pending.
When the job first processes the task, status changes to Running.
After the job finishes, status becomes Completed or Failed.
This state machine ensures that a task already in Running will not be picked up again in the next job cycle.
Job execution has a slight delay, making it suitable for non‑time‑critical scenarios.
1.2 Using MQ
The export button sends a message to an mq server. A dedicated mq consumer processes the message and generates the Excel file.
Compared with the job approach, MQ offers better real‑time performance.
If the consumer fails, a compensation mechanism can trigger automatic retries. For example, RocketMQ provides built‑in retry and dead‑letter queue handling when the retry count exceeds a threshold.
2. Using EasyExcel
In Java, common Excel libraries are Apache POI and JXL, both of which are memory‑intensive. POI’s SAX mode reduces memory usage but still loads large parts of the file into memory, leading to OOM for million‑level data.
EasyExcel rewrites POI’s handling of the 2007‑format Excel, reducing memory consumption from ~100 MB to a few MB for a 3 MB file, and avoids OOM even for larger files.
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>easyexcel</artifactId>
<version>3.0.2</version>
</dependency>Reading Excel with EasyExcel:
@Test
public void simpleRead() {
String fileName = TestFileUtil.getPath() + "demo" + File.separator + "demo.xlsx";
// Read the first sheet; the file stream is automatically closed
EasyExcel.read(fileName, DemoData.class, new DemoDataListener())
.sheet()
.doRead();
}Writing Excel with EasyExcel:
@Test
public void simpleWrite() {
String fileName = TestFileUtil.getPath() + "write" + System.currentTimeMillis() + ".xlsx";
// Write to the first sheet named "Template"; the file stream is automatically closed
EasyExcel.write(fileName, DemoData.class)
.sheet("Template")
.doWrite(data());
}EasyExcel reduces memory usage by streaming rows from disk instead of loading the entire file into memory.
3. Pagination Query
Fetching all rows at once is time‑consuming and can cause OOM. Therefore, we use pagination, e.g., 5,000 rows per page, resulting in 200 pages for one million rows.
public Page<User> searchUser(SearchModel searchModel) {
List<User> userList = userMapper.searchUser(searchModel);
Page<User> pageResponse = Page.create(userList, searchModel);
pageResponse.setTotal(userMapper.searchUserCount(searchModel));
return pageResponse;
}The MyBatis SQL uses limit #{pageStart}, #{pageSize}, where pageStart = (pageNo - 1) * pageSize.
4. Multiple Sheets
Excel limits a sheet to 1,048,576 rows. For data exceeding this, we split the output across multiple sheets.
5. Calculating Limit Start Position for Multiple Sheets
When using multiple sheets, we must recalculate the limit start position for each sheet.
ExcelWriter excelWriter = EasyExcelFactory.write(out).build();
int totalPage = searchUserTotalPage(searchModel);
if (totalPage > 0) {
Page<User> page = Page.create(searchModel);
int sheet = (totalPage % maxSheetCount == 0) ? totalPage / maxSheetCount : (totalPage / maxSheetCount) + 1;
for (int i = 0; i < sheet; i++) {
WriterSheet writeSheet = buildSheet(i, "sheet" + i);
int startPageNo = i * (maxSheetCount / pageSize) + 1;
int endPageNo = (i + 1) * (maxSheetCount / pageSize);
while (page.getPageNo() >= startPageNo && page.getPageNo() <= endPageNo) {
page = searchUser(searchModel);
if (CollectionUtils.isEmpty(page.getList())) {
break;
}
excelWriter.write(page.getList(), writeSheet);
page.setPageNo(page.getPageNo() + 1);
}
}
}This logic writes each page of data to the appropriate sheet.
6. Uploading the File to OSS
Because the export runs asynchronously, the generated Excel file is first uploaded to an Object Storage Service (OSS). The OSS returns the file name and access URL, which are stored in a database table for later retrieval via a browser.
Storing the file on the application server would consume significant disk space, so separating the application server from the file server is recommended.
7. Real‑Time Notification via WebSocket
After the Excel file is uploaded to OSS, we need to notify the user of success or failure. Instead of a static page prompt, we can push a real‑time notification using WebSocket.
Spring Boot provides a starter for WebSocket:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-websocket</artifactId>
</dependency>A notification table records the title, user, attachment URL, read status, and type of each WebSocket message, allowing users to see a pop‑up with a download link and mark the notification as read.
8. Configurable Total Record Count
The total number of records to export should be configurable. Users can adjust the total based on whether their local machine can handle the file size (e.g., limit to 500,000 rows).
Both the total record count and page size should be configurable to dynamically control the amount of data exported.
9. Ordering by Product Code
To ensure rows with the same product code appear together across different warehouses, we add an order by product_code clause in the SQL. When pagination splits the data, the last page may contain an incomplete product group, so we need to handle that edge case in code.
Complex joins can degrade performance, so careful indexing and query optimization are required.
Conclusion
The overall flow for asynchronous Excel export can use either MQ or Job. Both approaches are viable; choose based on the specific scenario.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
