How to Import Millions of Excel Rows in Seconds: 4 Proven Performance Hacks

This article analyzes why traditional Excel import methods crash under massive loads and presents four practical optimization techniques—including streaming parsing, batch inserts, asynchronous processing, and parallel sharding—backed by code samples, configuration tips, and real‑world performance benchmarks for importing millions of rows efficiently.

Su San Talks Tech

Apr 2, 2025

Introduction

Many developers struggle with importing massive Excel files; a typical e‑commerce system that needs to import 200,000 product rows per day can freeze for over three hours, and a server restart wipes all progress.

1 Why Traditional Import Solutions Fail

1.1 Memory Exhaustion

Problem: POI's user model (e.g., XSSFWorkbook) loads the entire workbook into heap memory.

Experiment: A 50 MB file (~200k rows) exhausts a 1 GB heap.

Symptoms: Frequent Full GC, CPU spikes, service unresponsiveness.

1.2 Synchronous Blocking

Process: User uploads → server processes all data synchronously → returns result.

Risk: The HTTP request times out (often after about 30 s at the gateway or client) and the task is lost.

1.3 Efficiency Black Hole

Measured: MySQL single‑thread insert ≈200 rows/s → 200,000 rows need ~16 minutes.

Root cause: Each INSERT triggers transaction commit, index update, log write.
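
The estimate above is plain division, and the same arithmetic is handy for sizing any import job. A minimal sketch follows; `ImportEstimate` and its method names are hypothetical and not part of the original article.

```java
/** Back-of-the-envelope timing for an import that runs at a steady insert rate. */
class ImportEstimate {
    /** Estimated seconds needed to insert rowCount rows at rowsPerSecond. */
    static double seconds(long rowCount, double rowsPerSecond) {
        if (rowsPerSecond <= 0) {
            throw new IllegalArgumentException("rowsPerSecond must be positive");
        }
        return rowCount / rowsPerSecond;
    }

    /** Same estimate expressed in minutes. */
    static double minutes(long rowCount, double rowsPerSecond) {
        return seconds(rowCount, rowsPerSecond) / 60.0;
    }
}
```

For 200,000 rows at 200 rows/s, `seconds` gives 1,000 s, which is just under 17 minutes.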

2 Four Performance Optimizations

2.1 Streaming Parsing

Replace DOM parsing with POI’s SAX mode to read the file piece by piece.

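// Streaming read of an .xlsx file with POI's SAX event API (XSSF, not HSSF).
// Classes come from org.apache.poi.openxml4j.opc, org.apache.poi.xssf.eventusermodel,
// org.apache.poi.xssf.model, org.apache.poi.util and org.xml.sax.
// RowHandler is assumed to implement XSSFSheetXMLHandler.SheetContentsHandler.
RowHandler rowHandler = new RowHandler();
try (OPCPackage pkg = OPCPackage.open(file, PackageAccess.READ)) {
    XSSFReader reader = new XSSFReader(pkg);
    ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(pkg);
    StylesTable styles = reader.getStylesTable();
    Iterator<InputStream> sheets = reader.getSheetsData();
    while (sheets.hasNext()) {
        try (InputStream stream = sheets.next()) {
            XMLReader parser = XMLHelper.newXMLReader();
            parser.setContentHandler(new XSSFSheetXMLHandler(styles, strings, rowHandler, false));
            parser.parse(new InputSource(stream)); // rows arrive one at a time; the sheet is never fully loaded
        }
    }
}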

Pitfall guide:

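Handle both Excel formats: .xls (HSSF) and .xlsx (XSSF) need different event readers. SXSSF is a streaming API for writing only and does not help with reading.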

Avoid creating many objects during parsing; reuse containers.
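
The container-reuse advice can be sketched in plain Java, independent of POI. `RowBatcher`, `flushSize` and the `sink` callback are hypothetical names introduced for illustration. The sketch assumes the sink finishes with each batch before returning, because the same list is cleared and refilled afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/** Collects parsed rows in one reusable buffer and hands them off in fixed-size batches. */
class RowBatcher {
    private final int flushSize;
    private final Consumer<List<String[]>> sink;
    private final List<String[]> buffer;

    RowBatcher(int flushSize, Consumer<List<String[]>> sink) {
        this.flushSize = flushSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(flushSize); // allocated once, reused for every batch
    }

    /** Called once per parsed row, for example from a SAX row callback. */
    void onRow(String[] cells) {
        buffer.add(cells);
        if (buffer.size() >= flushSize) {
            flush();
        }
    }

    /** Call after the last row so a partial final batch is not lost. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // The sink sees a read-only view and must not keep a reference to it.
        sink.accept(Collections.unmodifiableList(buffer));
        buffer.clear();
    }
}
```

With this shape, memory use is bounded by `flushSize` rows rather than by the size of the file.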

2.2 Paginated Batch Inserts

Use MyBatis batch insert with connection‑pool tuning.

// Paginated batch insert (commit every 1000 rows)
public void batchInsert(List<Product> list) {
    int pageSize = 1000;
    // try-with-resources closes the session even if an insert fails
    try (SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH)) {
        ProductMapper mapper = sqlSession.getMapper(ProductMapper.class);
        for (int i = 0; i < list.size(); i += pageSize) {
            List<Product> subList = list.subList(i, Math.min(i + pageSize, list.size()));
            mapper.batchInsert(subList);
            sqlSession.commit();     // flush the batched statements and commit this page
            sqlSession.clearCache(); // clear the session-local cache to limit memory growth
        }
    }
}

Key parameters:

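# MyBatis (mybatis-spring-boot-starter): use the batch executor by default.
# The page size itself (1000 rows) is set in code, as in batchInsert above.
mybatis.executor-type=batch
# Druid connection pool
spring.datasource.druid.maxActive=50
spring.datasource.druid.initialSize=10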

2.3 Asynchronous Processing

Architecture overview:

Frontend upload: Use chunked upload tools (e.g., WebUploader).

Server side: Generate a unique task ID and push the task into a queue (Redis Stream / RabbitMQ).

Async thread pool: Multiple workers consume the queue; progress stored in Redis.

Result notification: Notify client via WebSocket or email.
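
The submit-then-poll flow described in these steps can be sketched with the JDK alone. In this sketch an in-memory map stands in for Redis and a fixed thread pool stands in for the queue consumers. `ImportTaskService` and all of its method names are hypothetical. A production version would persist progress externally so that it survives a restart.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntConsumer;

/** Accepts an import job, returns a task ID at once, and tracks progress in memory. */
class ImportTaskService {
    // Stand-in for Redis: taskId -> percent complete (0..100)
    private final Map<String, Integer> progress = new ConcurrentHashMap<>();
    private final Map<String, CompletableFuture<Void>> running = new ConcurrentHashMap<>();
    // Daemon threads so a forgotten shutdown() does not keep the JVM alive
    private final ExecutorService workers = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "import-worker");
        t.setDaemon(true);
        return t;
    });

    /** Queues the job and returns immediately; the caller does not wait for the import. */
    String submit(int totalChunks, IntConsumer chunkProcessor) {
        String taskId = UUID.randomUUID().toString();
        progress.put(taskId, 0);
        CompletableFuture<Void> future = CompletableFuture.runAsync(() -> {
            for (int chunk = 0; chunk < totalChunks; chunk++) {
                chunkProcessor.accept(chunk);
                progress.put(taskId, (chunk + 1) * 100 / totalChunks);
            }
        }, workers);
        running.put(taskId, future);
        return taskId;
    }

    /** Percent complete, or -1 if the task ID is unknown. */
    int progressOf(String taskId) {
        return progress.getOrDefault(taskId, -1);
    }

    /** Blocks until the task ends; used here for demonstration, a client would poll instead. */
    void awaitCompletion(String taskId) {
        running.get(taskId).join();
    }

    void shutdown() {
        workers.shutdown();
    }
}
```

The upload endpoint would return the task ID from `submit`, and a separate progress endpoint would call `progressOf` until it reaches 100.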

2.4 Parallel Import

For tens of millions of rows, apply a divide‑and‑conquer strategy. Approximate elapsed time relative to the single‑thread baseline:

Single‑thread: row‑by‑row read + insert (baseline 100%).

Paginated batch: time reduced to 5%.

Multi‑thread sharding: time reduced to 1%.

Distributed sharding (3 nodes): time reduced to 0.5%.
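
Multi-thread sharding can be sketched as splitting the row range into contiguous shards and importing each on its own worker. `ShardedImporter`, `Shard` and the `importShard` callback are hypothetical names. The callback is assumed to insert its range and return the number of rows written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.ToIntFunction;

class ShardedImporter {
    /** Half-open row range [start, end) handled by one worker. */
    static final class Shard {
        final int start;
        final int end;

        Shard(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Splits totalRows into at most shardCount contiguous ranges of near-equal size. */
    static List<Shard> split(int totalRows, int shardCount) {
        List<Shard> shards = new ArrayList<>();
        int base = totalRows / shardCount;
        int remainder = totalRows % shardCount;
        int start = 0;
        for (int i = 0; i < shardCount && start < totalRows; i++) {
            int size = base + (i < remainder ? 1 : 0); // first shards absorb the remainder
            shards.add(new Shard(start, start + size));
            start += size;
        }
        return shards;
    }

    /** Runs importShard on every shard in parallel and returns the total rows imported. */
    static int importAll(int totalRows, int threads, ToIntFunction<Shard> importShard) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Integer>> futures = new ArrayList<>();
            for (Shard shard : split(totalRows, threads)) {
                futures.add(CompletableFuture.supplyAsync(() -> importShard.applyAsInt(shard), pool));
            }
            int imported = 0;
            for (CompletableFuture<Integer> future : futures) {
                imported += future.join(); // a worker failure surfaces as CompletionException
            }
            return imported;
        } finally {
            pool.shutdown();
        }
    }
}
```

The same `split` logic could assign shards to nodes instead of threads in the distributed case, with each node running its own batch inserts.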

3 Key Experience Beyond Code

3.1 Pre‑validation

Wrong approach: validating while inserting can leave partial data in the database when a later row fails:

// Wrong: validate while inserting; rows inserted before the first failure stay in the DB
public void validateAndInsert(Product product) {
    if (product.getPrice() < 0) {
        throw new IllegalArgumentException("Price cannot be negative");
    }
    productMapper.insert(product);
}

Correct practice:

Perform basic format and required‑field checks during streaming parsing.

Do business validation (referential integrity, uniqueness) before persisting.
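
One way to apply this is to validate a whole batch first and persist only when no errors come back. The sketch below is an assumption-laden illustration: `ProductRowValidator`, `ProductRow` and its fields (`rowNumber`, `sku`, `price`) are hypothetical and do not come from the article's `Product` class.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ProductRowValidator {
    /** Minimal stand-in for a parsed Excel row; field names are assumptions. */
    static final class ProductRow {
        final int rowNumber;
        final String sku;
        final BigDecimal price;

        ProductRow(int rowNumber, String sku, BigDecimal price) {
            this.rowNumber = rowNumber;
            this.sku = sku;
            this.price = price;
        }
    }

    /** Returns one message per problem found; an empty list means the batch is safe to persist. */
    static List<String> validate(List<ProductRow> rows) {
        List<String> errors = new ArrayList<>();
        Set<String> seenSkus = new HashSet<>();
        for (ProductRow row : rows) {
            // Required-field and uniqueness checks on the SKU
            if (row.sku == null || row.sku.trim().isEmpty()) {
                errors.add("Row " + row.rowNumber + ": SKU is required");
            } else if (!seenSkus.add(row.sku)) {
                errors.add("Row " + row.rowNumber + ": duplicate SKU " + row.sku);
            }
            // Required-field and range checks on the price
            if (row.price == null) {
                errors.add("Row " + row.rowNumber + ": price is required");
            } else if (row.price.signum() < 0) {
                errors.add("Row " + row.rowNumber + ": price cannot be negative");
            }
        }
        return errors;
    }
}
```

Collecting every error, rather than stopping at the first, lets the user fix the whole file in one pass before anything reaches the database.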

3.2 Checkpoint‑Resume Design

Record processing status of each chunk.

On failure, resume from the last offset.
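
A minimal sketch of the idea, assuming chunks are processed in order and the checkpoint is advanced only after a chunk succeeds. `CheckpointedImport` is a hypothetical class, and its in-memory map stands in for a durable store such as Redis or a database table.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntConsumer;

class CheckpointedImport {
    // Stand-in for a durable store: taskId -> index of the next chunk to process
    private final Map<String, Integer> checkpoints = new ConcurrentHashMap<>();

    /** Processes chunks from the last saved offset; calling it again after a failure resumes there. */
    void run(String taskId, int totalChunks, IntConsumer processChunk) {
        int next = checkpoints.getOrDefault(taskId, 0);
        for (int chunk = next; chunk < totalChunks; chunk++) {
            processChunk.accept(chunk);         // if this throws, the checkpoint is left unchanged
            checkpoints.put(taskId, chunk + 1); // record progress only after the chunk succeeded
        }
    }

    /** Index of the next chunk that still needs processing. */
    int nextChunk(String taskId) {
        return checkpoints.getOrDefault(taskId, 0);
    }
}
```

Because a chunk that fails midway is retried in full, the chunk processor should be idempotent, for example by upserting on a unique key.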

3.3 Logging & Monitoring

Example Spring Boot Prometheus metrics configuration:

// Spring Boot Prometheus metric bean
@Bean
public MeterRegistryCustomizer<PrometheusMeterRegistry> metrics() {
    return registry -> registry.config().meterFilter(new MeterFilter() {
        @Override
        public DistributionStatisticConfig configure(Meter.Id id, DistributionStatisticConfig config) {
            return DistributionStatisticConfig.builder()
                    .percentiles(0.5, 0.95) // median and 95th percentile
                    .build().merge(config);
        }
    });
}

4 Million‑Row Import Performance Comparison

Test environment: 4‑core 8 GB server, MySQL 8.0, 1,000,000 rows × 15 columns (~200 MB Excel).

Results:

Traditional row‑by‑row: 2.5 GB peak memory, 96 min, 173 rows/s.

Paginated batch: 500 MB peak memory, 7 min, 2,381 rows/s.

Multi‑thread sharding + async batch: 800 MB peak memory, 86 s, 11,627 rows/s.

Distributed sharding (3 nodes): 300 MB peak memory per node, 29 s, 34,482 rows/s.

Conclusion

Never load the whole file into memory: Use SAX streaming.

Avoid row‑by‑row DB operations: Leverage batch inserts.

Never make users wait: Process asynchronously with progress queries.

Horizontal scaling beats vertical tuning: Sharding and distributed processing.

Memory management is critical: Object pooling, avoid large temporary objects.

Tune connection‑pool parameters: Prevent datasource bottlenecks.

Pre‑validation is non‑negotiable: Filter dirty data at the entry point.

Comprehensive monitoring: Full‑link metrics.

Design for disaster recovery: Checkpoint‑resume and idempotent handling.

Discard single‑machine mindset: Embrace distributed system design.

Stress test extreme scenarios: Million‑row load tests are essential.

If you are frustrated by Excel import performance, the techniques above should open a new door for your system.


Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
