How I Cut XML‑to‑MySQL Import Time from 300 s to 4 s with Batch, RewriteBatchedStatements, and Multithreading

This article walks through optimizing a 60,000‑row XML‑to‑MySQL import by profiling the environment, measuring baseline performance, and applying JDBC batch rewriting, write aggregation, and asynchronous writes with LMAX Disruptor, ultimately reducing execution time from 300 seconds to about four seconds while keeping memory usage reasonable.

Top Architect

Environment

The Java code runs on a laptop (8‑core i9, 16 GB RAM), and MySQL runs in a VirtualBox VM (4 CPUs, 4 GB RAM). The setup uses JDK 21 and MySQL 8.0. Reading a record from the XML takes 0.08 s; inserting a record into MySQL takes 0.5 s.

Baseline Implementation

The initial procedural implementation parses the XML, builds a list of Product objects, and inserts each record individually. The whole process takes about 300 seconds and peaks at 656 MB of memory.

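void importData(File xmlFile, Connection conn) throws Exception {
    // Parse the whole XML file into memory (dom4j)
    Document doc = new SAXReader().read(xmlFile);
    // extractProducts and insertProduct are placeholder helpers, not shown here
    List<Product> products = extractProducts(doc);
    for (Product p : products) {
        insertProduct(conn, p);   // one INSERT and one round trip per record
    }
}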

Optimization Direction

The biggest gain comes from reducing MySQL write cost. Two main techniques are write aggregation (batch) and asynchronous writes.

Enable MySQL Batch Processing

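The JDBC batch API can group many inserts into a single round‑trip. With MySQL Connector/J, this only happens when the connection is configured with rewriteBatchedStatements=true: the driver then rewrites a batch of single‑row INSERTs into multi‑row INSERT statements. Without the flag, the batched statements are still sent to the server one at a time.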

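Connection conn = ...;
PreparedStatement stmt = conn.prepareStatement(sql);
conn.setAutoCommit(false);            // commit once per batch, not once per row
for (int i = 0; i < batchSize; i++) {
    stmt.setString(...);
    stmt.addBatch();                  // queue the current parameter set on the client
}
stmt.executeBatch();                  // send the queued rows to MySQL
conn.commit();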

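Key methods: addBatch() queues the current parameter set, executeBatch() sends everything queued so far, and clearBatch() discards the queue. With batching and rewriteBatchedStatements=true enabled, the import time drops from 300 s to 9 s, and memory usage remains stable.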
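The batch pattern above can be sketched as a small helper that flushes every batchSize rows, so the client never queues the whole file at once. This is a minimal sketch, not the article's actual code: the class name, the JDBC URL (host, port, schema), the product table, and its two columns are all assumptions made for illustration. Only rewriteBatchedStatements itself is a real Connector/J connection property.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BatchInsertSketch {

    // Hypothetical connection URL: host, port, and schema are placeholders.
    // rewriteBatchedStatements is a MySQL Connector/J connection property.
    static final String JDBC_URL =
        "jdbc:mysql://localhost:3306/demo?rewriteBatchedStatements=true";

    // Hypothetical table and columns, used only for illustration.
    static final String INSERT_SQL =
        "INSERT INTO product (id, name) VALUES (?, ?)";

    /**
     * Binds each row (row[0] = id, row[1] = name) to the statement and
     * flushes every batchSize rows. Returns the number of executeBatch()
     * calls made. Committing is left to the caller.
     */
    static int insertInBatches(PreparedStatement stmt, List<String[]> rows, int batchSize)
            throws SQLException {
        int flushes = 0;
        int pending = 0;
        for (String[] row : rows) {
            stmt.setString(1, row[0]);
            stmt.setString(2, row[1]);
            stmt.addBatch();              // queue the row on the client side
            pending++;
            if (pending == batchSize) {
                stmt.executeBatch();      // send the full batch
                stmt.clearBatch();
                pending = 0;
                flushes++;
            }
        }
        if (pending > 0) {                // send the final, partial batch
            stmt.executeBatch();
            flushes++;
        }
        return flushes;
    }
}
```

A caller would open a connection with DriverManager.getConnection(JDBC_URL, user, password), call setAutoCommit(false), prepare INSERT_SQL, pass the statement to insertInBatches, and then commit. With 60,000 rows and a batch size of 1,000, the helper calls executeBatch() 60 times instead of issuing 60,000 individual inserts.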

Asynchronous Writing with Disruptor

Using the LMAX Disruptor, XML parsing produces events that are placed into a ring buffer. Multiple consumer threads consume events, batch‑insert into MySQL, and commit.

// Ring buffer with 16384 slots, a single producer (the XML parser), busy-spin waiting
var disruptor = new Disruptor<>(ProductEvent::new, 16384,
    DaemonThreadFactory.INSTANCE, ProducerType.SINGLE,
    new BusySpinWaitStrategy());

// Four consumers; each one receives its ordinal and the total consumer count
var consumers = new SaveDbHandler[4];
for (int i = 0; i < consumers.length; i++) {
    consumers[i] = new SaveDbHandler(i, consumers.length, latch);
}

// Every consumer sees every event; ClearingEventHandler runs after all of them
disruptor.handleEventsWith(new SimpleBatchRewindStrategy(), consumers)
         .then(new ClearingEventHandler());
RingBuffer<ProductEvent> ring = disruptor.start();

// Producer: walk the parsed document and publish one event per product
for (Iterator<Element> it = document.getRootElement().elementIterator(); it.hasNext(); ) {
    Element e = it.next();
    if (!StringUtils.hasText(e.elementTextTrim("id"))) continue;   // skip records without an id
    Product p = ObjectMapper.buildProduct(e);
    ring.publishEvent((event, seq, buf) -> event.setProduct(p));
}

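Because every handler passed to handleEventsWith sees every event, each consumer processes only the events for which sequence % numberOfConsumers == ordinal, so no record is written twice. This reduces total time to about 4.5 s (roughly half the 9 s of batch alone), at the cost of higher memory usage (~1 GB).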
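The modulo check can be sketched on its own. The class below mirrors the shape of Disruptor's EventHandler.onEvent(event, sequence, endOfBatch) without importing the library, so it runs with the JDK alone. The class name, the flusher callback (standing in for addBatch/executeBatch/commit), and the choice to flush on endOfBatch are assumptions for illustration; the article does not show how SaveDbHandler decides when to write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Illustrative stand-in for one of the consumers; not the article's SaveDbHandler. */
class PartitionedConsumerSketch<T> {
    private final long ordinal;
    private final long numberOfConsumers;
    private final int batchSize;
    private final Consumer<List<T>> flusher;   // stands in for the JDBC batch write
    private final List<T> buffer = new ArrayList<>();

    PartitionedConsumerSketch(long ordinal, long numberOfConsumers,
                              int batchSize, Consumer<List<T>> flusher) {
        this.ordinal = ordinal;
        this.numberOfConsumers = numberOfConsumers;
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    /** Same parameter order as Disruptor's EventHandler.onEvent. */
    void onEvent(T event, long sequence, boolean endOfBatch) {
        // Every consumer is offered every sequence; only the owner keeps the event.
        if (sequence % numberOfConsumers == ordinal) {
            buffer.add(event);
        }
        // Assumed policy: flush when the local batch is full, or when no more
        // events are currently available and something is waiting to be written.
        if (buffer.size() >= batchSize || (endOfBatch && !buffer.isEmpty())) {
            flusher.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

With four consumers, the consumer with ordinal 1 keeps sequences 1, 5, 9, and so on, while the other three ignore them. Each consumer can therefore hold its own connection and batch without coordinating with the others.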

Further Optimizations

Increase the Disruptor batch size (e.g., 16384) so that each consumer writes more rows per executeBatch() call, but watch the ring‑buffer size.

Use INSERT … ON DUPLICATE KEY UPDATE or REPLACE INTO for idempotent imports.

Adjust MySQL parameters: innodb_log_buffer_size, innodb_buffer_pool_size, innodb_flush_log_at_trx_commit.

Consider SAX parsing for very large XML to reduce memory pressure.
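As an illustration of the SAX suggestion, the sketch below streams records with the JDK's built-in SAX parser and hands each one to a callback as soon as its closing tag is read, so the full document never sits in memory. The element name product and the flat child-element layout are assumptions; the article only shows that each record has an id child. The class and method names are hypothetical.

```java
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxProductReaderSketch {

    /** Streams <product> elements and passes each one to the sink as a field map. */
    static void read(InputStream xml, Consumer<Map<String, String>> sink) throws Exception {
        DefaultHandler handler = new DefaultHandler() {
            private Map<String, String> current;                 // fields of the open <product>
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("product")) {
                    current = new HashMap<>();
                }
                text.setLength(0);                               // start collecting fresh text
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);                  // may arrive in several chunks
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (current == null) {
                    return;                                      // outside any <product>
                }
                if (qName.equals("product")) {
                    sink.accept(current);                        // record complete: hand it off
                    current = null;
                } else {
                    current.put(qName, text.toString().trim());  // child element becomes a field
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
    }
}
```

The sink is where this would connect to the rest of the pipeline: it could skip records without an id, build a Product, and publish it to the ring buffer, in place of the dom4j iterator loop.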

Result

After applying batch processing, rewriteBatchedStatements, and multithreaded asynchronous writes, the import time fell from 300 s to roughly 4 s, with peak memory around 1 GB.
