Optimizing XML-to-MySQL Bulk Import with JDBC Batch and Disruptor
By switching from a naïve per‑record insert to JDBC batch writes with rewriteBatchedStatements and then off‑loading those batches to multiple consumer threads via a LMAX Disruptor ring buffer, the XML‑to‑MySQL import of 60,000 rows dropped from roughly 300 seconds to about 4 seconds while keeping memory usage modest.
In a typical data‑import scenario, more than 60,000 records are read from an XML file and inserted into a MySQL database. The original implementation took about 300 seconds, with each XML read costing ~0.08 s and each MySQL insert ~0.5 s.
Environment: Java 21 runs on a laptop (8‑core i9, 16 GB RAM), while MySQL 8.0 runs in a VirtualBox VM (4 CPUs, 4 GB RAM). Network latency and VM resource limits constrain write speed.
The baseline implementation uses a simple procedural loop:
void importData() {
    Document doc = reader.read(xmlFile);            // parse the whole XML file into memory
    List<Product> products = extractProducts(doc);  // map each element to a Product
    for (Product p : products) {
        insertProduct(p);                           // one INSERT (and one round trip) per row
    }
}
This approach required ~300 s.
Optimization direction – MySQL batch writes
JDBC Batch API dramatically reduces round‑trip overhead. The key steps are disabling auto‑commit, adding statements to a batch, executing the batch, and committing.
Connection connection = ...;
PreparedStatement stmt = connection.prepareStatement(sql);
connection.setAutoCommit(false);      // group the whole batch into one transaction
for (int i = 0; i < batchSize; i++) {
    stmt.setString(...);              // bind this row's parameters
    stmt.addBatch();                  // queue the row instead of executing it immediately
}
stmt.executeBatch();                  // send all queued rows in one round trip
stmt.clearBatch();
connection.commit();
Enabling the driver option rewriteBatchedStatements=true merges the batched INSERTs into a single multi‑row statement, further improving throughput.
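The gain comes from cutting round trips, not from faster individual inserts. A small self‑contained sketch of that arithmetic (the JDBC URL, host, and database names are illustrative placeholders, not from the original setup):

```java
public class BatchMath {
    // Hypothetical Connector/J URL with batch rewriting enabled;
    // host and schema names are placeholders.
    static final String URL =
        "jdbc:mysql://vm-host:3306/imports?rewriteBatchedStatements=true";

    // Number of executeBatch() round trips needed for `rows` rows
    // when flushing every `batchSize` additions (ceiling division).
    static int roundTrips(int rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        // One statement per row: 60,000 round trips to the VM.
        System.out.println(roundTrips(60_000, 1));      // 60000
        // Batches of 16,384 rows: only 4 round trips.
        System.out.println(roundTrips(60_000, 16_384)); // 4
    }
}
```

With per‑row latency dominated by the network hop to the VM, shrinking 60,000 round trips to a handful is what moves the total from minutes to seconds.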
Result: write time dropped to ~9 s with stable memory usage.
Further speed‑up – asynchronous writes with Disruptor
Using a Disruptor ring buffer, XML parsing produces ProductEvent objects that are handed off to multiple consumer threads, each performing batch inserts.
var disruptor = new Disruptor<>(ProductEvent::new, 16384,
        DaemonThreadFactory.INSTANCE, ProducerType.SINGLE,
        new BusySpinWaitStrategy());
var consumers = new SaveDbHandler[4];
for (int i = 0; i < consumers.length; i++) {
    consumers[i] = new SaveDbHandler(i, consumers.length, shutdownLatch);
}
disruptor.handleEventsWith(new SimpleBatchRewindStrategy(), consumers)
        .then(new ClearingEventHandler());
RingBuffer<ProductEvent> ring = disruptor.start();
for (Iterator<Element> it = document.getRootElement().elementIterator(); it.hasNext(); ) {
    Element e = it.next();
    if (!StringUtils.hasText(e.elementTextTrim("id"))) continue;  // skip records without an id
    Product p = ObjectMapper.buildProduct(e);
    ring.publishEvent((event, seq, product) -> event.setProduct(product), p);
}
With four consumer threads, total time fell to ~4.5 s (≈50 % reduction compared with batch‑only).
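Since every handler passed to handleEventsWith sees every event, the SaveDbHandler constructor's ordinal/total arguments suggest the usual modulo‑sharding pattern: a handler processes only the sequences it owns (sequence % total == ordinal). A library‑free sketch of that routing (class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

public class ShardingDemo {
    // Decide whether the handler with this ordinal owns the given sequence.
    static boolean owns(long sequence, int ordinal, int totalConsumers) {
        return sequence % totalConsumers == ordinal;
    }

    public static void main(String[] args) {
        int consumers = 4;
        List<List<Long>> perConsumer = new ArrayList<>();
        for (int i = 0; i < consumers; i++) perConsumer.add(new ArrayList<>());

        // Simulate 12 ring-buffer sequences offered to all 4 handlers;
        // each handler keeps only the sequences it owns.
        for (long seq = 0; seq < 12; seq++) {
            for (int ordinal = 0; ordinal < consumers; ordinal++) {
                if (owns(seq, ordinal, consumers)) perConsumer.get(ordinal).add(seq);
            }
        }
        System.out.println(perConsumer.get(0));  // [0, 4, 8]
        System.out.println(perConsumer.get(1));  // [1, 5, 9]
    }
}
```

Each consumer therefore accumulates roughly a quarter of the rows into its own JDBC batch, so the four batch writers proceed in parallel without coordinating on individual rows.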
Additional refinements
Increase batchSize (e.g., 16,384) to lower per‑batch overhead, while monitoring the max_allowed_packet limit.
Switch XML parsing to SAX/event‑driven mode to reduce memory pressure for very large files.
Adjust MySQL settings such as innodb_log_buffer_size, innodb_buffer_pool_size, and innodb_flush_log_at_trx_commit for better write durability/performance trade‑offs.
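The switch to event‑driven parsing can be sketched with the JDK's built‑in StAX API, which never materializes the document tree; the <product> element name is illustrative:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingCount {
    // Count <product> elements without building a DOM tree,
    // so memory stays flat no matter how large the input is.
    static int countProducts(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "product".equals(r.getLocalName())) {
                count++;  // a real importer would build the Product here
                          // and publish it to the ring buffer
            }
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<products><product><id>1</id></product>"
                   + "<product><id>2</id></product></products>";
        System.out.println(countProducts(xml));  // 2
    }
}
```

Because the reader pulls one event at a time, the producer side of the Disruptor pipeline can publish each Product as soon as its closing tag is seen, instead of waiting for the whole file to parse.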
Final results after combining batch, async, and tuning: ~4 s execution time with ~1 GB peak memory.