Databases 13 min read

How I Cut XML‑to‑MySQL Import Time from 300 s to 4 s

This article walks through a real‑world case of importing over 60,000 XML rows into MySQL, profiling the initial 300‑second runtime, and applying a series of optimizations—including JDBC batch processing, rewriteBatchedStatements, multithreaded asynchronous writes with Disruptor, and MySQL tuning—to achieve a final import time of around 4 seconds while managing memory usage.

Architect

Jan 12, 2025

How I Cut XML‑to‑MySQL Import Time from 300 s to 4 s

Scenario and Baseline

We needed to import more than 60,000 records from an XML file into MySQL. The Java code ran on an i9‑2.3 GHz notebook, while MySQL was hosted in a VirtualBox VM with 4 CPU cores and 4 GB RAM. Under this setup a single XML record read took 0.08 s and a single MySQL insert took 0.5 s, resulting in a total runtime of about 300 seconds and a peak memory usage of 656 MB.

Baseline Implementation

void importData() {
    Document doc = parseXML();
    List<Product> products = extractProducts(doc);
    for (Product p : products) {
        // insert each product individually
    }
}

The straightforward procedural approach performed the inserts one by one, which is why the whole process took five minutes.

Choosing an Optimization Direction

Profiling showed that MySQL write latency (≈298 seconds) dominated the total time, making write‑side optimization the most cost‑effective path. Two common techniques were considered: write aggregation (batching) and asynchronous writes.

Enabling MySQL Batch Processing

JDBC supports batch execution, but the MySQL driver requires the rewriteBatchedStatements=true flag to actually combine statements. The following code demonstrates the batch pattern:

Connection connection = ...;
PreparedStatement stmt = connection.prepareStatement(sql);
connection.setAutoCommit(false);
for (int i = 0; i < batchSize; i++) {
    stmt.setString(...);
    stmt.addBatch();
}
stmt.executeBatch();
connection.commit();

Key JDBC methods are addBatch(), executeBatch() and clearBatch(). After enabling batch processing and the rewrite flag, the import time dropped dramatically to about 9 seconds with stable memory consumption.

Handling Duplicate Data

When re‑importing, duplicate rows can be managed by either deleting the target table beforehand, separating new versus updated rows, or using MySQL’s INSERT … ON DUPLICATE KEY UPDATE syntax (or REPLACE INTO) to achieve idempotent writes.

INSERT INTO t1(a,b,c) VALUES (1,2,3),(4,5,6)
ON DUPLICATE KEY UPDATE a=VALUES(a), b=VALUES(b), c=VALUES(c);

Asynchronous Write with Disruptor

To further reduce latency, we introduced a producer‑consumer model using the LMAX Disruptor library. XML parsing feeds ProductEvent objects into a ring buffer, and four consumer handlers write batches to MySQL. The consumer logic commits the batch only at the end of each batch.

var disruptor = new Disruptor<>(ProductEvent::new, 16384,
    DaemonThreadFactory.INSTANCE, ProducerType.SINGLE,
    new BusySpinWaitStrategy());
SaveDbHandler[] consumers = new SaveDbHandler[4];
for (int i = 0; i < consumers.length; i++) {
    consumers[i] = new SaveDbHandler(i, consumers.length, shutdownLatch);
}
disruptor.handleEventsWith(new SimpleBatchRewindStrategy(), consumers)
        .then(new ClearingEventHandler());
RingBuffer<ProductEvent> ringBuffer = disruptor.start();
// publish events …

This asynchronous approach lowered the overall runtime to roughly 4.5 seconds (about a 60 % reduction from the batch‑only result) at the cost of higher peak memory (≈1 GB) due to multiple threads.

Further Optimizations

For very large XML files, switching from DOM to SAX parsing reduces object allocation pressure. Increasing the Disruptor batch size from 1,024 to 16,384 cut the runtime to about 3.5 seconds, though it enlarged the ring‑buffer memory footprint (size = object size × batchSize).

MySQL Tuning

Additional MySQL configuration tweaks that can improve write throughput include enlarging innodb_log_buffer_size, increasing innodb_buffer_pool_size, and adjusting innodb_flush_log_at_trx_commit to control log flushing frequency.

Result

Combining batch processing, the rewrite flag, asynchronous Disruptor consumers, and MySQL tuning reduced the XML‑to‑MySQL import from 300 seconds to roughly 4 seconds while keeping memory usage within acceptable limits (≈1 GB).

All code referenced in this article is available at https://github.com/xioshe/xml-to-mysql

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Java Performance Optimization Batch Processing mysql multithreading Disruptor XML Import

Written by

Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.