Optimizing XML‑to‑MySQL Data Import: Reducing Execution Time from 300 s to 4 s

This article describes how to dramatically speed up the import of 60,000+ XML records into MySQL by analyzing the original environment, measuring baseline performance, and applying a series of backend optimizations—including MySQL batch mode, rewriteBatchedStatements, multithreaded asynchronous writes with Disruptor, and XML parsing improvements—ultimately cutting the total runtime from five minutes to just a few seconds.

Top Architect
Top Architect
Top Architect
Optimizing XML‑to‑MySQL Data Import: Reducing Execution Time from 300 s to 4 s

The author, a senior architect, encountered a common data‑import scenario: loading more than 60,000 rows from an XML file into MySQL. The initial implementation, a simple procedural Java routine, took about 300 seconds and consumed 656 MB of memory.

Environment : Java 21 on an 8‑core i9 laptop (2.3 GHz, 16 GB RAM) communicating with MySQL 8.0 running inside a VirtualBox VM (4 CPU, 4 GB RAM). Each XML record read took 0.08 s; each MySQL insert took 0.5 s.

Baseline code (simplified):

void importData() {
    Document doc = parseXmlFile();
    List<Product> products = extractProducts(doc);
    for (Product p : products) {
        // insert into MySQL one by one
    }
}

To achieve a dramatic speedup, the author explored several optimization directions.

1. MySQL Batch Processing

Enabling JDBC batch mode and the rewriteBatchedStatements=true connection parameter reduces the number of round‑trips to the server. The core batch code is:

Connection connection = ...;
PreparedStatement stmt = connection.prepareStatement(sql);
connection.setAutoCommit(false);
for (int i = 0; i < batchSize; i++) {
    stmt.setString(...);
    stmt.addBatch();
}
stmt.executeBatch();
connection.commit();

Key JDBC methods used are addBatch(), executeBatch() and clearBatch(). After enabling batch processing, the import time dropped to about 9 seconds with similar memory usage.

2. Asynchronous Writing with Disruptor

To further reduce latency, the author introduced a producer‑consumer model using LMAX Disruptor. XML parsing produces ProductEvent objects that are placed into a ring buffer; multiple consumer handlers write batches to MySQL.

var disruptor = new Disruptor<>(
        ProductEvent::new,
        16384,
        DaemonThreadFactory.INSTANCE,
        ProducerType.SINGLE,
        new BusySpinWaitStrategy());

var consumers = new SaveDbHandler[4];
for (int i = 0; i < consumers.length; i++) {
    consumers[i] = new SaveDbHandler(i, consumers.length, shutdownLatch);
}

disruptor.handleEventsWith(new SimpleBatchRewindStrategy(), consumers)
        .then(new ClearingEventHandler());
RingBuffer<ProductEvent> ringBuffer = disruptor.start();

for (Iterator<Element> it = document.getRootElement().elementIterator(); it.hasNext(); ) {
    Element element = it.next();
    if (!StringUtils.hasText(element.elementTextTrim("id"))) continue;
    Product product = ObjectMapper.buildProduct(element);
    ringBuffer.publishEvent((event, seq, buf) -> event.setProduct(product));
}

When combined with batch mode, the total runtime fell to roughly 4.5 seconds, at the cost of higher peak memory (≈1 GB) due to multiple threads.

3. Further Optimizations

Using SAX (event‑driven) XML parsing to lower object allocation when handling very large files.

Increasing MySQL buffers ( innodb_log_buffer_size, innodb_buffer_pool_size) and adjusting innodb_flush_log_at_trx_commit for better write throughput.

Adjusting max_allowed_packet to accommodate larger batch sizes.

Exploring larger Disruptor batchSize (e.g., 16384) to push runtime down to ~3.5 s, while monitoring ring‑buffer memory impact.

All source code is available on GitHub: https://github.com/xioshe/xml-to-mysql .

In summary, by focusing on MySQL write‑path optimizations—batch statements, rewrite flag, and asynchronous multithreaded consumption—the import duration was reduced from 300 seconds to under 4 seconds, demonstrating a practical performance‑engineering workflow for backend data‑loading tasks.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

JavaPerformance OptimizationBatch ProcessingmysqlmultithreadingDisruptorXML parsing
Top Architect
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.