How I Cut XML‑to‑MySQL Import Time from 300 s to 4 s

This article details a step‑by‑step performance overhaul for importing 60,000+ XML records into MySQL, covering baseline measurements, MySQL batch processing, asynchronous writes with Disruptor, XML parsing optimizations, and tuning MySQL buffers, ultimately reducing total runtime from 300 seconds to just four seconds.

Top Architect

Code Running Environment

The Java code runs on a laptop (8‑core i9, 2.3 GHz, 16 GB RAM), while MySQL 8.0 runs in a VirtualBox VM (4 CPUs, 4 GB RAM). The JDK version is 21. In this setup, reading one XML record takes 0.08 s and inserting one row into MySQL takes 0.5 s.

Baseline Performance

The initial procedural implementation parses the XML file, extracts Product objects, and inserts them one row at a time. The whole flow takes about 300 seconds, with memory peaking at 656.6 MB.

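// Baseline: DOM-parse the whole file, then insert one row per statement
void importData(Connection conn, File xmlFile) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().parse(xmlFile);      // whole tree held in memory
    List<Product> products = extractProducts(doc);    // hypothetical DOM-walking helper
    try (PreparedStatement stmt = conn.prepareStatement(sql)) {
        for (Product p : products) {
            stmt.setString(...);                      // bind p's columns
            stmt.executeUpdate();                     // auto-commit on: one round trip and commit per row
        }
    }
}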
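The article does not say how the 300 s runtime and the 656.6 MB peak were measured. One way to collect comparable numbers with only the JDK is sketched below, assuming that the peaks of the JVM's heap memory pools are an acceptable proxy for "peak memory". `ImportBenchmark` and `measure` are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

// Hypothetical helper: times a task and reports the peak heap usage
// recorded by the JVM's heap memory pools while it ran.
class ImportBenchmark {

    record Result(double seconds, long peakHeapBytes) {}

    static Result measure(Runnable task) {
        // Reset the peak counters so earlier activity does not skew the result
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                pool.resetPeakUsage();
            }
        }
        long start = System.nanoTime();
        task.run();
        double seconds = (System.nanoTime() - start) / 1e9;

        // Sum the per-pool peaks; this is an upper bound, since the pools
        // may have peaked at different moments
        long peak = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                peak += pool.getPeakUsage().getUsed();
            }
        }
        return new Result(seconds, peak);
    }
}
```

To use it, wrap the import call in the `Runnable` (converting any checked exception to an unchecked one inside the lambda) and run each variant the same way so the before/after numbers stay comparable.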

Where to Start Optimizing?

Of the roughly 300 seconds, 298.3 seconds are spent inserting rows, so speeding up MySQL writes offers by far the largest return.

Enable MySQL Batch Processing

The JDBC batch API can combine many INSERT statements into a single request. The key is to add rewriteBatchedStatements=true to the MySQL JDBC URL; without it, the driver still sends the batched statements to the server one at a time.
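With rewriteBatchedStatements=true in the connection URL (for example `jdbc:mysql://host:3306/db?rewriteBatchedStatements=true`, where host and database are placeholders), MySQL Connector/J can send a batch of single-row INSERTs as multi-row INSERT statements. The sketch below only builds the shape of such a statement to show what changes on the wire; it is not the driver's own code, and the table and column names are hypothetical.

```java
import java.util.Collections;
import java.util.List;

// Illustration only: the shape of a multi-row INSERT, i.e. one statement
// carrying many rows instead of one statement (and one round trip) per row.
class MultiRowInsert {

    static String build(String table, List<String> columns, int rows) {
        if (rows < 1 || columns.isEmpty()) {
            throw new IllegalArgumentException("need at least one row and one column");
        }
        // One "(?, ?, ...)" group with a placeholder per column
        String group = "(" + String.join(", ", Collections.nCopies(columns.size(), "?")) + ")";
        // Repeat the group once per row after a single VALUES keyword
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES "
                + String.join(", ", Collections.nCopies(rows, group));
    }
}
```

For example, `MultiRowInsert.build("product", List.of("name", "price"), 3)` yields `INSERT INTO product (name, price) VALUES (?, ?), (?, ?), (?, ?)`.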

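Connection connection = ...;       // URL must include rewriteBatchedStatements=true
PreparedStatement stmt = connection.prepareStatement(sql);
connection.setAutoCommit(false);   // one commit per batch instead of one per row
for (int i = 0; i < batchSize; i++) {
    stmt.setString(...);           // bind the columns of the i-th row
    stmt.addBatch();               // queue the row locally; nothing is sent yet
}
stmt.executeBatch();               // send all queued rows in one request
stmt.clearBatch();                 // empty the batch before reusing stmt
connection.commit();               // make the whole batch durable at once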

stmt.addBatch(): adds the current parameter set to the batch without executing it.

stmt.executeBatch(): sends the whole batch to MySQL for execution.

stmt.clearBatch(): empties the batch so the statement can be reused.
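The snippet above sends a single batch. For 60,000+ rows, a common pattern is to flush and commit every batchSize rows, then once more for the final partial batch. A minimal sketch, assuming a hypothetical `Binder` callback for the column binding and a caller-supplied INSERT statement:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Sketch: insert any number of rows, flushing and committing every batchSize rows.
class BatchInserter {

    // Hypothetical callback that binds one item's fields to the statement's parameters
    interface Binder<T> {
        void bind(PreparedStatement stmt, T item) throws SQLException;
    }

    // Returns the number of batches sent
    static <T> int insertAll(Connection conn, String sql, List<T> items,
                             int batchSize, Binder<T> binder) throws SQLException {
        int batches = 0;
        conn.setAutoCommit(false);                 // commit per batch, not per row
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            int pending = 0;
            for (T item : items) {
                binder.bind(stmt, item);
                stmt.addBatch();                   // queue the row locally
                if (++pending == batchSize) {      // batch full: send and commit it
                    stmt.executeBatch();
                    stmt.clearBatch();
                    conn.commit();
                    batches++;
                    pending = 0;
                }
            }
            if (pending > 0) {                     // flush the final, partial batch
                stmt.executeBatch();
                stmt.clearBatch();
                conn.commit();
                batches++;
            }
        } catch (SQLException e) {
            conn.rollback();                       // discard the uncommitted batch
            throw e;
        }
        return batches;
    }
}
```

Keeping batchSize bounded caps both the client-side batch buffer and the size of each rewritten INSERT sent to the server.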

After enabling batch processing, write time dropped from about 300 s to 9 s, while memory usage stayed flat.

Enable Multi‑Threaded Writing

With the LMAX Disruptor library, the XML-parsing thread publishes each parsed record to a ring buffer, and multiple consumer threads take events from it, insert them in batches, and commit the transactions. This parallel, asynchronous write path reduced total time to about 4.5 seconds.

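// Ring buffer of 16384 slots (Disruptor requires a power of two);
// a single producer, the XML-parsing thread, publishes into it
var disruptor = new Disruptor<>(ProductEvent::new, 16384,
    DaemonThreadFactory.INSTANCE, ProducerType.SINGLE,
    new BusySpinWaitStrategy());   // lowest latency, but keeps consumer cores busy
// Four DB writers; each receives its index and the writer count
var consumers = new SaveDbHandler[4];
for (int i = 0; i < consumers.length; i++) {
    consumers[i] = new SaveDbHandler(i, consumers.length, latch);
}
// The writers run in parallel; SimpleBatchRewindStrategy replays a batch when a
// writer throws RewindableException. ClearingEventHandler runs after all writers.
disruptor.handleEventsWith(new SimpleBatchRewindStrategy(), consumers)
        .then(new ClearingEventHandler());
RingBuffer<ProductEvent> ringBuffer = disruptor.start();
// publish events from XML parsing loop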
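SaveDbHandler itself is not shown. Because every handler passed to handleEventsWith sees every event, the (i, consumers.length) constructor arguments presumably let each of the four writers claim only its own share of the sequences. The plain-Java model below (not the Disruptor API; `ShardedBatchHandler` and its sink are hypothetical) shows that routing, plus flushing either when the buffer is full or at the end of the batch of events currently available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java model of a sharded, batching consumer; NOT the Disruptor API.
// Every consumer sees every event and keeps only the sequences assigned to it.
class ShardedBatchHandler<T> {
    private final int ordinal;              // this consumer's index, 0..consumers-1
    private final int consumers;            // total number of consumers
    private final int maxBatch;             // flush threshold (hypothetical)
    private final Consumer<List<T>> sink;   // e.g. JDBC batch insert + commit
    private final List<T> buffer = new ArrayList<>();

    ShardedBatchHandler(int ordinal, int consumers, int maxBatch, Consumer<List<T>> sink) {
        this.ordinal = ordinal;
        this.consumers = consumers;
        this.maxBatch = maxBatch;
        this.sink = sink;
    }

    // Same parameters as a Disruptor onEvent(event, sequence, endOfBatch) callback
    void onEvent(T event, long sequence, boolean endOfBatch) {
        if (sequence % consumers == ordinal) {
            buffer.add(event);              // this sequence belongs to us
        }
        // Flush when full, or when this is the last event currently available
        if (!buffer.isEmpty() && (buffer.size() >= maxBatch || endOfBatch)) {
            sink.accept(List.copyOf(buffer));
            buffer.clear();
        }
    }
}
```

With four consumers, the one at index 1 keeps sequences 1, 5, 9, and so on, so no row is inserted twice even though all four see every event.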

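The handleEventsWith call also registers SimpleBatchRewindStrategy, which tells the Disruptor to replay the current batch when a handler throws a RewindableException, so a failed batch insert can be retried. This strategy always rewinds, so the retry is unbounded.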
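Because SimpleBatchRewindStrategy rewinds on every RewindableException, a batch that fails permanently would be replayed indefinitely; Disruptor 4 also ships EventuallyGiveUpBatchRewindStrategy as a bounded alternative. The idea of a bounded retry can be sketched in plain Java; `BatchRetry` is a hypothetical helper, not the Disruptor API.

```java
import java.util.concurrent.Callable;

// Plain-Java analogue of a bounded batch rewind; NOT the Disruptor API.
class BatchRetry {

    // Runs the batch, re-running it from the start on failure, up to maxAttempts times.
    // The batch is expected to roll back its own transaction when it fails,
    // so every attempt starts from a clean state.
    static <R> R withRetry(Callable<R> batch, int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return batch.call();            // e.g. executeBatch() then commit()
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);  // brief pause before replaying the batch
                }
            }
        }
        throw last;                             // give up: surface the final failure
    }
}
```

Bounding the attempts turns a poison batch (for example, a row that violates a constraint) into a visible error instead of a consumer that spins forever.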

Further Optimization Directions

Additional improvements include:

Switching XML parsing to SAX for lower memory consumption on large files.

Increasing Disruptor batchSize (e.g., to 16384) to reduce per‑record overhead.

Tuning MySQL parameters such as innodb_log_buffer_size, innodb_buffer_pool_size, and innodb_flush_log_at_trx_commit.

Making sure MySQL's max_allowed_packet (a server system variable) is large enough for the multi-row INSERT statements produced by the chosen batch size.
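A minimal SAX sketch for the first item: the parser streams events, so only the current record is held in memory rather than the whole DOM tree. The `<product>`, `<name>`, and `<price>` element names and the `Product` record are assumptions, since the article does not show the XML schema.

```java
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: stream <product> elements with SAX instead of building a DOM tree.
class ProductSaxReader {

    record Product(String name, String price) {}

    static void read(InputStream in, Consumer<Product> onProduct) throws Exception {
        var handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private String name;
            private String price;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                text.setLength(0);                      // start collecting fresh text
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);         // text may arrive in several chunks
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                switch (qName) {
                    case "name" -> name = text.toString().trim();
                    case "price" -> price = text.toString().trim();
                    case "product" -> {                 // record complete: hand it off
                        onProduct.accept(new Product(name, price));
                        name = null;
                        price = null;
                    }
                    default -> { }
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
    }
}
```

In the Disruptor pipeline, the `onProduct` callback is the natural place to publish each record to the ring buffer, so parsing and writing overlap instead of running one after the other.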

Combining batch writes, asynchronous processing, and MySQL tuning brought the final runtime down to 4 seconds, at the cost of a higher peak memory footprint of about 1 GB (up from 656.6 MB in the baseline).

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
