
Speed Up XML‑to‑MySQL Imports: Reduce 300 s to 4 s with JDBC Batch & Async

This article walks through optimizing a Java‑based XML‑to‑MySQL import, showing how to cut processing time from 300 seconds to just 4 seconds by enabling JDBC batch writes, using rewriteBatchedStatements, applying async writes with Disruptor, and tuning MySQL settings.

macrozheng

The author faced a common data‑import scenario: loading over 60,000 records from an XML file into MySQL, which originally took 300 seconds.

Execution Environment

Java code runs on a laptop (8‑core i9, 16 GB RAM) while MySQL runs in a VirtualBox VM (4 CPU, 4 GB RAM). The JDK version is 21 and MySQL version is 8.0.

Baseline Implementation Performance

A straightforward procedural import reads the XML, extracts a list of Product objects, and inserts each row individually.

<code>void importData() throws Exception {
    Document doc = parseXml("products.xml");           // DOM-parse the whole file
    List&lt;Product&gt; products = extractProducts(doc); // map XML nodes to objects
    for (Product p : products) {
        insertProduct(p); // one INSERT per row: 60,000+ network round trips
    }
}
</code>

This baseline takes about 300 seconds and consumes around 656 MB of memory.

Where to Start Optimizing?

The dominant cost is MySQL write latency (≈298 seconds). Two main strategies are suggested: write aggregation (batching) and asynchronous writes.

Enable MySQL Batch Processing

JDBC provides a Batch API that can group many INSERT statements into a single request. The key calls are addBatch(), executeBatch(), and clearBatch(). Crucially, the connection must be created with rewriteBatchedStatements=true so the driver actually merges the statements.

<code>String url = "jdbc:mysql://localhost:3306/test?rewriteBatchedStatements=true";
Connection connection = DriverManager.getConnection(url, user, password);
connection.setAutoCommit(false); // commit once for the whole batch
PreparedStatement statement = connection.prepareStatement(
        "INSERT INTO product (name, price) VALUES (?, ?)");
for (Product p : products) {
    statement.setString(1, p.getName());
    statement.setBigDecimal(2, p.getPrice());
    statement.addBatch(); // queue the row on the client side
}
statement.executeBatch(); // driver sends one merged multi-value INSERT
statement.clearBatch();
connection.commit();
</code>

After enabling batch mode, the import time drops to 9 seconds, and with a tuned batch size the overall result is 12 seconds while memory stays around 673 MB.
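Batch size itself is a tunable: flushing every N rows bounds client-side memory while keeping the multi-value rewrite. A minimal sketch of just the chunking logic, where the `flush` callback stands in for the addBatch()/executeBatch()/commit() sequence (all names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedBatch {
    // Feeds rows to `flush` in chunks of at most batchSize and
    // returns how many flushes (i.e. round trips) were needed.
    static <T> int insertInChunks(List<T> rows, int batchSize,
                                  Consumer<List<T>> flush) {
        int flushes = 0;
        for (int i = 0; i < rows.size(); i += batchSize) {
            List<T> chunk = rows.subList(i, Math.min(i + batchSize, rows.size()));
            flush.accept(new ArrayList<>(chunk)); // e.g. addBatch() per row, then executeBatch()
            flushes++;
        }
        return flushes;
    }
}
```

A larger batch size means fewer round trips but a bigger client-side buffer and larger packets, so the sweet spot depends on row size and max_allowed_packet.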

Why Enable rewriteBatchedStatements?

Without this flag the driver still sends each INSERT individually, negating the benefit of the Batch API. Setting the flag merges multiple INSERTs into a single multi‑value statement, dramatically improving throughput.
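As a minimal illustration, the flag is appended as a query parameter on the JDBC URL (host, port, and schema below are placeholders):

```java
public class BatchUrl {
    // Builds a Connector/J URL with statement rewriting enabled.
    static String url(String hostPort, String schema) {
        return "jdbc:mysql://" + hostPort + "/" + schema
            + "?rewriteBatchedStatements=true";
    }
}
```

The resulting URL would then be passed to DriverManager.getConnection() when opening the import connection.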

Handle Duplicate Imports

When re‑importing data, you can either truncate the target table, or use upsert logic such as

INSERT ... ON DUPLICATE KEY UPDATE

or

REPLACE INTO

to make the operation idempotent.

<code>INSERT INTO t1(a,b,c) VALUES (1,2,3),(4,5,6)
ON DUPLICATE KEY UPDATE a=VALUES(a), b=VALUES(b), c=VALUES(c);
</code>

Enable Multi‑Threaded Async Writes

Beyond batch processing, the author uses the LMAX Disruptor library to decouple XML parsing from database writes. Parsed

Product

objects are published to a ring buffer, and four consumer threads consume the events and perform batched inserts.

<code>var disruptor = new Disruptor<>(ProductEvent::new, 16384,
    DaemonThreadFactory.INSTANCE, ProducerType.SINGLE,
    new BusySpinWaitStrategy());
// register four WorkHandler&lt;ProductEvent&gt; consumers that perform
// batched inserts, then start the ring buffer
disruptor.handleEventsWithWorkerPool(handlers);
disruptor.start();
</code>

This asynchronous approach reduces total processing time to about 4.5 seconds (≈60 % faster than batch‑only) while increasing peak memory usage due to multiple threads.
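Disruptor is a third-party dependency, but the parse-then-batch decoupling it provides can be sketched with a bounded queue from java.util.concurrent standing in for the ring buffer (this is a stand-in for illustration, not the author's actual Disruptor code; all names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

public class AsyncImportSketch {
    record Product(String name) {}

    // Drains up to batchSize parsed products per pass and groups them into
    // batches, standing in for a WorkHandler that calls executeBatch().
    static List<List<Product>> consumeAll(BlockingQueue<Product> queue,
                                          int batchSize, int total) {
        List<List<Product>> batches = new ArrayList<>();
        int seen = 0;
        while (seen < total) {
            List<Product> batch = new ArrayList<>();
            queue.drainTo(batch, batchSize); // non-blocking bulk take
            if (batch.isEmpty()) continue;   // spin, like BusySpinWaitStrategy
            seen += batch.size();
            batches.add(batch);              // here: one batched INSERT + commit
        }
        return batches;
    }
}
```

The real ring buffer avoids the lock inside ArrayBlockingQueue and pre-allocates event objects, which is why Disruptor outperforms a plain queue under contention.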

Rewindable Event Handling

If a batch fails, the consumer can throw a RewindableException to have Disruptor replay the same batch after the transaction is rolled back.
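The retry semantics can be approximated without Disruptor as a rollback-and-replay loop over the same batch (a sketch only; the real mechanism is Disruptor's batch rewind, and all names here are illustrative):

```java
import java.util.List;
import java.util.function.Consumer;

public class RewindSketch {
    // Retries the whole batch until the write succeeds, mimicking
    // Disruptor replaying a batch after a RewindableException.
    // Returns the attempt number that succeeded.
    static <T> int writeWithRewind(List<T> batch, Consumer<List<T>> write,
                                   int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                write.accept(batch); // e.g. executeBatch() + commit()
                return attempt;
            } catch (RuntimeException e) {
                // rollback happens here; the same batch is then replayed
            }
        }
        throw new IllegalStateException("batch failed after " + maxAttempts + " attempts");
    }
}
```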

Further Optimization Directions

XML Parsing

For very large XML files, switch from DOM parsing to SAX (event‑driven) parsing to lower memory pressure.
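A minimal SAX sketch using only the JDK: elements are handled as they stream past instead of materializing the whole tree (the `product`/`name` element layout below is an assumption, not the article's actual schema):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxImport {
    // Streams <product name="..."/> elements without building a DOM tree,
    // so memory stays flat regardless of file size.
    static List<String> parseNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes attrs) {
                if ("product".equals(qName)) {
                    names.add(attrs.getValue("name")); // emit one record at a time
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }
}
```

In a streaming import, each startElement callback could publish straight to the ring buffer instead of accumulating a list.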

Disruptor Memory Usage

Increasing the batch size (e.g., to 16384) can further reduce latency, but the ring buffer size grows proportionally, so memory consumption must be monitored.

MySQL Write Performance

Additional tuning includes enlarging innodb_log_buffer_size, increasing the InnoDB buffer pool (innodb_buffer_pool_size), and adjusting innodb_flush_log_at_trx_commit to control log flushing frequency.
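As a hedged example, these settings live in the server's my.cnf; the values below are illustrative starting points for a bulk-load workload, not recommendations for production:

```ini
[mysqld]
# Larger redo log buffer reduces flushes during big transactions
innodb_log_buffer_size = 64M
# Keep the working set in memory (size to the VM's available RAM)
innodb_buffer_pool_size = 2G
# 0 or 2 trades crash durability for throughput; 1 is the safe default
innodb_flush_log_at_trx_commit = 2
```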

Source Code

https://github.com/ginnsx/xml-to-mysql

Tags: Java, Performance Optimization, Batch Processing, MySQL, JDBC, Async
Written by macrozheng

Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. The author's GitHub project "mall" has 50K+ stars.