How I Cut XML‑to‑MySQL Import Time from 300 s to 4 s with Batch, RewriteBatchedStatements, and Multithreading
This article walks through optimizing a 60,000‑row XML‑to‑MySQL import by profiling the environment, measuring baseline performance, and applying JDBC batch rewriting, write aggregation, and asynchronous writes with LMAX Disruptor, ultimately reducing execution time from 300 seconds to about four seconds while keeping memory usage reasonable.
Environment
The Java code runs on a laptop (8‑core i9, 16 GB RAM), while MySQL runs in a VirtualBox VM (4 CPUs, 4 GB RAM). JDK 21 and MySQL 8.0 are used. Reading a record from the XML takes 0.08 s, and inserting a record into MySQL takes 0.5 s.
Baseline Implementation
The initial procedural implementation parses the XML, builds a list of Product objects, and inserts each record individually. The whole process takes about 300 seconds and peaks at 656 MB of memory.
void importData() {
    Document doc = parse XML file;
    List<Product> products = extract data from doc;
    for (Product p : products) {
        // insert into MySQL
    }
}
Optimization Direction
The biggest gain comes from reducing MySQL write cost. Two main techniques are write aggregation (batch) and asynchronous writes.
Enable MySQL Batch Processing
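The JDBC batch API groups many inserts into a single executeBatch() call. With MySQL Connector/J, the connection URL must also set rewriteBatchedStatements=true; otherwise the driver still sends the batched inserts one statement at a time instead of rewriting them into multi‑row INSERT statements.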
Connection conn = ...;
PreparedStatement stmt = conn.prepareStatement(sql);
conn.setAutoCommit(false);
for (int i = 0; i < batchSize; i++) {
    stmt.setString(...);
    stmt.addBatch();
}
stmt.executeBatch();
conn.commit();
Key methods: addBatch(), executeBatch(), clearBatch(). With batching enabled, the import time drops from 300 s to 9 s, and memory usage remains stable.
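The snippet above sends a single batch. For the full 60,000 rows, a loop that flushes every batchSize rows and then flushes the remainder is one way to apply it. The following is a minimal sketch, not the article's actual code: the JDBC URL, the product table, its two columns, and the insertInBatches helper are all assumptions made for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BatchInsertSketch {
    // Hypothetical URL: host, port, and schema are placeholders.
    // rewriteBatchedStatements=true is the Connector/J option discussed above.
    static final String JDBC_URL =
            "jdbc:mysql://localhost:3306/demo?rewriteBatchedStatements=true";

    // Hypothetical table and columns, used only for illustration.
    static final String SQL = "INSERT INTO product (id, name) VALUES (?, ?)";

    /** Inserts all rows, sending one batch per batchSize rows. Returns the number of batches sent. */
    static int insertInBatches(Connection conn, List<String[]> rows, int batchSize)
            throws SQLException {
        int batches = 0;
        conn.setAutoCommit(false); // commit once per batch instead of once per row
        try (PreparedStatement stmt = conn.prepareStatement(SQL)) {
            int pending = 0;
            for (String[] row : rows) {
                stmt.setString(1, row[0]);
                stmt.setString(2, row[1]);
                stmt.addBatch();
                if (++pending == batchSize) {
                    stmt.executeBatch(); // the statement's batch is empty again afterwards
                    conn.commit();
                    pending = 0;
                    batches++;
                }
            }
            if (pending > 0) { // flush the final, partially filled batch
                stmt.executeBatch();
                conn.commit();
                batches++;
            }
        }
        return batches;
    }
}
```

With 60,000 rows and a batch size of 1,000, this sketch would send 60 batches. Committing per batch keeps each transaction small; committing once at the very end is an alternative if the import must be all-or-nothing.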
Asynchronous Writing with Disruptor
Using the LMAX Disruptor, XML parsing produces events that are placed into a ring buffer. Multiple consumer threads consume events, batch‑insert into MySQL, and commit.
var disruptor = new Disruptor<>(ProductEvent::new, 16384,
        DaemonThreadFactory.INSTANCE, ProducerType.SINGLE,
        new BusySpinWaitStrategy());
var consumers = new SaveDbHandler[4];
for (int i = 0; i < consumers.length; i++) {
    consumers[i] = new SaveDbHandler(i, consumers.length, latch);
}
disruptor.handleEventsWith(new SimpleBatchRewindStrategy(), consumers)
        .then(new ClearingEventHandler());
RingBuffer<ProductEvent> ring = disruptor.start();
for (Iterator<Element> it = document.getRootElement().elementIterator(); it.hasNext(); ) {
    Element e = it.next();
    if (!StringUtils.hasText(e.elementTextTrim("id"))) continue;
    Product p = ObjectMapper.buildProduct(e);
    ring.publishEvent((event, seq, buf) -> event.setProduct(p));
}
Every consumer sees every event, so each one checks sequence % numberOfConsumers == ordinal and handles only its own share, which avoids duplicate processing. This reduces total time to about 4.5 s (roughly half the 9 s of batch alone), at the cost of higher memory usage (~1 GB).
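The sequence check can be sketched without the Disruptor library. The class below is a hypothetical stand‑in for SaveDbHandler, whose source is not shown in the article: it mirrors the shape of Disruptor's onEvent(event, sequence, endOfBatch) callback, keeps only the events whose sequence maps to its ordinal, and records each batch in a list where a real handler would write to MySQL. The batchSize parameter and the flushed list are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the article's SaveDbHandler; not tied to the Disruptor library. */
class PartitionedHandlerSketch {
    private final int ordinal;
    private final int numberOfConsumers;
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();

    // Records what would have been written, one inner list per batch.
    final List<List<String>> flushed = new ArrayList<>();

    PartitionedHandlerSketch(int ordinal, int numberOfConsumers, int batchSize) {
        this.ordinal = ordinal;
        this.numberOfConsumers = numberOfConsumers;
        this.batchSize = batchSize;
    }

    // Same parameter shape as Disruptor's EventHandler.onEvent(event, sequence, endOfBatch).
    void onEvent(String event, long sequence, boolean endOfBatch) {
        // Every handler is offered every event; keep only this handler's share.
        if (sequence % numberOfConsumers == ordinal) {
            pending.add(event);
        }
        // Flush when the batch is full, or when no more events are currently available.
        if (pending.size() >= batchSize || (endOfBatch && !pending.isEmpty())) {
            flush();
        }
    }

    private void flush() {
        // A real handler would call addBatch()/executeBatch()/commit() here.
        flushed.add(new ArrayList<>(pending));
        pending.clear();
    }
}
```

With four such handlers, sequences 0, 4, 8, … go to handler 0, sequences 1, 5, 9, … to handler 1, and so on, so each row is written exactly once. Flushing on endOfBatch keeps a partially filled batch from waiting indefinitely when the producer pauses or finishes.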
Further Optimizations
Increase the Disruptor batchSize (e.g., 16384) so that each consumer writes more rows per round trip, but keep it in line with the ring‑buffer size.
Use INSERT … ON DUPLICATE KEY UPDATE or REPLACE INTO for idempotent imports.
Adjust MySQL parameters: innodb_log_buffer_size, innodb_buffer_pool_size, innodb_flush_log_at_trx_commit.
Consider SAX parsing for very large XML to reduce memory pressure.
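For the SAX suggestion above, the sketch below streams records with the JDK's built‑in SAX parser instead of loading the whole document into memory. It is an illustration under assumptions: the product element name and the streamProductIds helper are hypothetical, and only the id field is extracted. Records with a blank id are skipped, mirroring the check in the Disruptor producer loop.

```java
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxImportSketch {
    /** Streams <product> elements and passes each non-blank id to the sink as soon as the element closes. */
    static void streamProductIds(InputStream xml, Consumer<String> sink) throws Exception {
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private String id;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0);
                if ("product".equals(qName)) {
                    id = null; // start of a new record
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per text node, so accumulate.
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("id".equals(qName)) {
                    id = text.toString().trim();
                } else if ("product".equals(qName) && id != null && !id.isEmpty()) {
                    sink.accept(id); // e.g. publish to the ring buffer or add to a batch
                }
                text.setLength(0);
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
    }
}
```

Because each record is handed off as soon as its closing tag is read, only one record's text is held in memory at a time, rather than the full document and product list.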
Result
After applying batch processing, rewriteBatchedStatements, and multithreaded asynchronous writes, the import time fell from 300 s to roughly 4 s, with peak memory around 1 GB.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
