Designing High‑Performance Import of 1 Billion Records into MySQL
This article analyzes how to import one billion 1 KB log records stored in HDFS or S3 into MySQL efficiently, covering constraints, single‑table limits, batch insertion, storage‑engine choices, sharding, file‑reading techniques, task reliability, and concurrency control to achieve optimal throughput.
The author revisits a past interview question about importing 1 billion rows (each ~1 KB) from distributed storage (HDFS/S3) into MySQL, outlining the need to clarify data format, size, ordering, deduplication, and the target database.
Can a single MySQL table hold 1 billion rows?
Technically it can, but it shouldn't. A common rule of thumb is to keep a single table under roughly 20 million rows, because beyond that the B+‑tree index gains another level and insert performance degrades.
MySQL (InnoDB) stores rows in a clustered B+‑tree keyed on the primary key. Leaf pages are 16 KB, so each holds about 16 rows of 1 KB. Non‑leaf pages are also 16 KB and store, per entry, a BigInt primary key (8 bytes) plus a 6‑byte page pointer, so each can reference roughly 1,170 children. Beyond ~20 M rows the tree needs a fourth level, which adds an extra page read per lookup and slows inserts.
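The ~20 M figure follows directly from that fan‑out arithmetic; a quick sanity check using the 1,170 fan‑out and 16 rows per leaf from above:

```java
public class BTreeCapacity {
    public static void main(String[] args) {
        long fanOut = 1170;    // non-leaf entries per 16 KB page (8-byte key + 6-byte pointer)
        long rowsPerLeaf = 16; // 1 KB rows in a 16 KB leaf page
        long depth3 = fanOut * fanOut * rowsPerLeaf;          // root -> internal -> leaf
        long depth4 = fanOut * fanOut * fanOut * rowsPerLeaf; // one extra internal level
        System.out.println(depth3); // 21902400  (~21.9 M rows in a 3-level tree)
        System.out.println(depth4); // 25625808000 (a 4-level tree holds far more, but is slower)
    }
}
```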
Therefore the design splits the data into about 100 tables (≈10 M rows each).
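One simple way to route a record to one of the 100 tables is by its global sequence number; a hypothetical helper (the table-name prefix is illustrative):

```java
public class TableRouter {
    static final long ROWS_PER_TABLE = 10_000_000L; // ~10 M rows per table
    static final int TABLE_COUNT = 100;

    // Map a global record index (0-based) to a target table name.
    static String tableFor(long recordIndex) {
        long tableIndex = recordIndex / ROWS_PER_TABLE;
        if (tableIndex >= TABLE_COUNT) {
            throw new IllegalArgumentException("record index out of range: " + recordIndex);
        }
        return "t_log_" + tableIndex;
    }

    public static void main(String[] args) {
        System.out.println(tableFor(0));            // t_log_0
        System.out.println(tableFor(999_999_999L)); // t_log_99
    }
}
```

Range routing (rather than hashing) keeps each table's inserts in primary‑key order, which matters for the bulk‑load advice below.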
How to write efficiently
Single‑row inserts are slow; batch inserts (e.g., 100 rows per batch) are recommended. Use InnoDB transactions to guarantee atomicity of each batch.
Support retry logic: on failure retry the batch, and if it still fails, fall back to inserting rows individually and log the failures.
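The batch‑then‑fallback flow can be sketched independently of JDBC; here the actual database write is abstracted behind a caller‑supplied function (a stand‑in for a transactional batch INSERT) so the retry logic stands out:

```java
import java.util.List;
import java.util.function.Consumer;

public class BatchWriter {
    static final int MAX_RETRIES = 3;

    // Try the whole batch (one transaction in the real implementation);
    // after repeated failures, fall back to row-by-row and log what still fails.
    static <T> void writeWithFallback(List<T> batch, Consumer<List<T>> writeBatch) {
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            try {
                writeBatch.accept(batch);
                return; // whole batch committed
            } catch (RuntimeException e) {
                System.err.println("batch attempt " + attempt + " failed: " + e.getMessage());
            }
        }
        // Fallback: insert rows one by one so a single bad row
        // does not block the rest of the batch.
        for (T row : batch) {
            try {
                writeBatch.accept(List.of(row));
            } catch (RuntimeException e) {
                System.err.println("row failed permanently: " + row);
            }
        }
    }
}
```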
Write rows in primary‑key order and avoid non‑primary indexes during bulk load; create them after the load if needed.
Should concurrent writes target the same table?
No. Concurrent writes to the same table break insertion order and cause index‑page contention. To raise throughput, increase the batch size or write to different tables in parallel, rather than having multiple writers target the same table.
Choosing the MySQL storage engine
MyISAM offers higher raw insert speed but lacks transaction support, risking duplicate data on retries. InnoDB with innodb_flush_log_at_trx_commit set to 0 or 2 can approach MyISAM performance while preserving transactions.
With innodb_flush_log_at_trx_commit set to 0 or 2, the redo log is flushed to disk roughly once per second, so up to one second of data can be lost on a crash. Testing shows InnoDB's batch‑insert performance is comparable to MyISAM's once per‑commit flushing (value 1) is disabled.
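The flush policy is a dynamic global variable, so it can be relaxed for the duration of the bulk load and restored afterwards:

```sql
-- 1 (default): flush redo log to disk at every commit (safest, slowest)
-- 2: write to the OS page cache at commit, fsync roughly once per second
-- 0: write and fsync roughly once per second (fastest, least safe)
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
-- ... run the import ...
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
```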
Do we need sharding?
A single MySQL instance tops out at roughly 5 K TPS. SSDs offer far higher write bandwidth and tolerate concurrent writers well, whereas an HDD's single read/write head makes many concurrent writes counterproductive. Sharding across multiple databases and tables is therefore advisable, especially on HDDs.
Design must allow configurable numbers of databases and concurrent write tables, adapting to SSD/HDD characteristics.
Fast file reading
A 10 GB file cannot reasonably be loaded into memory in one go. The following methods were benchmarked on macOS:
Files.readAllBytes (full load)
FileReader + BufferedReader (line‑by‑line)
File + BufferedReader
Scanner (line‑by‑line)
Java NIO FileChannel with buffers
Benchmarks show Java NIO FileChannel is fastest, but it reads raw bytes and requires extra logic to split lines. BufferedReader provides line‑wise reading with acceptable performance (≈30 s for 10 GB) and is sufficient because the bottleneck lies in DB writes.
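For comparison with the FileChannel snippet below, the line‑by‑line BufferedReader variant is only a few lines (the path is a placeholder):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LineReader {
    public static void main(String[] args) {
        long lines = 0;
        // try-with-resources closes the reader automatically
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("/xxx.log"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines++; // in the real import, hand each line to the batch writer here
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("lines:" + lines);
    }
}
```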
File file = new File("/xxx.zip");
long now = System.currentTimeMillis();
long size = 0;
// try-with-resources closes the stream and its channel automatically
try (FileInputStream fileInputStream = new FileInputStream(file);
     FileChannel fileChannel = fileInputStream.getChannel()) {
    int capacity = 1 * 1024 * 1024; // 1 MB buffer
    ByteBuffer byteBuffer = ByteBuffer.allocate(capacity);
    int bytesRead;
    while ((bytesRead = fileChannel.read(byteBuffer)) != -1) {
        // count only the bytes actually read; the last read
        // may fill the buffer only partially
        size += bytesRead;
        byteBuffer.clear(); // reset the buffer for the next read
    }
    System.out.println("file size:" + size);
} catch (IOException e) {
    e.printStackTrace();
}
System.out.println("Time:" + (System.currentTimeMillis() - now));

Coordinating read and write tasks
Attempted Kafka‑based decoupling proved problematic for preserving order. The final approach merges reading and writing: each task reads a chunk of a file and immediately writes it to the assigned table.
Ensuring task reliability
Use a composite primary key (taskId + fileIndex + lineNumber) to guarantee idempotent inserts. Record task progress in Redis with INCRBY so that on restart the task can resume from the last offset.
INCRBY KEY_NAME INCR_AMOUNT
If a batch fails, retry; after repeated failures, insert the rows individually and update the progress in Redis accordingly.
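The bookkeeping can be sketched with an in‑memory map standing in for Redis (the key format and helper names are illustrative; in production, incrementBy would be a Redis INCRBY call):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ProgressTracker {
    // Stand-in for Redis; each key tracks lines written for one (task, file) pair.
    private final Map<String, Long> store = new ConcurrentHashMap<>();

    private static String key(long taskId, int fileIndex) {
        return "import:progress:" + taskId + ":" + fileIndex; // illustrative key format
    }

    // Equivalent of INCRBY key amount: called after each successful batch.
    public long incrementBy(long taskId, int fileIndex, long lines) {
        return store.merge(key(taskId, fileIndex), lines, Long::sum);
    }

    // On restart, resume reading the file from this line offset.
    public long resumeOffset(long taskId, int fileIndex) {
        return store.getOrDefault(key(taskId, fileIndex), 0L);
    }
}
```

A restarted worker reads resumeOffset, skips that many lines, and continues; the composite primary key makes any overlap between the old and new run harmless.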
Controlling concurrency
Redisson semaphores can limit concurrent writes per database, but they lack lease‑extension. An alternative is a leader‑election via Zookeeper+Curator: the leader assigns tasks, acquires a distributed lock per task, and releases it after completion.
RedissonClient redissonClient = Redisson.create(config);
RSemaphore rSemaphore = redissonClient.getSemaphore("semaphore");
// set the number of permits to 1
rSemaphore.trySetPermits(1);
rSemaphore.tryAcquire(); // non-blocking acquire

Key take‑aways
Clarify constraints before designing the solution.
Large data volumes require sharding (databases/tables) to stay within single‑table limits.
Limit concurrent writes per database and make the limit configurable.
Choose the appropriate MySQL storage engine (InnoDB with tuned flush settings or MyISAM) after performance testing.
Determine the optimal batch size through iterative testing.
Combine read‑and‑write tasks to avoid ordering issues with Kafka.
Persist task progress in Redis to achieve fault‑tolerant resumable imports.
Use a leader node with Zookeeper for safe task distribution and distributed locks.