How to Import 1 Billion Records into MySQL at Lightning Speed
This article analyzes the constraints of loading a billion 1‑KB log records from HDFS/S3 into MySQL, evaluates single‑table limits, proposes batch inserts, sharding, file‑reading techniques, task coordination with Redis, Redisson, and Zookeeper, and offers practical performance‑tuning recommendations.
Problem Constraints
The interview scenario assumes 1 billion records, each about 1 KB, stored as unstructured access logs in HDFS or S3. The data is already split into roughly 100 ordered files, and the target database is MySQL.
Can a Single MySQL Table Hold 1 Billion Rows?
MySQL stores the primary key in a clustered B+‑tree. A leaf page is 16 KB; with 1 KB rows a leaf holds roughly 16 rows. A non‑leaf page is also 16 KB and stores an 8‑byte BIGINT primary key plus a 6‑byte page pointer (≈14 bytes per entry), allowing about 1,170 child pointers per non‑leaf page (16 KB ÷ 14 B ≈ 1,170).
Depth 2 (root → leaf): 1,170 × 16 ≈ 18,720 rows.
Depth 3: 1,170 × 1,170 × 16 ≈ 21.9 million rows (≈20 M is the practical safe limit).
Depth 4: 1,170³ × 16 ≈ 25.6 billion rows, but the extra level adds an additional page lookup on every access and degrades performance.
Therefore the design splits the billion‑row dataset into 100 tables (≈10 M rows each) to stay within the safe 20 M‑row range.
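Routing a global row number to one of the 100 tables is then a simple division. A minimal sketch (the constants come from the design above; the class and method names are illustrative):

```java
public class ShardRouter {
    static final long ROWS_PER_TABLE = 10_000_000L; // ~10 M rows keeps each B+-tree at depth 3
    static final int TABLE_COUNT = 100;

    // Map a global row number (0-based) to a table suffix in 0..99.
    static int tableIndex(long globalRowNumber) {
        return (int) ((globalRowNumber / ROWS_PER_TABLE) % TABLE_COUNT);
    }

    public static void main(String[] args) {
        System.out.println(tableIndex(0L));           // first table
        System.out.println(tableIndex(999_999_999L)); // last row of the billion
    }
}
```

Because the input files are already ordered, this routing also preserves primary-key order within each table.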
Efficient Database Writing
Use batch inserts (e.g., 100 rows per batch) to amortise per‑row overhead.
Wrap each batch in an InnoDB transaction so the batch is atomic; implement retry logic for failed batches.
Defer creation of non‑primary indexes until after the bulk load; this avoids costly index‑tree rebalancing during inserts.
Insert rows in primary‑key order (e.g., file‑suffix + line‑number) to maximise sequential writes.
Avoid concurrent writes to the same table because they break ordering and cause index contention.
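The batching recommendations above can be sketched with plain JDBC. This is a sketch, not the article's exact code: the table name `access_log_0` and columns `(id, line)` are assumptions, `partition` is a small helper for slicing rows into batches, and `insertBatch` shows the one-batch-one-transaction pattern with `executeBatch`:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class BatchWriter {
    // Split rows into fixed-size batches (100 per batch as suggested above).
    static List<List<String>> partition(List<String> rows, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            batches.add(new ArrayList<>(rows.subList(i, Math.min(i + batchSize, rows.size()))));
        }
        return batches;
    }

    // One batch = one InnoDB transaction; the caller retries the whole batch on failure.
    static void insertBatch(Connection conn, String table, List<String> batch, long firstRowId)
            throws SQLException {
        conn.setAutoCommit(false);
        String sql = "INSERT IGNORE INTO " + table + " (id, line) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            long id = firstRowId;
            for (String line : batch) {
                ps.setLong(1, id++);  // monotonically increasing PK keeps writes sequential
                ps.setString(2, line);
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```

`INSERT IGNORE` makes a retried batch idempotent when the primary key already exists, which matters for the retry logic discussed later.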
Storage Engine Choice
MyISAM provides higher raw insert speed but lacks transactional guarantees. InnoDB with innodb_flush_log_at_trx_commit set to 0 or 2 reduces disk‑flush latency and approaches MyISAM performance while preserving ACID semantics.
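For reference, the relaxed durability setting goes in `my.cnf` (a sketch; 2 is shown, and 0 is the even looser option mentioned above):

```ini
[mysqld]
# 0 = write and flush the redo log roughly once per second (fastest, least durable)
# 2 = write the log at commit but flush to disk once per second; survives a
#     mysqld crash, may lose up to ~1 s of commits on an OS or power failure
innodb_flush_log_at_trx_commit = 2
```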
Sharding and Concurrency Limits
A single MySQL instance typically caps at ~5 K TPS. SSDs can sustain far more concurrent throughput than HDDs, whose single actuator arm turns concurrent writes to multiple tables into random I/O. The architecture therefore makes the numbers of databases and tables configurable, allowing the system to be tuned for HDD or SSD environments.
File Reading Strategies
Reading a 10 GB file in one go is impractical. Benchmarks on macOS for a 3.4 GB file:
Files.readAllBytes – out‑of‑memory (OOM).
FileReader + BufferedReader – ~11 s.
File + BufferedReader – ~10 s.
Scanner – ~57 s.
Java NIO FileChannel (buffered) – ~3 s, but it reads fixed‑size buffers and does not preserve line boundaries, making line‑wise processing cumbersome.
Because BufferedReader naturally returns complete lines and its throughput (~10 s for the 3.4 GB benchmark, roughly 30 s extrapolated to a 10 GB file) is acceptable, it is the preferred method for this workload.
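A minimal sketch of that line-wise reading, grouped into insert-sized batches (the path, batch size, and class name are placeholders; the writer hand-off is left as a comment):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class LineBatcher {
    // Stream a large file line by line, collecting `batchSize` lines at a time.
    // Returns the number of batches produced so callers can track progress.
    static long readInBatches(Path path, int batchSize) throws IOException {
        long batches = 0;
        try (BufferedReader reader = Files.newBufferedReader(path)) {
            List<String> batch = new ArrayList<>(batchSize);
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    // hand the full batch to the database writer here
                    batch.clear();
                    batches++;
                }
            }
            if (!batch.isEmpty()) batches++; // trailing partial batch
        }
        return batches;
    }
}
```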
The NIO FileChannel benchmark used the following snippet (shown here with the byte count taken from the actual read size and accumulated in a long, since an int overflows past 2 GB):

File file = new File("/xxx.zip");
long start = System.currentTimeMillis();
try (FileInputStream fis = new FileInputStream(file);
     FileChannel channel = fis.getChannel()) {
    int capacity = 1 * 1024 * 1024; // 1 MB buffer
    ByteBuffer buffer = ByteBuffer.allocate(capacity);
    long total = 0;
    int read;
    while ((read = channel.read(buffer)) != -1) {
        total += read;  // count only the bytes actually read
        buffer.clear(); // reset position so the next read refills the buffer
    }
    System.out.println("file size:" + total);
} catch (IOException e) {
    e.printStackTrace();
}
System.out.println("Time:" + (System.currentTimeMillis() - start));

Coordinating Read and Write Tasks
Separating readers and writers via Kafka introduces ordering problems: partitioning by task ID cannot guarantee ordered writes when multiple tables share a partition. The final design merges reading and writing – each task reads a batch of lines and immediately inserts them, eliminating the need for an external messaging layer.
Ensuring Task Reliability and Idempotency
Define a composite primary key {fileSuffix}_{lineNumber} (or {taskId}_{fileIndex}_{rowNumber}) so that repeated inserts are ignored.
Record task progress in Redis with INCRBY task_offset_{taskId} 100 after each successful batch.
If a batch fails, retry it; after N retries fall back to single‑row inserts and still update Redis.
On process restart, read the stored offset from Redis and resume from that position, guaranteeing no duplicate writes.
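The offset-tracking and resume logic can be sketched as follows; note this uses an in-memory map as a stand-in for Redis (in production the `advance` step would be the `INCRBY task_offset_{taskId}` call described above), and all names here are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

public class OffsetTracker {
    private final Map<String, Long> store = new HashMap<>(); // stand-in for Redis

    // Called after each successful batch of `batchSize` rows
    // (maps to INCRBY task_offset_{taskId} batchSize in Redis).
    public void advance(String taskId, int batchSize) {
        store.merge("task_offset_" + taskId, (long) batchSize, Long::sum);
    }

    // On restart: the number of lines already written, i.e. how far to
    // skip into the file before resuming inserts.
    public long resumeOffset(String taskId) {
        return store.getOrDefault("task_offset_" + taskId, 0L);
    }
}
```

Because the offset only advances after a batch commits, a crash between commit and `advance` can replay one batch; the composite primary key makes that replay a no-op.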
Controlling Concurrency with Distributed Locks
Limit the number of concurrent write tasks per database with a Redisson semaphore (one permit per database). If a task crashes without releasing its permit, a permit timeout can be applied, but Redisson's semaphore does not renew leases automatically. To handle this, a primary‑node election via Zookeeper + Curator assigns tasks and manages the distributed locks, ensuring only the elected node acquires the semaphore.
RedissonClient redissonClient = Redisson.create(config);
RSemaphore semaphore = redissonClient.getSemaphore("db_semaphore");
semaphore.trySetPermits(1); // one concurrent writer per DB
if (semaphore.tryAcquire()) {
    try {
        // perform batch insert
    } finally {
        semaphore.release(); // always release, even if the insert fails
    }
}

Task Assignment Model
A task table stores the following fields:
bizId – business line identifier (default 1).
databaseIndex – suffix of the assigned database.
tableIndex – suffix of the assigned table.
parentTaskId – overall job identifier.
offset – current file read offset (used for resume).
status – Pending, Processing, or Completed.
Each of the 100 tasks processes one 10 GB file segment. Worker nodes poll the table, acquire the semaphore for the target database, set status to “Processing”, execute the read‑insert loop, then release the semaphore and mark the task as “Completed”.
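The claim step must be atomic so that two workers never process the same task. A sketch of that status transition, using an in-memory map as a stand-in for the task table (in production this would be an `UPDATE ... SET status='Processing' WHERE id=? AND status='Pending'` whose affected-row count decides the winner):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TaskBoard {
    enum Status { PENDING, PROCESSING, COMPLETED }

    // Stand-in for the task table described above.
    private final Map<Long, Status> tasks = new ConcurrentHashMap<>();

    void add(long taskId) { tasks.put(taskId, Status.PENDING); }

    // Atomically claim a pending task; returns false if another worker won the race.
    boolean claim(long taskId) {
        return tasks.replace(taskId, Status.PENDING, Status.PROCESSING);
    }

    void complete(long taskId) { tasks.put(taskId, Status.COMPLETED); }

    Status status(long taskId) { return tasks.get(taskId); }
}
```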
Final Recommendations
Clarify data size, format, ordering, and target DB before designing the solution.
Split the billion‑row dataset into multiple databases/tables (e.g., 100 tables ≈10 M rows each) to stay within B+‑tree depth limits.
Analyse single‑instance write throughput; if it caps around 5 K TPS, introduce sharding.
Configure the number of concurrent writers per database; adjust dynamically for HDD vs. SSD environments.
Compare InnoDB (with innodb_flush_log_at_trx_commit=0/2) and MyISAM batch‑insert performance in the target environment.
Iteratively benchmark and tune the optimal batch size (e.g., 50, 100, 500 rows per batch).
Merge read and write tasks to avoid complex ordering guarantees required by a separate messaging layer.
Use Redis to track per‑task offsets and ensure idempotent retries.
Employ Zookeeper + Curator for primary‑node election and Redisson distributed locks to coordinate task execution safely.