How to Load 2 Billion Rows into MySQL Fast with TokuDB – 570k Rows/s Benchmark
This article describes a real‑world test of loading more than 2 billion records from a big‑data platform into MySQL using XeLabs TokuDB. It covers the configuration, the measured performance, and practical tips for reaching about 570k rows per second on a cloud instance.
Requirement
A friend needed to load more than 2 billion rows from a big‑data platform into MySQL for next‑day business reporting.
Implementation Analysis
In MySQL, inserts into a single table can reach 100k‑150k rows/s when the data fits in memory, but in many projects the data set is larger than the available memory. XeLabs TokuDB was tested as an alternative.
XeLabs TokuDB Overview
Project address: https://github.com/XeLabs/tokudb
Built‑in jemalloc memory allocator
Additional TokuDB performance metrics
Supports Xtrabackup backup
Integrates ZSTD compression algorithm
Supports TokuDB binlog_group_commit feature
Test Table
TokuDB core configuration:
Table schema:
Data loaded using LOAD DATA:
Calculated write speed:
File size comparison: original file 8.5 GB, TokuDB file 3.5 GB (≈40% of original). Loading 2 billion rows completed in about 58 minutes, meeting the requirement. In comparable InnoDB tests, TokuDB was 3‑4× faster.
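The throughput and compression figures above can be sanity‑checked with simple arithmetic. The following is a minimal sketch in Python, not the author's original calculation. The constants come from the numbers quoted in this article (the 58‑minute duration is approximate), and the function names are hypothetical helpers introduced only for illustration.

```python
# Figures quoted in the article; the 58-minute duration is approximate.
ROWS_LOADED = 2_000_000_000
LOAD_MINUTES = 58
ORIGINAL_FILE_GB = 8.5
TOKUDB_FILE_GB = 3.5


def rows_per_second(rows: int, minutes: float) -> float:
    """Average load throughput in rows per second."""
    return rows / (minutes * 60)


def size_ratio(stored_gb: float, original_gb: float) -> float:
    """On-disk size as a fraction of the original file size."""
    return stored_gb / original_gb


throughput = rows_per_second(ROWS_LOADED, LOAD_MINUTES)
ratio = size_ratio(TOKUDB_FILE_GB, ORIGINAL_FILE_GB)

print(f"throughput: {throughput:,.0f} rows/s")    # throughput: 574,713 rows/s
print(f"TokuDB size: {ratio:.0%} of original")    # TokuDB size: 41% of original
```

The result of roughly 575k rows/s agrees with the "about 570k rows per second" figure reported in the conclusions, and the size ratio agrees with the ≈40% stated above.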
Test Conclusions
In a cloud environment with 8 CPU cores, 8 GB RAM, and a 500 GB high‑speed cloud disk, TokuDB sustained a write rate of about 570k rows per second.
With an auto‑increment primary key, TokuDB's bulk loader cannot be used, and the load falls back to what is effectively single‑row inserts, which is much slower. If the input data already contains values for the auto‑increment column, consider removing the auto‑increment attribute and using a unique index instead to reduce overhead and improve speed. Compression may be less effective during bulk loading.
Reference for TokuDB Bulk Loader: https://github.com/percona/PerconaFT/wiki/TokuFT-Bulk-Loader
Test Environment
Tests were performed on CentOS 7. A precompiled build of XeLabs TokuDB is available on Baidu Cloud (link: https://pan.baidu.com/s/1qYRyH3I).
