How to Load 2 Billion Rows into MySQL Fast with TokuDB – Real‑World Test
This article details a real‑world performance test of loading over two billion rows from a big‑data platform into MySQL using XeLabs TokuDB, covering requirements, TokuDB features, table schema, bulk‑load commands, results, and practical recommendations for high‑throughput data ingestion.
Requirement
A friend needed to load more than 2 billion rows from a big‑data platform into MySQL for next‑day business reporting.
Implementation analysis
InnoDB can insert 100‑150 k rows per second when the data fits into memory, but the dataset exceeds memory, so XeLabs TokuDB was evaluated.
TokuDB introduction
Project address: https://github.com/XeLabs/tokudb . Optimizations over the official TokuDB include built‑in jemalloc memory allocation, additional performance metrics, Xtrabackup backup support, ZSTD compression, and support for the binlog_group_commit feature.
Test table
Core configuration:
loose_tokudb_cache_size=4G
loose_tokudb_directio=ON
loose_tokudb_fsync_log_period=1000
tokudb_commit_sync=0Table definition:
CREATE TABLE `user_summary` (
`user_id` bigint(20) unsigned NOT NULL COMMENT 'user id / phone',
`weight` varchar(5) DEFAULT NULL COMMENT 'weight (KG)',
`level` varchar(20) DEFAULT NULL COMMENT 'level',
`beat_rate` varchar(12) DEFAULT NULL COMMENT 'beat rate',
`level_num` int(10) DEFAULT NULL COMMENT 'same‑tonnage count',
UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;Data loading
Using LOAD DATA INFILE the 200 million rows were inserted in 5 min 48.30 s, achieving roughly 574 k rows per second.
root@localhost> LOAD DATA INFILE '/u01/work/134-136.txt'
INTO TABLE user_summary(user_id,weight,level,beat_rate,level_num);
Query OK, 200000000 rows affected (5 min 48.30 sec)
Records: 200000000 Deleted: 0 Skipped: 0 Warnings: 0The source file was 8.5 GB; the resulting TokuDB file was 3.5 GB, showing about 1.5× compression.
Conclusion
On an 8‑core, 8 GB RAM, 500 GB SSD cloud instance, TokuDB consistently achieved >570 k rows/s, roughly 3‑4 times faster than InnoDB for the same workload. When using an auto‑increment primary key, the bulk loader cannot be applied, causing a significant slowdown.
Recommendations: if possible, remove the auto‑increment attribute and use a unique index to let TokuDB run faster; be aware that compression may limit maximum throughput.
Test environment
CentOS 7 with XeLabs‑compiled TokuDB (download link provided in the original article).
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
