Loading 2 Billion Rows into MySQL Fast with TokuDB – 570k Rows/s
This article details a real‑world test of loading over 200 million rows into MySQL using XeLabs TokuDB, showing configuration tweaks, bulk‑loader commands, performance metrics (≈570 k rows per second), file‑size reductions, and comparisons with InnoDB under a 8‑core, 8 GB RAM, 500 GB SSD environment.
1. Requirement
A friend needed to ingest more than 2 billion rows from a big‑data platform into MySQL for next‑day business reporting.
2. Implementation Analysis
In MySQL, a single InnoDB table can insert 100‑150 k rows per second when the dataset fits in memory, but the data here exceeds memory, so XeLabs TokuDB was tested.
3. XeLabs TokuDB Overview
Project: https://github.com/XeLabs/tokudb
Built‑in jemalloc memory allocator
Additional TokuDB performance metrics
Supports Xtrabackup backup
Integrates ZSTD compression
Supports TokuDB binlog_group_commit
4. Test Table
TokuDB core configuration:
loose_tokudb_cache_size=4G
loose_tokudb_directio=ON
loose_tokudb_fsync_log_period=1000
tokudb_commit_sync=0Table definition:
CREATE TABLE `user_summary` (
`user_id` bigint(20) unsigned NOT NULL COMMENT '用户id/手机号',
`weight` varchar(5) DEFAULT NULL COMMENT '和码体重(KG)',
`level` varchar(20) DEFAULT NULL COMMENT '重量级',
`beat_rate` varchar(12) DEFAULT NULL COMMENT '击败率',
`level_num` int(10) DEFAULT NULL COMMENT '同吨位人数',
UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;Data loading with LOAD DATA:
LOAD DATA INFILE '/u01/work/134-136.txt' \
INTO TABLE user_summary(user_id, weight, level, beat_rate, level_num);Result: 200 000 000 rows inserted in 5 min 48.30 s (≈574 k rows/s).
File sizes:
-rw-r--r-- 1 root root 8.5G 134-136.txt
-rw-r----- 1 mysql mysql 3.5G user_summary_main_229_1_1d_B_0.tokudbCompression reduced storage to about half (3.5 GB vs 8.5 GB).
5. Conclusions
On an 8‑core, 8 GB RAM, 500 GB high‑performance cloud disk, TokuDB consistently achieved ~570 k rows per second, roughly 3‑4× faster than InnoDB for the same workload.
When using an auto‑increment primary key, TokuDB’s bulk loader cannot be used, causing a dramatic slowdown (≈23 min for the same 200 M rows). Removing the auto‑increment or using a unique index improves speed.
6. Test Environment
CentOS 7 with a custom‑built XeLabs TokuDB version.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
