How to Load 2 Billion Rows into MySQL Fast with TokuDB: Performance Test
A friend needed to import over 2 billion records from a big‑data platform into MySQL for next‑day reporting, so we evaluated InnoDB versus XeLabs TokuDB, configured TokuDB parameters, created test tables, used LOAD DATA, measured write speed, file size, and discovered TokuDB can achieve about 570 k rows per second, roughly three to four times faster than InnoDB, especially when avoiding auto‑increment columns.
Requirement
A colleague needed to ingest more than 2 billion rows generated by a big‑data platform into MySQL for next‑day business reporting.
Re‑analysis of the solution
InnoDB can insert 100‑150 k rows per second when the data fits entirely in memory, but performance degrades when the dataset exceeds memory. To find a faster alternative, XeLabs TokuDB was evaluated.
XeLabs TokuDB overview
Project repository: https://github.com/XeLabs/tokudb
Built‑in jemalloc memory allocator
Additional TokuDB performance metrics
Supports XtraBackup for backups
Integrates ZSTD compression algorithm
Supports TokuDB binlog_group_commit feature
Test table definition
TokuDB core configuration (my.cnf):
loose_tokudb_cache_size=4G
loose_tokudb_directio=ON
loose_tokudb_fsync_log_period=1000
tokudb_commit_sync=0Table schema:
CREATE TABLE `user_summary` (
`user_id` BIGINT(20) UNSIGNED NOT NULL COMMENT '用户id/手机号',
`weight` VARCHAR(5) DEFAULT NULL COMMENT '和码体重(KG)',
`level` VARCHAR(20) DEFAULT NULL COMMENT '重量级',
`beat_rate` VARCHAR(12) DEFAULT NULL COMMENT '击败率',
`level_num` INT(10) DEFAULT NULL COMMENT '同吨位人数',
UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;Data load using LOAD DATA
Command:
LOAD DATA INFILE '/u01/work/134-136.txt' INTO TABLE user_summary
(user_id, weight, level, beat_rate, level_num);Result: 200,000,000 rows loaded in 5 min 48.30 s (≈ 574 k rows/s).
File sizes after load:
-rw-r--r-- 1 root root 8.5G 134-136.txt
-rw-r----- 1 mysql mysql 8.6K user_summary.frm
-rw-r----- 1 mysql mysql 3.5G user_summary_main_229_1_1d_B_0.tokudbThe TokuDB data file is about 3.5 GB, roughly half the size of the original 8.5 GB text file, demonstrating effective compression.
Test conclusions
On an 8‑core, 8 GB RAM, 500 GB high‑performance cloud disk, TokuDB consistently achieved > 570 k rows/s, completing a 2 billion‑row load in just over 58 minutes.
InnoDB required roughly 3‑4 × the time for the same workload under comparable settings.
If the table includes an auto‑increment primary key, TokuDB’s bulk loader cannot be used; loading 200 M rows then took 22 min 43 s.
For best performance, remove the auto‑increment attribute and use a unique index instead.
Additional scenario: table with auto‑increment primary key
Table definition:
CREATE TABLE `user3` (
`user_id` BIGINT(20) UNSIGNED NOT NULL COMMENT '用户id/手机号',
`weight` VARCHAR(5) DEFAULT NULL COMMENT '和码体重(KG)',
`level` VARCHAR(20) DEFAULT NULL COMMENT '重量级',
`beat_rate` VARCHAR(12) DEFAULT NULL COMMENT '击败率',
`level_num` INT(10) DEFAULT NULL COMMENT '同吨位人数',
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB;Loading the same 200 M rows into this table took 22 min 43 s, confirming that the bulk‑loader advantage is lost when MySQL must generate auto‑increment values.
Bulk loader requirements
The bulk loader works only on empty tables. If an auto‑increment column already contains values, drop the auto‑increment attribute or replace it with a unique index to retain high throughput.
Technical reference for TokuDB Bulk Loader: https://github.com/percona/PerconaFT/wiki/TokuFT-Bulk-Loader
Test environment
CentOS 7 with a custom‑compiled XeLabs TokuDB binary (download link: https://pan.baidu.com/s/1qYRyH3I).
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
