Loading 2 Billion Rows into MySQL Fast with XeLabs TokuDB
This article details a real‑world test of loading over 2 billion rows into MySQL using XeLabs TokuDB, compares InnoDB and TokuDB performance, shows configuration, table schema, bulk‑loader commands, and provides practical conclusions for high‑throughput data ingestion.
1. Requirement
A friend received a request to ingest more than 2 billion rows from a big‑data platform into MySQL for next‑day business reporting.
2. Implementation Re‑analysis
In MySQL operations, inserting 200 million rows per table can reach 100‑150 k rows/second when the data fits in memory, but most projects exceed memory. We therefore tested XeLabs TokuDB.
3. XeLabs TokuDB Overview
Project address: https://github.com/XeLabs/tokudb
Optimisations compared with the official TokuDB:
Built‑in jemalloc memory allocator
Additional TokuDB performance metrics
Support for Xtrabackup backups
Integration of ZSTD compression algorithm
Support for TokuDB binlog_group_commit feature
4. Test Table
TokuDB core configuration:
loose_tokudb_cache_size=4G
loose_tokudb_directio=ON
loose_tokudb_fsync_log_period=1000
tokudb_commit_sync=0Table schema:
CREATE TABLE `user_summary` (
`user_id` bigint(20) unsigned NOT NULL COMMENT '用户id/手机号',
`weight` varchar(5) DEFAULT NULL COMMENT '和码体重(KG)',
`level` varchar(20) DEFAULT NULL COMMENT '重量级',
`beat_rate` varchar(12) DEFAULT NULL COMMENT '击败率',
`level_num` int(10) DEFAULT NULL COMMENT '同吨位人数',
UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;Data load using LOAD DATA:
root@localhost [zst]>LOAD DATA INFILE '/u01/work/134-136.txt' \
INTO TABLE user_summary(user_id, weight, level, beat_rate, level_num);
Query OK, 200000000 rows affected (5 min 48.30 sec)
Records: 200000000 Deleted: 0 Skipped: 0 Warnings: 0Write speed calculation:
root@localhost [zst]>select 200000000/(5*60+48.30);
+------------------------+
| 200000000/(5*60+48.30) |
+------------------------+
| 574217.6285 |
+------------------------+
1 row in set (0.00 sec)File sizes:
-rw-r--r-- 1 root root 8.5G 11月 25 20:05 134-136.txt
-rw-r----- 1 mysql mysql 8.6K 11月 25 20:44 user_summary.frm
-rw-r----- 1 mysql mysql 3.5G 11月 25 20:51 user_summary_main_229_1_1d_B_0.tokudbThe raw file is 8.5 GB, while the TokuDB storage occupies 3.5 GB, a compression ratio of a little over 2:1. The entire 2 billion‑row load completed in just over 58 minutes, meeting the requirement. On machines with fast SSD storage, InnoDB with an auto‑increment column can finish in a little over 3 minutes, but InnoDB generally takes 3‑4× longer than TokuDB for the same data.
Conclusion: In an 8‑core, 8 GB RAM, 500 GB high‑speed cloud‑disk environment, TokuDB consistently achieves about 570 k rows/second.
Additional Scenarios
When using a table with an auto‑increment primary key, the bulk loader cannot be used, resulting in much slower performance (22 min 43 sec for the same 200 million rows). The recommendation is to remove the auto‑increment attribute and use a unique index instead, allowing TokuDB to leverage its bulk‑loader for faster ingestion.
Reference for TokuDB Bulk Loader: https://github.com/percona/PerconaFT/wiki/TokuFT-Bulk-Loader
5. Test Environment
Tests were run on CentOS 7 with a custom‑compiled XeLabs TokuDB version (download link: https://pan.baidu.com/s/1qYRyH3I).
Source: http://t.cn/E2TbCg5
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
