High‑Speed Loading of 2 Billion Rows into MySQL Using TokuDB
This article describes a real‑world requirement to import over 2 billion records into MySQL, analyzes the challenges, introduces XeLabs TokuDB with its optimizations, details the test schema and configuration, demonstrates bulk loading commands, presents performance metrics showing up to 570 k rows per second, and concludes with practical recommendations and environment details.
A friend needed to load more than 2 billion rows received from a big‑data platform into MySQL for next‑day business reporting.
Because InnoDB can only sustain about 100‑150 k rows per second when the dataset fits in memory, the author evaluated XeLabs TokuDB as an alternative.
XeLabs TokuDB Overview
Project: https://github.com/XeLabs/tokudb
Built‑in jemalloc memory allocator
Additional TokuDB performance metrics
Supports XtraBackup backups
Integrates ZSTD compression
Supports TokuDB binlog group‑commit
Test Table Configuration
loose_tokudb_cache_size=4G
loose_tokudb_directio=ON
loose_tokudb_fsync_log_period=1000
tokudb_commit_sync=0Table definition (TokuDB engine):
CREATE TABLE `user_summary`(
`user_id` bigint(20) unsigned NOT NULL COMMENT '用户id/手机号',
`weight` varchar(5) DEFAULT NULL COMMENT '和码体重(KG)',
`level` varchar(20) DEFAULT NULL COMMENT '重量级',
`beat_rate` varchar(12) DEFAULT NULL COMMENT '击败率',
`level_num` int(10) DEFAULT NULL COMMENT '同吨位人数',
UNIQUE KEY `u_user_id`(`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;Data loading using LOAD DATA INFILE:
root@localhost [zst]> LOAD DATA INFILE '/u01/work/134-136.txt' \
INTO TABLE user_summary(user_id,weight,level,beat_rate,level_num);
Query OK, 200000000 rows affected (5 min 48.30 sec)
Records: 200000000 Deleted: 0 Skipped: 0 Warnings: 0Calculated write speed:
SELECT 200000000/(5*60+48.30); -- result: 574217.6285 rows/secFile size: source file 8.5 GB, TokuDB storage 3.5 GB (≈ 40 % compression). The test completed in just over 58 minutes, meeting the requirement. In a comparable InnoDB test, the same data required roughly 3‑4 times longer.
Additional scenarios showed that using an auto‑increment primary key disables TokuDB’s bulk loader, causing a significant slowdown.
Recommendations: when using TokuDB bulk loader, avoid auto‑increment columns or replace them with a unique key to achieve the best performance; note that compression may be less effective in bulk‑load mode.
Test Environment
CentOS 7 with a custom‑compiled XeLabs TokuDB version, 8 CPU cores, 8 GB RAM, 500 GB high‑performance cloud disk.
Further resources: TokuDB Bulk Loader documentation – https://github.com/percona/PerconaFT/wiki/TokuFT-Bulk-Loader
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Architect's Tech Stack
Java backend, microservices, distributed systems, containerized programming, and more.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
