Databases 7 min read

Loading 2 Billion Rows into MySQL Fast with XeLabs TokuDB

This article details a real‑world test of loading over 2 billion rows into MySQL using XeLabs TokuDB, compares InnoDB and TokuDB performance, shows configuration, table schema, bulk‑loader commands, and provides practical conclusions for high‑throughput data ingestion.

21CTO
21CTO
21CTO
Loading 2 Billion Rows into MySQL Fast with XeLabs TokuDB

1. Requirement

A friend received a request to ingest more than 2 billion rows from a big‑data platform into MySQL for next‑day business reporting.

2. Implementation Re‑analysis

In MySQL operations, inserting 200 million rows per table can reach 100‑150 k rows/second when the data fits in memory, but most projects exceed memory. We therefore tested XeLabs TokuDB.

3. XeLabs TokuDB Overview

Project address: https://github.com/XeLabs/tokudb

Optimisations compared with the official TokuDB:

Built‑in jemalloc memory allocator

Additional TokuDB performance metrics

Support for Xtrabackup backups

Integration of ZSTD compression algorithm

Support for TokuDB binlog_group_commit feature

4. Test Table

TokuDB core configuration:

loose_tokudb_cache_size=4G
loose_tokudb_directio=ON
loose_tokudb_fsync_log_period=1000
tokudb_commit_sync=0

Table schema:

CREATE TABLE `user_summary` (
  `user_id` bigint(20) unsigned NOT NULL COMMENT '用户id/手机号',
  `weight` varchar(5) DEFAULT NULL COMMENT '和码体重(KG)',
  `level` varchar(20) DEFAULT NULL COMMENT '重量级',
  `beat_rate` varchar(12) DEFAULT NULL COMMENT '击败率',
  `level_num` int(10) DEFAULT NULL COMMENT '同吨位人数',
  UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;

Data load using LOAD DATA:

root@localhost [zst]>LOAD DATA INFILE '/u01/work/134-136.txt' \
INTO TABLE user_summary(user_id, weight, level, beat_rate, level_num);
Query OK, 200000000 rows affected (5 min 48.30 sec)
Records: 200000000 Deleted: 0 Skipped: 0 Warnings: 0

Write speed calculation:

root@localhost [zst]>select 200000000/(5*60+48.30);
+------------------------+
| 200000000/(5*60+48.30) |
+------------------------+
| 574217.6285            |
+------------------------+
1 row in set (0.00 sec)

File sizes:

-rw-r--r-- 1 root root 8.5G 11月 25 20:05 134-136.txt
-rw-r----- 1 mysql mysql 8.6K 11月 25 20:44 user_summary.frm
-rw-r----- 1 mysql mysql 3.5G 11月 25 20:51 user_summary_main_229_1_1d_B_0.tokudb

The raw file is 8.5 GB, while the TokuDB storage occupies 3.5 GB, a compression ratio of a little over 2:1. The entire 2 billion‑row load completed in just over 58 minutes, meeting the requirement. On machines with fast SSD storage, InnoDB with an auto‑increment column can finish in a little over 3 minutes, but InnoDB generally takes 3‑4× longer than TokuDB for the same data.

Conclusion: In an 8‑core, 8 GB RAM, 500 GB high‑speed cloud‑disk environment, TokuDB consistently achieves about 570 k rows/second.

Additional Scenarios

When using a table with an auto‑increment primary key, the bulk loader cannot be used, resulting in much slower performance (22 min 43 sec for the same 200 million rows). The recommendation is to remove the auto‑increment attribute and use a unique index instead, allowing TokuDB to leverage its bulk‑loader for faster ingestion.

Reference for TokuDB Bulk Loader: https://github.com/percona/PerconaFT/wiki/TokuFT-Bulk-Loader

5. Test Environment

Tests were run on CentOS 7 with a custom‑compiled XeLabs TokuDB version (download link: https://pan.baidu.com/s/1qYRyH3I).

Source: http://t.cn/E2TbCg5
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

mysqlDatabase OptimizationLarge Databulk loadTokuDB
21CTO
Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.