Databases 7 min read

Loading 2 Billion Rows into MySQL Fast with XeLabs TokuDB

This article details a real‑world test of loading over 2 billion rows into MySQL using XeLabs TokuDB, compares InnoDB and TokuDB performance, shows configuration, table schema, bulk‑loader commands, and provides practical conclusions for high‑throughput data ingestion.

21CTO

Feb 14, 2019

Loading 2 Billion Rows into MySQL Fast with XeLabs TokuDB

1. Requirement

A friend received a request to ingest more than 2 billion rows from a big‑data platform into MySQL for next‑day business reporting.

2. Implementation Re‑analysis

In MySQL operations, inserting 200 million rows per table can reach 100‑150 k rows/second when the data fits in memory, but most projects exceed memory. We therefore tested XeLabs TokuDB.

3. XeLabs TokuDB Overview

Project address: https://github.com/XeLabs/tokudb

Optimisations compared with the official TokuDB:

Built‑in jemalloc memory allocator

Additional TokuDB performance metrics

Support for Xtrabackup backups

Integration of ZSTD compression algorithm

Support for TokuDB binlog_group_commit feature

4. Test Table

TokuDB core configuration:

loose_tokudb_cache_size=4G
loose_tokudb_directio=ON
loose_tokudb_fsync_log_period=1000
tokudb_commit_sync=0

Table schema:

CREATE TABLE `user_summary` (
  `user_id` bigint(20) unsigned NOT NULL COMMENT '用户id/手机号',
  `weight` varchar(5) DEFAULT NULL COMMENT '和码体重(KG)',
  `level` varchar(20) DEFAULT NULL COMMENT '重量级',
  `beat_rate` varchar(12) DEFAULT NULL COMMENT '击败率',
  `level_num` int(10) DEFAULT NULL COMMENT '同吨位人数',
  UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;

Data load using LOAD DATA:

root@localhost [zst]>LOAD DATA INFILE '/u01/work/134-136.txt' \
INTO TABLE user_summary(user_id, weight, level, beat_rate, level_num);
Query OK, 200000000 rows affected (5 min 48.30 sec)
Records: 200000000 Deleted: 0 Skipped: 0 Warnings: 0

Write speed calculation:

root@localhost [zst]>select 200000000/(5*60+48.30);
+------------------------+
| 200000000/(5*60+48.30) |
+------------------------+
| 574217.6285            |
+------------------------+
1 row in set (0.00 sec)

File sizes:

-rw-r--r-- 1 root root 8.5G 11月 25 20:05 134-136.txt
-rw-r----- 1 mysql mysql 8.6K 11月 25 20:44 user_summary.frm
-rw-r----- 1 mysql mysql 3.5G 11月 25 20:51 user_summary_main_229_1_1d_B_0.tokudb

The raw file is 8.5 GB, while the TokuDB storage occupies 3.5 GB, a compression ratio of a little over 2:1. The entire 2 billion‑row load completed in just over 58 minutes, meeting the requirement. On machines with fast SSD storage, InnoDB with an auto‑increment column can finish in a little over 3 minutes, but InnoDB generally takes 3‑4× longer than TokuDB for the same data.

Conclusion: In an 8‑core, 8 GB RAM, 500 GB high‑speed cloud‑disk environment, TokuDB consistently achieves about 570 k rows/second.

Additional Scenarios

When using a table with an auto‑increment primary key, the bulk loader cannot be used, resulting in much slower performance (22 min 43 sec for the same 200 million rows). The recommendation is to remove the auto‑increment attribute and use a unique index instead, allowing TokuDB to leverage its bulk‑loader for faster ingestion.

Reference for TokuDB Bulk Loader: https://github.com/percona/PerconaFT/wiki/TokuFT-Bulk-Loader

5. Test Environment

Tests were run on CentOS 7 with a custom‑compiled XeLabs TokuDB version (download link: https://pan.baidu.com/s/1qYRyH3I).

Source: http://t.cn/E2TbCg5

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

mysql database optimization Large Data bulk load TokuDB

Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.