Databases 8 min read

How to Load 2 Billion Rows into MySQL Fast with TokuDB: Performance Test

A friend needed to import over 2 billion records from a big‑data platform into MySQL for next‑day reporting, so we evaluated InnoDB versus XeLabs TokuDB, configured TokuDB parameters, created test tables, used LOAD DATA, measured write speed, file size, and discovered TokuDB can achieve about 570 k rows per second, roughly three to four times faster than InnoDB, especially when avoiding auto‑increment columns.

ITPUB
ITPUB
ITPUB
How to Load 2 Billion Rows into MySQL Fast with TokuDB: Performance Test

Requirement

A colleague needed to ingest more than 2 billion rows generated by a big‑data platform into MySQL for next‑day business reporting.

Re‑analysis of the solution

InnoDB can insert 100‑150 k rows per second when the data fits entirely in memory, but performance degrades when the dataset exceeds memory. To find a faster alternative, XeLabs TokuDB was evaluated.

XeLabs TokuDB overview

Project repository: https://github.com/XeLabs/tokudb

Built‑in jemalloc memory allocator

Additional TokuDB performance metrics

Supports XtraBackup for backups

Integrates ZSTD compression algorithm

Supports TokuDB binlog_group_commit feature

Test table definition

TokuDB core configuration (my.cnf):

loose_tokudb_cache_size=4G
loose_tokudb_directio=ON
loose_tokudb_fsync_log_period=1000
tokudb_commit_sync=0

Table schema:

CREATE TABLE `user_summary` (
  `user_id`   BIGINT(20) UNSIGNED NOT NULL COMMENT '用户id/手机号',
  `weight`    VARCHAR(5)  DEFAULT NULL COMMENT '和码体重(KG)',
  `level`     VARCHAR(20) DEFAULT NULL COMMENT '重量级',
  `beat_rate` VARCHAR(12) DEFAULT NULL COMMENT '击败率',
  `level_num` INT(10)     DEFAULT NULL COMMENT '同吨位人数',
  UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;

Data load using LOAD DATA

Command:

LOAD DATA INFILE '/u01/work/134-136.txt' INTO TABLE user_summary
(user_id, weight, level, beat_rate, level_num);

Result: 200,000,000 rows loaded in 5 min 48.30 s (≈ 574 k rows/s).

File sizes after load:

-rw-r--r-- 1 root   root   8.5G 134-136.txt
-rw-r----- 1 mysql  mysql  8.6K user_summary.frm
-rw-r----- 1 mysql  mysql  3.5G user_summary_main_229_1_1d_B_0.tokudb

The TokuDB data file is about 3.5 GB, roughly half the size of the original 8.5 GB text file, demonstrating effective compression.

Test conclusions

On an 8‑core, 8 GB RAM, 500 GB high‑performance cloud disk, TokuDB consistently achieved > 570 k rows/s, completing a 2 billion‑row load in just over 58 minutes.

InnoDB required roughly 3‑4 × the time for the same workload under comparable settings.

If the table includes an auto‑increment primary key, TokuDB’s bulk loader cannot be used; loading 200 M rows then took 22 min 43 s.

For best performance, remove the auto‑increment attribute and use a unique index instead.

Additional scenario: table with auto‑increment primary key

Table definition:

CREATE TABLE `user3` (
  `user_id`   BIGINT(20) UNSIGNED NOT NULL COMMENT '用户id/手机号',
  `weight`    VARCHAR(5)  DEFAULT NULL COMMENT '和码体重(KG)',
  `level`     VARCHAR(20) DEFAULT NULL COMMENT '重量级',
  `beat_rate` VARCHAR(12) DEFAULT NULL COMMENT '击败率',
  `level_num` INT(10)    DEFAULT NULL COMMENT '同吨位人数',
  `id`        BIGINT(20) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`),
  UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB;

Loading the same 200 M rows into this table took 22 min 43 s, confirming that the bulk‑loader advantage is lost when MySQL must generate auto‑increment values.

Bulk loader requirements

The bulk loader works only on empty tables. If an auto‑increment column already contains values, drop the auto‑increment attribute or replace it with a unique index to retain high throughput.

Technical reference for TokuDB Bulk Loader: https://github.com/percona/PerconaFT/wiki/TokuFT-Bulk-Loader

Test environment

CentOS 7 with a custom‑compiled XeLabs TokuDB binary (download link: https://pan.baidu.com/s/1qYRyH3I).

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

performancemysqlbulk loadTokuDBLarge Data Import
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.