Databases 6 min read

How to Load 2 Billion Rows into MySQL Fast with TokuDB – Real‑World Test

This article details a real‑world performance test of loading over two billion rows from a big‑data platform into MySQL using XeLabs TokuDB, covering requirements, TokuDB features, table schema, bulk‑load commands, results, and practical recommendations for high‑throughput data ingestion.

21CTO
21CTO
21CTO
How to Load 2 Billion Rows into MySQL Fast with TokuDB – Real‑World Test

Requirement

A friend needed to load more than 2 billion rows from a big‑data platform into MySQL for next‑day business reporting.

Implementation analysis

InnoDB can insert 100‑150 k rows per second when the data fits into memory, but the dataset exceeds memory, so XeLabs TokuDB was evaluated.

TokuDB introduction

Project address: https://github.com/XeLabs/tokudb . Optimizations over the official TokuDB include built‑in jemalloc memory allocation, additional performance metrics, Xtrabackup backup support, ZSTD compression, and support for the binlog_group_commit feature.

Test table

Core configuration:

loose_tokudb_cache_size=4G
loose_tokudb_directio=ON
loose_tokudb_fsync_log_period=1000
tokudb_commit_sync=0

Table definition:

CREATE TABLE `user_summary` (
  `user_id` bigint(20) unsigned NOT NULL COMMENT 'user id / phone',
  `weight` varchar(5) DEFAULT NULL COMMENT 'weight (KG)',
  `level` varchar(20) DEFAULT NULL COMMENT 'level',
  `beat_rate` varchar(12) DEFAULT NULL COMMENT 'beat rate',
  `level_num` int(10) DEFAULT NULL COMMENT 'same‑tonnage count',
  UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;

Data loading

Using LOAD DATA INFILE the 200 million rows were inserted in 5 min 48.30 s, achieving roughly 574 k rows per second.

root@localhost> LOAD DATA INFILE '/u01/work/134-136.txt'
INTO TABLE user_summary(user_id,weight,level,beat_rate,level_num);
Query OK, 200000000 rows affected (5 min 48.30 sec)
Records: 200000000  Deleted: 0  Skipped: 0  Warnings: 0

The source file was 8.5 GB; the resulting TokuDB file was 3.5 GB, showing about 1.5× compression.

Conclusion

On an 8‑core, 8 GB RAM, 500 GB SSD cloud instance, TokuDB consistently achieved >570 k rows/s, roughly 3‑4 times faster than InnoDB for the same workload. When using an auto‑increment primary key, the bulk loader cannot be applied, causing a significant slowdown.

Recommendations: if possible, remove the auto‑increment attribute and use a unique index to let TokuDB run faster; be aware that compression may limit maximum throughput.

Test environment

CentOS 7 with XeLabs‑compiled TokuDB (download link provided in the original article).

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

performancemysqlDatabase OptimizationLarge Databulk loadTokuDB
21CTO
Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.