Databases 6 min read

Loading 2 Billion Rows into MySQL Fast with TokuDB – 570k Rows/s

This article details a real‑world test of loading over 200 million rows into MySQL using XeLabs TokuDB, showing configuration tweaks, bulk‑loader commands, performance metrics (≈570 k rows per second), file‑size reductions, and comparisons with InnoDB under a 8‑core, 8 GB RAM, 500 GB SSD environment.

Programmer DD
Programmer DD
Programmer DD
Loading 2 Billion Rows into MySQL Fast with TokuDB – 570k Rows/s

1. Requirement

A friend needed to ingest more than 2 billion rows from a big‑data platform into MySQL for next‑day business reporting.

2. Implementation Analysis

In MySQL, a single InnoDB table can insert 100‑150 k rows per second when the dataset fits in memory, but the data here exceeds memory, so XeLabs TokuDB was tested.

3. XeLabs TokuDB Overview

Project: https://github.com/XeLabs/tokudb

Built‑in jemalloc memory allocator

Additional TokuDB performance metrics

Supports Xtrabackup backup

Integrates ZSTD compression

Supports TokuDB binlog_group_commit

4. Test Table

TokuDB core configuration:

loose_tokudb_cache_size=4G
loose_tokudb_directio=ON
loose_tokudb_fsync_log_period=1000
tokudb_commit_sync=0

Table definition:

CREATE TABLE `user_summary` (
  `user_id` bigint(20) unsigned NOT NULL COMMENT '用户id/手机号',
  `weight` varchar(5) DEFAULT NULL COMMENT '和码体重(KG)',
  `level` varchar(20) DEFAULT NULL COMMENT '重量级',
  `beat_rate` varchar(12) DEFAULT NULL COMMENT '击败率',
  `level_num` int(10) DEFAULT NULL COMMENT '同吨位人数',
  UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;

Data loading with LOAD DATA:

LOAD DATA INFILE '/u01/work/134-136.txt' \
INTO TABLE user_summary(user_id, weight, level, beat_rate, level_num);

Result: 200 000 000 rows inserted in 5 min 48.30 s (≈574 k rows/s).

File sizes:

-rw-r--r-- 1 root root 8.5G 134-136.txt
-rw-r----- 1 mysql mysql 3.5G user_summary_main_229_1_1d_B_0.tokudb

Compression reduced storage to about half (3.5 GB vs 8.5 GB).

5. Conclusions

On an 8‑core, 8 GB RAM, 500 GB high‑performance cloud disk, TokuDB consistently achieved ~570 k rows per second, roughly 3‑4× faster than InnoDB for the same workload.

When using an auto‑increment primary key, TokuDB’s bulk loader cannot be used, causing a dramatic slowdown (≈23 min for the same 200 M rows). Removing the auto‑increment or using a unique index improves speed.

6. Test Environment

CentOS 7 with a custom‑built XeLabs TokuDB version.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

mysqlDatabase OptimizationLarge Databulk loadTokuDB
Programmer DD
Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.