FastLoad: A One-Click DTS Platform for Online Data Migration

FastLoad, Didi's one-click DTS platform, speeds up the migration of terabyte-scale offline data into the Fusion storage system by importing SST files directly through RocksDB's IngestFile interface. A 1 TB load drops from about twelve hours to about one, and the platform runs more than 1,000 tasks a day at 99.99% stability.

Didi Tech

FastLoad is a one-click Data Transmission Service (DTS) platform developed by Didi to migrate large volumes of offline data into online storage systems. Its primary target is Fusion, Didi's self-developed distributed storage system built on RocksDB, which runs more than 500 online clusters, holds more than 1,600 TB of data, and handles peak traffic of 12 million QPS.

The platform was created to solve several key business requirements: timely data updates (hourly or daily), rapid data migration for large datasets (often TB-scale), high stability, and multi-table isolation for different feature data types. Traditional methods of data migration through SDK-based writes were found to be too slow and unstable for production environments.

FastLoad uses RocksDB's IngestFile interface to import SST files directly into the storage engine, bypassing the conventional write path of write-ahead log (WAL) appends, in-memory (memtable) writes, and flushes to disk. This significantly reduces migration time: for example, 1 TB of data can be imported in approximately one hour instead of the 12 hours required by the SDK-based write path.
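To make the difference between the two paths concrete, here is a minimal sketch of a toy LSM-style store. It is not RocksDB and not FastLoad's code: the `ToyStore` class, its fields, and the `memtable_limit` parameter are hypothetical names chosen for illustration. It only shows that a conventional write touches the log and the in-memory buffer for every record, while ingestion accepts a pre-sorted run in one step.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy stand-in for an LSM-style engine (illustrative only, not RocksDB)."""

    wal: list = field(default_factory=list)        # append-only log of writes
    memtable: dict = field(default_factory=dict)   # in-memory write buffer
    sst_files: list = field(default_factory=list)  # immutable sorted runs, newest last
    memtable_limit: int = 2                        # hypothetical flush threshold

    def put(self, key, value):
        # Conventional path: log the write, buffer it, flush when the buffer is full.
        self.wal.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Turn the buffered writes into one sorted, immutable run.
        if self.memtable:
            self.sst_files.append(sorted(self.memtable.items()))
            self.memtable.clear()

    def ingest(self, sorted_run):
        # Ingest path: accept a pre-built sorted run; the log and buffer are untouched.
        keys = [k for k, _ in sorted_run]
        if keys != sorted(set(keys)):
            raise ValueError("ingested run must be sorted with unique keys")
        self.sst_files.append(list(sorted_run))

    def get(self, key):
        # Newest data wins: check the buffer first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sst_files):
            for k, v in run:
                if k == key:
                    return v
        return None
```

Writing four records with `put` leaves four log entries and two flushed runs in this toy model, whereas a single `ingest` call with the same four records leaves the log empty and adds one run. The real engine does far more (compaction, level placement, sequence numbers), so this should be read as a picture of which steps are skipped, not as a performance model.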

The system architecture consists of several modules: a console service for API access and task management, a big data scheduling module that uses Hadoop to convert Hive data into SST files, a file download module that distributes files to storage nodes based on routing tables, and a file import and database switching module that handles the actual data migration.
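The conversion and distribution steps can be sketched as follows. This is an assumption-laden illustration rather than FastLoad's actual logic: the CRC32-based `shard_of` function, the shard count, and the shape of `routing_table` (shard id mapped to a node name) are all hypothetical. The one firm constraint it reflects is that an SST file must contain keys in sorted order, so each shard's records are sorted before they would be written out.

```python
import zlib
from collections import defaultdict


def shard_of(key: bytes, num_shards: int) -> int:
    # Hypothetical sharding rule: a stable hash of the key modulo the shard count.
    return zlib.crc32(key) % num_shards


def build_sorted_runs(records, num_shards):
    # Group records by shard, then sort each group by key,
    # since an SST file holds its keys in ascending order.
    runs = defaultdict(list)
    for key, value in records:
        runs[shard_of(key, num_shards)].append((key, value))
    return {shard: sorted(items) for shard, items in runs.items()}


def plan_downloads(runs, routing_table):
    # Map each shard's file to the storage node that owns the shard.
    plan = defaultdict(list)
    for shard in sorted(runs):
        plan[routing_table[shard]].append(shard)
    return dict(plan)


if __name__ == "__main__":
    records = [(f"user:{i}".encode(), str(i).encode()) for i in range(10)]
    runs = build_sorted_runs(records, num_shards=4)
    routing_table = {0: "node-a", 1: "node-b", 2: "node-a", 3: "node-b"}
    for node, shards in plan_downloads(runs, routing_table).items():
        print(node, "downloads files for shards", shards)
```

In a real deployment the sort and partition work would run inside the Hadoop job and the routing table would come from the storage cluster's metadata; the sketch only shows how the pieces fit together.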

Since its launch, FastLoad has served more than 300 business applications. It runs more than 1,000 tasks a day, migrates more than 30 TB of data daily, and the data it loads serves hundreds of billions of queries. The platform maintains 99.99% stability and has operated without major incidents for over two years.

Future development plans include architecture optimization (potentially migrating from Hadoop to Spark), enhanced monitoring and reporting, expansion to other products like Elasticsearch and message queues, and support for new scenarios including real-time offline data reading for HTAP (Hybrid Transactional/Analytical Processing) use cases.

Tags: data migration, Performance Optimization, Big Data, distributed storage, DTS platform, RocksDB, SST files