
How to Migrate 100 Billion ClickHouse Rows to Doris: Three Practical Approaches

This article walks through three concrete methods for moving massive ClickHouse datasets—up to 100 billion rows—to Doris, detailing catalog integration, file export with stream load, and Spark‑based pipelines, while sharing real‑world performance results and pitfalls.


Background: A ClickHouse cluster is being decommissioned, but it holds roughly 100 billion rows that need to be preserved. The author decides to migrate the data to Apache Doris (ClickHouse 25.x, Doris 2.1.2) and evaluates three feasible migration strategies.

Migration Options

Option 1: Use Doris' ClickHouse catalog and execute INSERT INTO … SELECT * FROM … directly.

Option 2: Export each ClickHouse table to files (CSV/TSV/JSON/Parquet) and load them into Doris via Broker Load or Stream Load.

Option 3: Read ClickHouse tables with Spark and write the results into Doris tables.

Based on personal experience and AI suggestions, Option 1 is simplest but most bug‑prone; Option 2 is a bit more complex yet more stable; Option 3 is the most robust but requires custom code and extra compute resources.

Validating Option 1

Steps:

Create a ClickHouse catalog in Doris:

CREATE CATALOG `jdbc_clickhouse01` PROPERTIES (
  "user" = "default",
  "type" = "jdbc",
  "password" = "***",
  "jdbc_url" = "jdbc:clickhouse://192.168.xxx.xxx:8123",
  "driver_url" = "clickhouse-jdbc-0.4.6-all.jar",
  "driver_class" = "com.clickhouse.jdbc.ClickHouseDriver"
);

Switch to the catalog and attempt INSERT INTO … SELECT *.
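For example, using fully qualified names instead of switching catalogs (a hedged sketch: the target table doris_db.ck_big_table must already exist in Doris with a matching schema):

INSERT INTO internal.doris_db.ck_big_table
SELECT * FROM jdbc_clickhouse01.ck_db.ck_big_table;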

Result: Works for small local tables (e.g., 225 rows) but fails for larger local tables and all partitioned tables, producing errors such as IndexOutOfBoundsException. The method is therefore unreliable for big or sharded tables.

Validating Option 2

Process:

Export each ClickHouse table to TSV (tab-separated values) using:

clickhouse-client --query="SELECT * FROM ck_db.ck_big_table FORMAT TSV" > /tmp/ck_big_table.tsv

Transfer the TSV files to a Doris node.

Load them with Doris Stream Load, setting the column separator to \t via the column_separator header:

curl --location-trusted -u doris_user:**** \
  -H "label:label01" -H "column_separator:\t" \
  -T ck_big_table.tsv \
  http://192.168.xxx.xxx:8030/api/doris_db/ck_big_table/_stream_load

The response reports metrics such as NumberTotalRows: 1406520 and LoadTimeMs: 11456.
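For reference, the full Stream Load response is a small JSON document along these lines (an abridged sketch using the figures above; TxnId and NumberFilteredRows here are illustrative, and the exact fields vary by Doris version):

{
    "TxnId": 1001,
    "Label": "label01",
    "Status": "Success",
    "NumberTotalRows": 1406520,
    "NumberLoadedRows": 1406520,
    "NumberFilteredRows": 0,
    "LoadTimeMs": 11456
}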

TSV was chosen because commas embedded in the data would break standard CSV parsing; JSON was avoided because it requires explicit schema definitions.

Performance: Exporting ~1.4 GB of data from ClickHouse took under 2 seconds; Doris Stream Load completed in under 20 seconds, indicating both steps are fast.

Final Thoughts

There is no single “silver‑bullet” solution for heterogeneous DB migration. In practice, a combination of strategies is required: manually create matching Doris tables based on ClickHouse schema, then use the catalog + Stream Load combo for simple tables, and fall back to Spark for complex cases.
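For the Spark fallback, a minimal PySpark sketch might look like the following, assuming the ClickHouse JDBC driver and the Doris Spark Connector are available to Spark; the hosts, credentials, and the partition column id are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ck_to_doris").getOrCreate()

# Read the ClickHouse table over JDBC, split into parallel partitions
# on a numeric column so a single task does not pull every row.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:clickhouse://192.168.xxx.xxx:8123/ck_db")
      .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
      .option("dbtable", "ck_big_table")
      .option("user", "default")
      .option("password", "***")
      .option("partitionColumn", "id")   # placeholder numeric column
      .option("lowerBound", "0")
      .option("upperBound", "100000000")
      .option("numPartitions", "32")
      .load())

# Write into Doris via the Doris Spark Connector, which performs
# Stream Load against the FE under the hood.
(df.write.format("doris")
 .option("doris.table.identifier", "doris_db.ck_big_table")
 .option("doris.fenodes", "192.168.xxx.xxx:8030")
 .option("user", "doris_user")
 .option("password", "****")
 .mode("append")
 .save())

The JDBC partitioning bounds should be set from the table's actual key range; for very large tables, reading one ClickHouse partition at a time keeps each Spark job's footprint bounded.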

Readers are invited to share their own experiences with ClickHouse‑to‑Doris migrations.

Tags: data migration, SQL, ClickHouse, benchmark, Spark, Apache Doris, Stream Load
Written by

dbaplus Community

An enterprise-level professional community for databases, big data, and AIOps: daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS and DAMS conferences, delivered by industry experts.
