How to Migrate ClickHouse Data to Doris: Three Practical Strategies Tested
Facing a ClickHouse cluster shutdown, the author tests three ways to move roughly 10 billion rows into Doris: Doris's ClickHouse catalog with INSERT INTO ... SELECT, exporting tables to files loaded via Broker/Stream Load, and a Spark pipeline. Each method is evaluated for simplicity, bugs, and performance, with detailed steps, code snippets, and benchmark results.
When a ClickHouse cluster had to be decommissioned, roughly 10 billion rows that might still be needed later had to be backed up somewhere. The goal was to migrate this data to Doris, a modern analytical database, and to compare three feasible migration approaches along the way.
Migration Options
Option 1: Use Doris’s ClickHouse catalog and execute INSERT INTO … SELECT * directly.
Option 2: Export each ClickHouse table to a file (CSV/TSV/Parquet) and load the files into Doris via Broker Load or Stream Load.
Option 3: Read ClickHouse tables with Spark and write the results into Doris.
Based on personal experience and AI suggestions: Option 1 is the simplest and most efficient but the most bug-prone; Option 2 is a two-step process with fewer pitfalls; Option 3 is the most robust but demands extra coding and compute resources.
Validating Option 1
The author created a ClickHouse catalog in Doris:
CREATE CATALOG `jdbc_clickhouse01` PROPERTIES (
"user" = "default",
"type" = "jdbc",
"password" = "***",
"jdbc_url" = "jdbc:clickhouse://192.168.xxx.xxx:8123",
"driver_url" = "clickhouse-jdbc-0.4.6-all.jar",
"driver_class" = "com.clickhouse.jdbc.ClickHouseDriver"
);

Note: If the server cannot access the internet, the ClickHouse JDBC jar must be downloaded beforehand and placed in /usr/share/java on every Doris node.
After switching to the catalog, the author attempted an INSERT INTO … SELECT * operation. It succeeded for small local tables (e.g., 225 rows) but failed for larger local tables and all distributed tables, producing errors such as IndexOutOfBoundsException. The conclusion was that this method works only for small, non‑sharded tables.
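For reference, a minimal sketch of what such a statement looks like, with illustrative table names (the actual schema is not given in the source). The Doris target is addressed through the built-in internal catalog, while the source is read through the JDBC catalog created above:

-- Hypothetical names: doris_db.ck_big_table must already exist in Doris.
INSERT INTO internal.doris_db.ck_big_table
SELECT * FROM jdbc_clickhouse01.ck_db.ck_big_table;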
Option 2 – Export & Stream Load
Step 1: Export ClickHouse tables to TSV files (TSV is tab-separated values, i.e., CSV with tabs instead of commas as the delimiter). Example command:
clickhouse-client --query="SELECT * FROM ck_db.ck_big_table FORMAT TSV" > /tmp/ck_big_table.tsv

Step 2: Transfer the TSV files to a Doris client node and load them using Stream Load with a tab delimiter:
curl --location-trusted -u doris_user:**** \
-H "label:label01" \
-H "column_separator:\t" \
-T ck_big_table.tsv \
http://192.168.xxx.xxx:8030/api/doris_db/ck_big_table/_stream_load

The response confirms a successful load of 1,406,520 rows in about 20 seconds. The author chose TSV over CSV because many fields contain commas, which would break a standard CSV import. JSON was avoided because it would require manually supplying the table schema.
Performance testing showed that exporting 1.4 million rows from ClickHouse took less than 2 seconds, and the Doris Stream Load completed in under 20 seconds.
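Broker Load, the other loading path named in Option 2, fits when the exported files land on HDFS or object storage instead of a local disk. A hedged sketch, where the path, label, and broker name are placeholders rather than values from the source:

-- Illustrative only: adjust path, label, and broker name to your environment.
LOAD LABEL doris_db.label_ck_big_table_01
(
    DATA INFILE("hdfs://namenode:8020/tmp/ck_big_table.tsv")
    INTO TABLE ck_big_table
    COLUMNS TERMINATED BY "\t"
)
WITH BROKER "hdfs_broker"
(
    "username" = "hdfs_user",
    "password" = ""
);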
Option 3 – Spark Pipeline
The author mentions that Spark can read ClickHouse tables and write to Doris, but this approach was not fully demonstrated because it incurs additional coding and resource overhead.
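Although the source stops short of code, a minimal Spark SQL sketch of such a pipeline might look like the following, assuming the ClickHouse JDBC driver and the Doris Spark connector are on the Spark classpath; every endpoint, credential, and table name below is an illustrative placeholder:

-- Read side: expose the ClickHouse table via Spark's built-in JDBC source.
CREATE TEMPORARY VIEW ck_src
USING jdbc
OPTIONS (
    "url"="jdbc:clickhouse://192.168.xxx.xxx:8123/ck_db",
    "dbtable"="ck_big_table",
    "driver"="com.clickhouse.jdbc.ClickHouseDriver",
    "user"="default",
    "password"="***"
);

-- Write side: expose the Doris table via the doris-spark-connector.
CREATE TEMPORARY VIEW doris_dst
USING doris
OPTIONS (
    "table.identifier"="doris_db.ck_big_table",
    "fenodes"="192.168.xxx.xxx:8030",
    "user"="doris_user",
    "password"="****"
);

-- Move the data in one statement.
INSERT INTO doris_dst SELECT * FROM ck_src;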
Final Thoughts
For heterogeneous database migrations, there is no single “high‑efficiency, simple, elegant” solution. A pragmatic strategy combines multiple methods: manually create corresponding Doris tables based on ClickHouse schemas, use the ClickHouse catalog with Stream Load for straightforward tables, and fall back to Spark for complex cases.
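For the manual schema step, a hypothetical mapping (not taken from the source) from a ClickHouse MergeTree table to a Doris table might look like this; note that unsigned 64-bit ClickHouse columns can exceed the Doris BIGINT range and may need LARGEINT:

-- Illustrative ClickHouse source:
--   CREATE TABLE ck_db.ck_big_table (dt Date, id UInt64, msg String)
--   ENGINE = MergeTree ORDER BY (dt, id);
-- One possible Doris equivalent:
CREATE TABLE doris_db.ck_big_table (
    dt  DATE,
    id  BIGINT,   -- use LARGEINT if UInt64 values exceed the BIGINT range
    msg STRING
)
DUPLICATE KEY(dt, id)
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES ("replication_num" = "1");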
Overall, the author found the catalog‑plus‑Stream Load combo to be the most practical for most tables, while acknowledging that large‑scale migrations still require careful planning and possibly custom tooling.