How to Migrate ClickHouse Data to Doris: Three Practical Strategies Tested
Facing a ClickHouse cluster shutdown, the author tests three ways to move roughly 10 billion rows into Doris: Doris's ClickHouse catalog with INSERT INTO ... SELECT, exporting tables to files loaded via Broker/Stream Load, and a Spark pipeline. Each method is evaluated for simplicity, bugs, and performance, with detailed steps, code snippets, and benchmark results.
When a ClickHouse cluster had to be decommissioned, roughly 10 billion rows that might still be needed later had to be backed up somewhere. The goal was to migrate this data to Doris, a modern analytical database, and to compare three feasible migration approaches along the way.
Migration Options
Option 1: Use Doris’s ClickHouse catalog and execute INSERT INTO … SELECT * directly.
Option 2: Export each ClickHouse table to a file (CSV/TSV/Parquet) and load the files into Doris via Broker Load or Stream Load.
Option 3: Read ClickHouse tables with Spark and write the results into Doris.
Based on personal experience and AI suggestions: Option 1 is the simplest and most efficient but the most bug-prone; Option 2 is a two-step process with fewer pitfalls; Option 3 is the most robust but demands extra coding and compute resources.
Validating Option 1
The author created a ClickHouse catalog in Doris:
CREATE CATALOG `jdbc_clickhouse01` PROPERTIES (
"user" = "default",
"type" = "jdbc",
"password" = "***",
"jdbc_url" = "jdbc:clickhouse://192.168.xxx.xxx:8123",
"driver_url" = "clickhouse-jdbc-0.4.6-all.jar",
"driver_class" = "com.clickhouse.jdbc.ClickHouseDriver"
);

Note: If the server cannot access the internet, the ClickHouse JDBC jar must be downloaded beforehand and placed in /usr/share/java on every Doris node.
After switching to the catalog, the author attempted an INSERT INTO … SELECT * operation. It succeeded for small local tables (e.g., 225 rows) but failed for larger local tables and all distributed tables, producing errors such as IndexOutOfBoundsException. The conclusion was that this method works only for small, non‑sharded tables.
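For reference, a minimal sketch of what such a statement looks like, with illustrative table names (the actual schema is not given in the source). The Doris target is addressed through the built-in internal catalog, while the source is read through the JDBC catalog created above:

-- Hypothetical names: doris_db.ck_big_table must already exist in Doris.
INSERT INTO internal.doris_db.ck_big_table
SELECT * FROM jdbc_clickhouse01.ck_db.ck_big_table;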
Option 2 – Export & Stream Load
Step 1: Export ClickHouse tables to TSV files (TSV is tab-separated values, i.e., CSV with tabs instead of commas as the delimiter). Example command:
clickhouse-client --query="SELECT * FROM ck_db.ck_big_table FORMAT TSV" > /tmp/ck_big_table.tsv

Step 2: Transfer the TSV files to a Doris client node and load them using Stream Load with a tab delimiter:
curl --location-trusted -u doris_user:**** \
-H "label:label01" \
-H "column_separator:\t" \
-T ck_big_table.tsv \
http://192.168.xxx.xxx:8030/api/doris_db/ck_big_table/_stream_load

The response confirms a successful load of 1,406,520 rows in about 20 seconds. The author chose TSV over CSV because many fields contain commas, which would break a standard CSV import. JSON was avoided because it would require manually supplying the table schema.
Performance testing showed that exporting 1.4 million rows from ClickHouse took less than 2 seconds, and the Doris Stream Load completed in under 20 seconds.
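Broker Load, the other loading path named in Option 2, fits when the exported files land on HDFS or object storage instead of a local disk. A hedged sketch, where the path, label, and broker name are placeholders rather than values from the source:

-- Illustrative only: adjust path, label, and broker name to your environment.
LOAD LABEL doris_db.label_ck_big_table_01
(
    DATA INFILE("hdfs://namenode:8020/tmp/ck_big_table.tsv")
    INTO TABLE ck_big_table
    COLUMNS TERMINATED BY "\t"
)
WITH BROKER "hdfs_broker"
(
    "username" = "hdfs_user",
    "password" = ""
);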
Option 3 – Spark Pipeline
The author mentions that Spark can read ClickHouse tables and write to Doris, but this approach was not fully demonstrated because it incurs additional coding and resource overhead.
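Although the source stops short of code, a minimal Spark SQL sketch of such a pipeline might look like the following, assuming the ClickHouse JDBC driver and the Doris Spark connector are on the Spark classpath; every endpoint, credential, and table name below is an illustrative placeholder:

-- Read side: expose the ClickHouse table via Spark's built-in JDBC source.
CREATE TEMPORARY VIEW ck_src
USING jdbc
OPTIONS (
    "url"="jdbc:clickhouse://192.168.xxx.xxx:8123/ck_db",
    "dbtable"="ck_big_table",
    "driver"="com.clickhouse.jdbc.ClickHouseDriver",
    "user"="default",
    "password"="***"
);

-- Write side: expose the Doris table via the doris-spark-connector.
CREATE TEMPORARY VIEW doris_dst
USING doris
OPTIONS (
    "table.identifier"="doris_db.ck_big_table",
    "fenodes"="192.168.xxx.xxx:8030",
    "user"="doris_user",
    "password"="****"
);

-- Move the data in one statement.
INSERT INTO doris_dst SELECT * FROM ck_src;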
Final Thoughts
For heterogeneous database migrations, there is no single “high‑efficiency, simple, elegant” solution. A pragmatic strategy combines multiple methods: manually create corresponding Doris tables based on ClickHouse schemas, use the ClickHouse catalog with Stream Load for straightforward tables, and fall back to Spark for complex cases.
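For the manual schema step, a hypothetical mapping (not taken from the source) from a ClickHouse MergeTree table to a Doris table might look like this; note that unsigned 64-bit ClickHouse columns can exceed the Doris BIGINT range and may need LARGEINT:

-- Illustrative ClickHouse source:
--   CREATE TABLE ck_db.ck_big_table (dt Date, id UInt64, msg String)
--   ENGINE = MergeTree ORDER BY (dt, id);
-- One possible Doris equivalent:
CREATE TABLE doris_db.ck_big_table (
    dt  DATE,
    id  BIGINT,   -- use LARGEINT if UInt64 values exceed the BIGINT range
    msg STRING
)
DUPLICATE KEY(dt, id)
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES ("replication_num" = "1");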
Overall, the author found the catalog‑plus‑Stream Load combo to be the most practical for most tables, while acknowledging that large‑scale migrations still require careful planning and possibly custom tooling.