How to Migrate 100 Billion ClickHouse Rows to Doris: Three Practical Approaches
This article walks through three concrete methods for moving massive ClickHouse datasets—up to 100 billion rows—to Doris, detailing catalog integration, file export with stream load, and Spark‑based pipelines, while sharing real‑world performance results and pitfalls.
Background: A ClickHouse cluster is being decommissioned, but it holds roughly 100 billion rows of data that need to be preserved. The author decides to migrate the data to Apache Doris (ClickHouse version 25.x, Doris version 2.1.2) and evaluates three feasible migration strategies.
Migration Options
Option 1: Use Doris' ClickHouse catalog and execute INSERT INTO … SELECT * FROM … directly.
Option 2: Export each ClickHouse table to files (CSV/TSV/JSON/Parquet) and load them into Doris via Broker Load or Stream Load.
Option 3: Read ClickHouse tables with Spark and write the results into Doris tables.
Based on personal experience and AI suggestions, Option 1 is simplest but most bug‑prone; Option 2 is a bit more complex yet more stable; Option 3 is the most robust but requires custom code and extra compute resources.
Validating Option 1
Steps:
Create a ClickHouse catalog in Doris:
CREATE CATALOG `jdbc_clickhouse01` PROPERTIES (
"user" = "default",
"type" = "jdbc",
"password" = "***",
"jdbc_url" = "jdbc:clickhouse://192.168.xxx.xxx:8123",
"driver_url" = "clickhouse-jdbc-0.4.6-all.jar",
"driver_class" = "com.clickhouse.jdbc.ClickHouseDriver"
);

Switch to the catalog and attempt INSERT INTO … SELECT *.
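The cross-catalog copy can be scripted from the shell. A minimal sketch, assuming a Doris FE reachable on the MySQL protocol port; the host, port, and table names below are placeholders, not values from the article, and the actual client call is commented out so the snippet is safe to run:

```shell
# Hypothetical sketch of the Option 1 copy. Doris lets you fully qualify
# tables as catalog.db.table, so the INSERT can name both catalogs directly.
SQL='INSERT INTO internal.doris_db.ck_big_table
SELECT * FROM jdbc_clickhouse01.ck_db.ck_big_table;'
echo "$SQL"
# mysql -h 192.168.xxx.xxx -P 9030 -u doris_user -p'***' -e "$SQL"
```

Because the external catalog tables can be addressed with the three-part name, an explicit SWITCH statement is not strictly required before the INSERT.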
Result: Works for small local tables (e.g., 225 rows) but fails for larger local tables and all partitioned tables, producing errors such as IndexOutOfBoundsException. The method is therefore unreliable for big or sharded tables.
Validating Option 2
Process:
Export ClickHouse tables to TSV (tab-separated values) using:
clickhouse-client --query="SELECT * FROM ck_db.ck_big_table FORMAT TSV" > /tmp/ck_big_table.tsv

Transfer the TSV files to a Doris node.
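At 100-billion-row scale, a single export file is far too large for one Stream Load request (Doris caps the request body via the BE setting streaming_load_max_mb, 10 GB by default), so in practice the TSV is split into chunks first. A self-contained sketch, using a tiny stand-in file in place of the real export:

```shell
# Sketch: split a big TSV export into fixed-size chunks so each chunk can be
# sent as its own Stream Load request. A 3-line stand-in replaces the real file.
printf '1\ta\n2\tb\n3\tc\n' > /tmp/ck_big_table.tsv
mkdir -p /tmp/ck_chunks
# Real exports would use a byte limit that keeps lines whole, e.g. split -C 1G -d;
# the demo uses -l 2 (two lines per chunk) on the stand-in file.
split -l 2 -d /tmp/ck_big_table.tsv /tmp/ck_chunks/ck_big_table.part_
ls /tmp/ck_chunks
```

GNU split's -C flag caps each chunk at a byte size without breaking a line in the middle, which matters for TSV: a row torn across two files would corrupt both loads.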
Load them with Doris Stream Load, specifying column_separator:\t:
curl --location-trusted -u doris_user:**** \
-H "label:label01" -H "column_separator:\t" \
-T ck_big_table.tsv \
http://192.168.xxx.xxx:8030/api/doris_db/ck_big_table/_stream_load

The response includes metrics such as NumberTotalRows: 1406520, LoadTimeMs: 11456, etc.
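Once the export is split into chunks, each chunk gets its own Stream Load call, and each call needs a unique label (Doris uses the label to deduplicate retried loads). A hedged sketch of the loop; the chunk paths, host, and credentials are placeholders, dummy chunk files are created so the loop is demonstrable, and the curl call itself is commented out:

```shell
# Sketch: one Stream Load request per chunk, each with a unique label.
# Dummy chunk files stand in for the real split output.
mkdir -p /tmp/ck_chunks && touch /tmp/ck_chunks/part_00 /tmp/ck_chunks/part_01
for f in /tmp/ck_chunks/part_*; do
  label="ck_big_table_$(basename "$f")_$(date +%s)"
  echo "loading $f with label $label"
  # curl --location-trusted -u doris_user:**** \
  #   -H "label:${label}" -H "column_separator:\t" \
  #   -T "$f" \
  #   http://192.168.xxx.xxx:8030/api/doris_db/ck_big_table/_stream_load
done
```

If a chunk fails mid-load, re-running it with the same label is safe: Doris rejects a duplicate label for a load that already succeeded, so the retry cannot double-insert rows.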
TSV was chosen because embedded commas in the data would break plain CSV parsing; JSON was ruled out because it requires explicit schema definitions at load time.
Performance: Exporting ~1.4 GB of data from ClickHouse took under 2 seconds; Doris Stream Load completed in under 20 seconds, indicating both steps are fast.
Final Thoughts
There is no single “silver‑bullet” solution for heterogeneous DB migration. In practice, a combination of strategies is required: manually create matching Doris tables based on ClickHouse schema, then use the catalog + Stream Load combo for simple tables, and fall back to Spark for complex cases.
Readers are invited to share their own experiences with ClickHouse‑to‑Doris migrations.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.