Cold‑Hot Data Tiering Solutions for JD Advertising Using Apache Doris
JD Advertising built a petabyte‑scale ad analytics service on Apache Doris, identified a hot‑cold access pattern, and implemented a native cold‑hot tiering solution (upgrading to Doris 2.0 and optimizing schema changes) that cut storage costs by ~87% and increased concurrent query capacity more than tenfold while simplifying operations.
JD Advertising builds its ad data storage service on Apache Doris, accumulating petabyte‑scale data (≈1 PB, 18 trillion rows) and handling over 80 million daily queries. Most queries (≈99%) target data from the past year, exhibiting a clear hot‑cold access pattern. To reduce storage costs and improve performance, the team designed a cold‑hot data tiering solution.
Background: Rapid data growth turned storage into a bottleneck. Storage capacity had to be expanded roughly tenfold while query volume only doubled, so clusters sized for capacity left compute resources under‑utilized.
Tiering Option V1 – Data Lake: Cold data is exported from Doris into a data lake (e.g., Iceberg) via the Spark‑Doris‑Connector (SDC), and queries are rewritten accordingly: hot queries run directly on Doris OLAP tables, while cold queries are redirected to the external lake tables. The main advantage is that cold‑data workloads are decoupled from the online Doris cluster; the drawbacks are extra ETL pipelines, UNION queries that stitch hot and cold tables together, and schema changes that must be kept in sync with the lake.
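To make the V1 rewrite concrete, here is a minimal sketch of the hot/cold query-splitting logic. The table names, the Iceberg catalog name, and the boundary date are illustrative assumptions, not JD's actual configuration:

```python
from datetime import date

# Assumed hot/cold boundary: data older than this has been exported to the lake.
HOT_BOUNDARY = date(2023, 1, 1)

def rewrite_query(start: date, end: date) -> str:
    """Split a date-ranged aggregation across the Doris table and the lake table."""
    hot_start = max(start, HOT_BOUNDARY)
    hot_sql = (
        "SELECT dt, SUM(cost) AS cost FROM ads.ad_stats"
        f" WHERE dt >= '{hot_start}' AND dt <= '{end}' GROUP BY dt"
    )
    if start >= HOT_BOUNDARY:
        return hot_sql  # range is entirely hot: query Doris only
    cold_sql = (
        "SELECT dt, SUM(cost) AS cost FROM iceberg.ads.ad_stats"
        f" WHERE dt >= '{start}' AND dt < '{HOT_BOUNDARY}' GROUP BY dt"
    )
    # range spans the boundary: merge hot and cold partial results
    return f"{hot_sql} UNION ALL {cold_sql}"
```

The UNION ALL is exactly the overhead the article calls out: every boundary-spanning query pays for two scans and a merge, and the rewrite layer must track where the boundary currently sits.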
Tiering Option V2 – Doris Native Cold‑Hot Tiering: Doris 1.2 offers a TTL‑based mechanism that moves cold data to cheaper storage, but it is limited to physical‑machine deployments and requires the cold‑data size to be estimated in advance. Doris 2.0 adds support for external distributed storage (OSS, HDFS) via configurable storage policies, which simplifies the architecture, though cold‑data queries still need careful throttling.
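The V2 setup boils down to three DDL statements: register a remote storage resource, define a storage policy with a cooldown TTL, and attach the policy to a table. The sketch below builds them as strings; the resource name, endpoint, bucket, and credentials are placeholders, not JD's configuration:

```python
def tiering_ddl(table: str, resource: str, policy: str, ttl_seconds: int) -> list[str]:
    """Build the Doris 2.0 DDL that enables cooldown to remote storage for one table."""
    return [
        # 1. Register the remote store (an S3-compatible OSS bucket in this sketch).
        f'CREATE RESOURCE "{resource}" PROPERTIES ('
        '"type" = "s3", "s3.endpoint" = "<oss-endpoint>", '
        '"s3.bucket" = "<bucket>", "s3.access_key" = "<ak>", '
        '"s3.secret_key" = "<sk>");',
        # 2. Define a storage policy: data older than the TTL cools down to the resource.
        f'CREATE STORAGE POLICY {policy} PROPERTIES ('
        f'"storage_resource" = "{resource}", "cooldown_ttl" = "{ttl_seconds}");',
        # 3. Attach the policy to the table.
        f'ALTER TABLE {table} SET ("storage_policy" = "{policy}");',
    ]
```

Because tiering is declared per table, no ETL pipeline or UNION rewrite is needed: Doris routes reads to local or remote storage transparently, which is the architectural simplification over V1.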
Problem Solving (Section 3.1): During the upgrade from Doris 1.2 to 2.0, several issues were identified and resolved:
Performance regression caused by the new optimizer – worked around by disabling it.
Bucket‑pruning failure on rollup tables – fixed via PR #38565.
Prefix‑index failure caused by date‑type casting – fixed via PR #39446.
High FE CPU usage – identified hot paths and optimized them.
Inefficient time‑comparison logic – optimized in PR #31970, reducing CPU by ~25%.
Unnecessary plan rewrites for rollups without materialized views – fixed in PR #40000.
BE memory growth due to SegmentCache misconfiguration – tuned cache thresholds to cut memory usage from over 60% to under 25%.
Cold‑Data Schema Change Optimizations (Section 3.2):
Linked Schema Change : Use ChubaoFS CopyObject to copy data directly in remote storage, avoiding double network transfer and speeding up Add‑Key‑Column operations by ~40× (PR #40963).
Single‑Replica Schema Change: Execute the schema change only on the leader replica, preventing redundant copies on the other replicas.
Light Schema Change for Cold Data: Extend Light Schema Change to support adding Key columns, achieving millisecond‑level changes.
Other Issues (Section 3.3): Implemented a data migrator to back up and restore historical data, and a narwal_cli tool to align schemas during restore. Configured automatic cooldown policies (a 10‑second TTL for historical tables, so existing data cools down almost immediately, and a 2‑year TTL for hot tables) to achieve seamless cold‑hot transitions.
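The two-TTL scheme above can be sketched as a pair of storage policies, with the TTLs expressed in seconds as Doris expects. The policy names and the `remote_oss` resource are illustrative assumptions:

```python
# Seconds in one (non-leap) year, used to express the 2-year TTL.
YEAR = 365 * 24 * 3600

COOLDOWN_TTLS = {
    "cool_immediately": 10,     # historical tables: cool down 10 s after load
    "cool_after_2y": 2 * YEAR,  # hot tables: keep ~2 years on local disk
}

def policy_statements(resource: str = "remote_oss") -> list[str]:
    """Render one CREATE STORAGE POLICY statement per tier."""
    return [
        f'CREATE STORAGE POLICY {name} PROPERTIES ('
        f'"storage_resource" = "{resource}", "cooldown_ttl" = "{ttl}");'
        for name, ttl in COOLDOWN_TTLS.items()
    ]
```

Attaching the near-zero policy to restored historical tables drains them to remote storage right away, while the 2-year policy lets current tables age out gradually, so the hot/cold boundary moves without any manual migration step.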
Conclusion (Section 4): The cold‑hot tiering reduced storage costs by ~87% and improved concurrent query capacity by over 10× compared with the previous lake‑based approach. The solution simplified operations, lowered latency, and demonstrated the benefits of compute‑storage separation in a large‑scale ad analytics environment.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.