How JD Ads Cut Storage Costs 87% with Apache Doris Hot‑Cold Data Tiering
JD Advertising built a massive ad-data warehouse on Apache Doris that grew to nearly 1 PB and 18 trillion rows. To contain costs, the team implemented a hot-cold data tiering strategy, first with a lake-based approach and later with the native tiering solution in Doris 2.0, cutting storage costs by 87% and raising concurrent query capacity more than tenfold.
1. Background
JD Advertising built an ad-data storage service on Apache Doris to provide real-time ad effectiveness reports and multidimensional analysis. After years of growth, the system holds close to 1 PB of data and over 18 trillion rows, and handles more than 80 million queries per day. Storage cost became the bottleneck: data volume had grown tenfold while query volume only doubled. Analysis showed that 99% of daily queries target data from the past year, a clear hot-cold data pattern, so a hot-cold tiering solution was needed to lower storage and usage costs.
2. Hot-Cold Tiering Solutions
Two solutions were tried in succession:
V1 – Data Lake Approach
Cold data is exported from Doris via the Spark‑Doris‑Connector (SDC) into a data lake (e.g., Iceberg). Queries are rewritten: cold queries are redirected to external lake tables, while hot queries run directly on Doris OLAP tables. This decouples cold‑data processing from the online system, improving stability, but introduces ETL overhead, UNION operations between hot and cold sources, and schema‑change dependencies on the lake.
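To illustrate the rewrite V1 requires, here is a minimal sketch of a query that spans the cooldown boundary; the table, catalog, and column names are illustrative, not taken from JD's system:

```sql
-- Hot rows live in a Doris internal table; cold rows sit in an Iceberg
-- table reached through a Doris external catalog. A query crossing the
-- boundary must be stitched together with UNION ALL.
SELECT dt, ad_id, SUM(clicks) AS clicks
FROM ads_report_hot                        -- Doris OLAP table (hot)
WHERE dt >= '2024-01-01'
GROUP BY dt, ad_id
UNION ALL
SELECT dt, ad_id, SUM(clicks) AS clicks
FROM iceberg_catalog.ads_db.ads_report     -- lake table (cold)
WHERE dt < '2024-01-01'
GROUP BY dt, ad_id;
```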
V2 – Native Doris Tiering (Doris 2.0)
Doris 1.2’s native tiering moves cold data to cheaper storage based on TTL, but it only works on physical‑machine deployments and requires pre‑estimating cold‑data size. Doris 2.0 supports storing cold data in distributed systems such as OSS or HDFS, simplifying architecture. However, hot and cold queries share the same cluster, so high‑priority hot queries can be impacted by cold queries, requiring throttling.
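For reference, a minimal sketch of what the native tiering setup looks like in Doris 2.0 DDL, assuming an S3-compatible endpoint; every name, endpoint, and credential below is a placeholder:

```sql
-- 1. Register the remote storage (an S3-compatible object store)
CREATE RESOURCE "remote_oss" PROPERTIES (
    "type" = "s3",
    "s3.endpoint" = "oss-cn-beijing.aliyuncs.com",
    "s3.region" = "cn-beijing",
    "s3.bucket" = "doris-cold-data",
    "s3.root.path" = "ads/",
    "s3.access_key" = "<access_key>",
    "s3.secret_key" = "<secret_key>"
);

-- 2. Define when data cools down to that resource
CREATE STORAGE POLICY cool_after_90d PROPERTIES (
    "storage_resource" = "remote_oss",
    "cooldown_ttl" = "7776000"   -- 90 days, in seconds
);

-- 3. Attach the policy; Doris then migrates cold rowsets automatically
ALTER TABLE ads_report SET ("storage_policy" = "cool_after_90d");
```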
3. Problem Solving
3.1 Doris 2.0 Performance Optimizations & Fixes
Upgrading to Doris 2.0 introduced several issues:
Query performance drop: The new optimizer caused a ~50% slowdown; it was disabled (see the sketch after this list) and other regressions were addressed.
Bucket pruning failure: Fixed via PR #38565.
Prefix index failure: Resolved by aligning date types (PR #39446).
High FE CPU usage: Flame-graph analysis led to multiple optimizations, reducing CPU consumption.
Time-comparison inefficiency: Optimized partition pruning (PR #31970), cutting CPU usage by ~25%.
Materialized view rewrite waste: Disabled unnecessary rewrites (PR #40000).
BE memory growth: Adjusted SegmentCache thresholds, lowering memory usage from >60% to <25%.
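Of these, the optimizer fallback is the one operators can reproduce directly; a sketch assuming Doris 2.0's session variables (settable per session or globally):

```sql
-- Fall back to the legacy planner while regressions in the new
-- (Nereids) optimizer are investigated.
SET GLOBAL enable_nereids_planner = false;
```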
3.2 Cold‑Data Schema Change Optimizations
Standard schema change on cold data degraded to Direct Schema Change, causing heavy I/O and long runtimes (e.g., 7 days for a 20 TB table). Optimizations include:
Linked Schema Change: Use ChubaoFS CopyObject to copy data directly within remote storage, avoiding a double transfer; ~40× speedup (PR #40963).
Single-leader SC: Only the leader replica performs the schema change; the other replicas generate metadata only, preventing duplicate copies.
Light Schema Change for cold data: Extend Light SC to support adding Key columns, enabling millisecond-level changes (a sketch follows this list).
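To make the gain concrete, here is the kind of statement that Light Schema Change completes at metadata level; the table and column names are hypothetical, and support for adding Key columns this way is JD's extension rather than stock Doris behavior:

```sql
-- With Light Schema Change, this add-column becomes a metadata-only
-- operation and finishes in milliseconds; without it, adding a Key
-- column to a cold table fell back to a full data rewrite.
ALTER TABLE ads_report ADD COLUMN campaign_id BIGINT KEY DEFAULT "0";
```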
3.3 Other Issues
Historical data had been backed up to external storage before tiering was introduced. After the upgrade to Doris 2.0, a data migrator tool moved it back online, with a restore tool (narwal_cli) handling schema mismatches. A real-time write failure during restore (LOAD_RUN_FAIL) was fixed (PR #39595). Unified hot-cold policies were then applied, as sketched below: historical data is cooled to ChubaoFS immediately, while hot data auto-cools after two years, eliminating the need for further local storage expansion.
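A sketch of what such unified policies can look like, assuming ChubaoFS is already registered as a storage resource named remote_cfs; the names and the TTL values are illustrative:

```sql
-- Historical (restored) data: cool to ChubaoFS almost immediately
CREATE STORAGE POLICY cool_now PROPERTIES (
    "storage_resource" = "remote_cfs",
    "cooldown_ttl" = "3600"        -- 1 hour, effectively instant
);

-- Hot data: stay on local disks, auto-cool after two years
CREATE STORAGE POLICY cool_after_2y PROPERTIES (
    "storage_resource" = "remote_cfs",
    "cooldown_ttl" = "63072000"    -- 2 * 365 * 24 * 3600 seconds
);
```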
4. Conclusion
Implementing hot‑cold tiering reduced storage costs by ~87% and increased concurrent query capacity by more than tenfold, while simplifying maintenance. The success relied on Apache Doris community contributions and JD’s OLAP team, paving the way for further collaboration on compute‑storage separation in ad‑tech scenarios.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.