
Applying Apache Doris for JD.com Advertising Report Queries: Architecture, Challenges, and Performance

This article details JD.com's transition from a custom ad‑reporting system to Apache Doris, describing the background, challenges with the legacy platform, selection criteria, implementation of data import, pre‑aggregation, on‑site computation, and the resulting performance and operational benefits during regular operation and major sales events.

DataFunTalk

Background: JD.com’s advertising platform JingzhunTong provides real‑time and offline report queries for advertisers, supporting dozens of business lines, over 300 reports, tens of millions of daily queries, and billions of rows of data.

Problems with the existing system: performance could not keep up with growing business demands, schema‑change and rollup operations required heavy manual effort, data migration was costly, and the system became a risk during large‑scale promotions such as 618 and Double‑11.

Technical selection: the new engine needed to sustain high concurrency with millisecond‑level query latency, scale imports of both offline (T+1) and real‑time (minute‑level) data, and support online schema change and data repair. After evaluating ClickHouse, Druid, and Doris, the team chose Doris for its sub‑second query speed, stronger concurrency support, easy scaling, rollup and online schema‑change capabilities, and MySQL‑protocol compatibility.

Implementation: Doris now serves as the unified storage layer, aggregating offline and real‑time data. Real‑time streams (hundreds of MB to a few GB) are imported within seconds to a minute, while daily offline batches (20‑30 GB) finish in 10‑20 minutes. Pre‑aggregation is achieved via rollup tables and materialized views, which reduces query latency because queries read far fewer rows. On‑site computation, meaning aggregation performed at query time rather than in advance, relies on Doris's MPP architecture to handle high‑dimensional ad‑hoc queries efficiently.
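To make the pre‑aggregation idea concrete, here is a minimal pure‑Python sketch of what an aggregate‑key table with a rollup does conceptually. It is not Doris code and not JD.com's schema: the column names (`dt`, `advertiser_id`, `campaign_id`, `clicks`, `cost`) and the sample rows are assumptions chosen for illustration. Rows that share the same key columns are merged by summing their metrics, and a rollup keeps a coarser copy keyed on a subset of those columns.

```python
from collections import defaultdict

# Hypothetical ad-report rows: (dt, advertiser_id, campaign_id, clicks, cost).
# Names and values are illustrative only, not JD.com's actual schema.
ROWS = [
    ("2020-06-18", 1, 10, 5, 2.0),   # e.g. loaded by an offline (T+1) batch
    ("2020-06-18", 1, 10, 3, 1.5),   # e.g. a real-time increment, same key
    ("2020-06-18", 1, 11, 7, 3.0),
    ("2020-06-18", 2, 20, 4, 1.0),
]

def aggregate(rows, key_idx, metric_idx):
    """Merge rows sharing the same key columns by summing their metrics."""
    merged = defaultdict(lambda: [0] * len(metric_idx))
    for row in rows:
        key = tuple(row[i] for i in key_idx)
        for slot, i in enumerate(metric_idx):
            merged[key][slot] += row[i]
    return {k: tuple(v) for k, v in merged.items()}

# Base table: keyed on (dt, advertiser_id, campaign_id).
base = aggregate(ROWS, key_idx=(0, 1, 2), metric_idx=(3, 4))

# Rollup: a coarser pre-aggregation keyed on (dt, advertiser_id) only.
rollup = aggregate(ROWS, key_idx=(0, 1), metric_idx=(3, 4))

def clicks_for_advertiser(table, advertiser_id):
    """Sum clicks for one advertiser. Works on either table because
    advertiser_id is the second key column in both."""
    return sum(m[0] for k, m in table.items() if k[1] == advertiser_id)
```

Both tables give the same answer for an advertiser‑level query, but the rollup holds two rows where the base table holds three. At production scale, that difference in rows scanned is where the latency saving comes from.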

Business impact: The new architecture eliminated the need for separate advertiser and category tables, enabling millisecond‑level query responses and flexible dimension queries. During major promotions, Doris handled over 4,500 QPS, 80 million daily queries, and data growth of nearly 300 billion rows per day, with TP99 latency in the millisecond range.
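For readers unfamiliar with these metrics, the sketch below shows one way to compute TP99 and average QPS. The nearest‑rank percentile definition is a common convention and an assumption here; the source does not say how JD.com measures TP99. The latency samples are made up, and only the 80 million daily queries and 4,500 QPS figures come from the text above.

```python
def tp(latencies_ms, percentile):
    """Nearest-rank percentile: the smallest sample such that at least
    `percentile` percent of all samples are less than or equal to it."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    # Integer ceiling of n * percentile / 100, avoiding float rounding.
    rank = (len(ordered) * percentile + 99) // 100
    return ordered[max(rank, 1) - 1]

# Made-up latency samples in milliseconds: 1, 2, ..., 100.
samples = list(range(1, 101))
tp99 = tp(samples, 99)

# Figures quoted in the article.
DAILY_QUERIES = 80_000_000
QUOTED_QPS = 4_500
SECONDS_PER_DAY = 24 * 60 * 60

average_qps = DAILY_QUERIES / SECONDS_PER_DAY
quoted_to_average = QUOTED_QPS / average_qps
```

Spread evenly over a day, 80 million queries average roughly 926 QPS, so the quoted 4,500 QPS is a little under five times the daily average.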

Operational experience: Low‑latency and high‑throughput workloads were isolated using two clusters. Upgrading the Frontend (FE) to NIO improved connection handling, and adding Backend (BE) nodes automatically balanced load. Doris proved reliable across multiple large‑scale sales events.
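As an illustration of the two‑cluster isolation, a client‑side router might look like the sketch below. Everything in it is hypothetical: the cluster names, the host names, the workload labels, and the rule mapping each workload class to a cluster are assumptions, since the source does not describe how requests are classified. Port 9030 is the default MySQL‑protocol query port of a Doris Frontend.

```python
from dataclasses import dataclass
from enum import Enum

class Workload(Enum):
    LOW_LATENCY = "low_latency"          # e.g. interactive report queries
    HIGH_THROUGHPUT = "high_throughput"  # e.g. large scans or exports

@dataclass(frozen=True)
class Cluster:
    name: str
    fe_host: str          # hypothetical Frontend (FE) address
    fe_port: int = 9030   # Doris FE default MySQL-protocol query port

# Hypothetical cluster registry; hosts are placeholders.
CLUSTERS = {
    Workload.LOW_LATENCY: Cluster("latency-cluster", "doris-fast.example.internal"),
    Workload.HIGH_THROUGHPUT: Cluster("throughput-cluster", "doris-bulk.example.internal"),
}

def route(workload: Workload) -> Cluster:
    """Pick the cluster dedicated to this workload class, so heavy
    queries never compete with latency-sensitive ones for resources."""
    return CLUSTERS[workload]
```

Keeping the mapping in one place means a workload can be moved between clusters without touching the callers.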

Conclusion: After more than a year in production, Doris meets daily and peak requirements, reduces maintenance costs, and has become a core component of JD advertising, with plans to extend its use to other advertising data scenarios.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
