Customer Data Platform (CDP) at Qunar Travel: Architecture, Construction Practices, and Business Value

This article presents a comprehensive case study of Qunar Travel's Customer Data Platform (CDP), detailing its business background, operational pain points, architectural design, tag production and quality processes, real‑time labeling, crowd selection techniques, deployment safeguards, measurable business impact, and future development directions.

Qunar Tech Salon

Zhang Jie, who joined Qunar Travel in 2015 as the Big Data Director, shares a decade of experience in data warehousing, platforms, and governance, focusing on data‑driven business empowerment.

The CDP (Customer Data Platform) has become a standard tool for fine‑grained operations. Qunar Travel's multi‑year CDP implementation has generated billions in revenue and earned the company's annual Gold Award. The content here is distilled from external talks at CSDI SUMMIT, InfoQ QCon+, and DataFun, and from live streams.

Qunar Travel, founded in 2005, serves nearly 600 million users across dozens of business lines (flights, hotels, train tickets, etc.). As the traffic‑growth dividend diminishes, the industry is shifting from user acquisition to retention. Typical examples are re‑engaging lapsed users with coupon reminders and targeting specific user segments such as families.

Operational activities face pain points at every stage: data silos and analysis difficulty, low‑quality tags affecting ROI, high development cost across many teams and systems, and long iteration cycles that reduce efficiency.

To address these, the CDP abstracts the operation process into four core steps—building profile tags, crowd selection, configuring strategies, and analyzing effects—forming a one‑stop fine‑grained operation platform that feeds high‑value data back to the business.

CRM is sales‑oriented and DMP works mainly with anonymous public data. By comparison, CDP (emerging since 2016) serves a broader audience (operations, marketing, product, sales) and leverages private data for precise, timely user engagement.

The CDP’s business logic follows a 4W1H principle: delivering the right content to the right user at the right time and place, creating a closed loop of data‑to‑business‑to‑data.

Construction of the CDP follows three principles: a one‑stop solution for full‑process self‑service, universal tagging for users and products, and high‑availability services supporting massive concurrent calls. The overall data platform architecture includes a foundational layer (Hive, Trino, Flink, Hadoop, HBase, Kafka), data development, governance, and application layers, with CDP residing in the data‑application tier.

The functional architecture of CDP consists of core capabilities (tag production, quality, insight), tag services (high‑performance APIs backed by Redis and HBase), real‑time labeling (FlinkSQL with UDFs), and crowd selection (ClickHouse bitmap queries). Tag production supports SQL‑based, rule‑based, and model‑based tags, for users, products, or any entity, with static, periodic, or real‑time lifecycles, and composite tags for complex logic.
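To make the idea of rule‑based and composite tags concrete, here is a minimal Python sketch. It is not Qunar's implementation: the `RuleTag` class, the `all_of`/`any_of` helpers, and every profile field name are hypothetical, chosen only to show how a composite tag could be assembled from simpler rules.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Profile = Dict[str, Any]  # hypothetical flat profile record for one user


@dataclass(frozen=True)
class RuleTag:
    """A tag defined by a boolean rule over a profile (illustrative only)."""
    name: str
    predicate: Callable[[Profile], bool]

    def matches(self, profile: Profile) -> bool:
        return self.predicate(profile)


def all_of(name: str, *tags: RuleTag) -> RuleTag:
    """Composite tag that holds when every child tag holds."""
    return RuleTag(name, lambda p: all(t.matches(p) for t in tags))


def any_of(name: str, *tags: RuleTag) -> RuleTag:
    """Composite tag that holds when at least one child tag holds."""
    return RuleTag(name, lambda p: any(t.matches(p) for t in tags))


# Hypothetical field names; the real profile schema is not given in the article.
has_child = RuleTag("has_child_passenger", lambda p: p.get("child_passengers", 0) > 0)
booked_hotel = RuleTag("booked_hotel_90d", lambda p: p.get("hotel_orders_90d", 0) > 0)
booked_flight = RuleTag("booked_flight_90d", lambda p: p.get("flight_orders_90d", 0) > 0)

# Composite tag: has a child passenger AND booked a hotel OR a flight recently.
family_traveler = all_of(
    "family_traveler",
    has_child,
    any_of("recent_booker", booked_hotel, booked_flight),
)
```

Because a composite is itself a `RuleTag`, composites can be nested to any depth, which is one plausible way to express the "complex logic" the platform supports.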

Tag quality assurance includes automated quality checks that block faulty tags from going online and generate multi‑dimensional quality reports for user evaluation.
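A quality gate of this kind might look like the sketch below. The article does not say which metrics Qunar checks, so the dimensions used here (coverage, null rate, distinct value count), the thresholds, and the names `QualityReport` and `check_tag_quality` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class QualityReport:
    coverage: float        # share of all entities that received a non-null value
    null_rate: float       # share of produced rows whose value is null
    distinct_values: int   # number of distinct non-null values
    passed: bool           # False means the tag should be blocked from release


def check_tag_quality(
    values: Sequence[Optional[str]],
    total_entities: int,
    min_coverage: float = 0.5,    # assumed threshold
    max_null_rate: float = 0.1,   # assumed threshold
) -> QualityReport:
    """Compute a small quality report for one tag's output rows."""
    if total_entities <= 0:
        raise ValueError("total_entities must be positive")
    non_null = [v for v in values if v is not None]
    coverage = len(non_null) / total_entities
    null_rate = 1 - len(non_null) / len(values) if values else 1.0
    passed = coverage >= min_coverage and null_rate <= max_null_rate
    return QualityReport(coverage, null_rate, len(set(non_null)), passed)


# Four produced rows for a population of eight entities.
report = check_tag_quality(["a", "b", None, "a"], total_entities=8)
```

In this example the report fails the default thresholds, so a release pipeline built on it would keep the tag offline while still exposing the report to its owner.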

Tag insight provides detailed analysis of tag population composition, aiding decision‑making.

The CDP value chain is broken into three links: (1) tag production → data analysis, (2) tag service → business application (recommendation, pricing, promotion), and (3) user grouping → marketing strategy → user reach → effect analysis, forming a complete feedback loop.

Real‑time tagging challenges—lowering construction barriers, merging real‑time and offline data—are solved with FlinkSQL visual configuration and a Lambda architecture, achieving sub‑second latency.
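The merge step of a Lambda architecture can be sketched in a few lines of Python. This is only an illustration of the general pattern, not the platform's FlinkSQL logic: the `TagView` shape, the `merge_views` function, and the rule that the newer event time wins are assumptions.

```python
from typing import Dict, Tuple

# tag name -> (value, event time in milliseconds); a hypothetical view format
TagView = Dict[str, Tuple[str, int]]


def merge_views(offline: TagView, realtime: TagView) -> Dict[str, str]:
    """Serve one merged view: start from the batch result, then let
    real-time values override it when they are at least as recent."""
    merged = dict(offline)
    for tag, (value, ts) in realtime.items():
        if tag not in merged or ts >= merged[tag][1]:
            merged[tag] = (value, ts)
    return {tag: value for tag, (value, _) in merged.items()}


offline_view = {"last_city": ("Beijing", 1_000), "member_level": ("gold", 1_000)}
realtime_view = {"last_city": ("Shanghai", 2_000), "in_app_now": ("true", 2_100)}
merged_view = merge_views(offline_view, realtime_view)
```

Tags that only exist in the batch layer pass through unchanged, while fresh real-time values replace stale batch values for the same tag.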

Crowd selection handles billions of users with ClickHouse bitmap functions. For example, the SQL pseudo‑code select bitmapCardinality(bitmapAnd(C, bitmapOr(A, B))) from tag_bitmap counts the users who carry tag C and at least one of tags A or B. Such queries take about 2 seconds on average.
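The bitmap expression can be mimicked in plain Python to show what it computes. The sketch below uses Python integers as bitmaps (bit i set means user ID i carries the tag). It emulates the semantics of bitmapAnd, bitmapOr, and bitmapCardinality only; it does not call ClickHouse, and the helper names and sample user IDs are made up.

```python
from typing import Iterable


def to_bitmap(user_ids: Iterable[int]) -> int:
    """Build a bitmap where bit i is set for every user ID i."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap


def bitmap_and(x: int, y: int) -> int:
    return x & y


def bitmap_or(x: int, y: int) -> int:
    return x | y


def bitmap_cardinality(bitmap: int) -> int:
    """Count the set bits, i.e. the number of users in the crowd."""
    return bin(bitmap).count("1")


# Sample tag populations (hypothetical user IDs).
A = to_bitmap([1, 2, 3])
B = to_bitmap([3, 4])
C = to_bitmap([2, 3, 4, 5])

# Same shape as the pseudo-code: users in C who are also in A or B.
crowd_size = bitmap_cardinality(bitmap_and(C, bitmap_or(A, B)))
```

Set operations on bitmaps reduce to bitwise operations, which is why this representation scales well when tag populations are large and crowds are defined by combining many tags.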

Reliability is ensured through containerized deployment with auto‑scaling, HA mechanisms (dual‑write to Redis/HBase, version switching), resource isolation per business line, circuit‑breaker and retry logic, rate limiting, and multi‑level caching.
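As one illustration of these safeguards, here is a minimal circuit breaker with a fallback path in Python. It is a sketch under assumptions, not the tag service's code: the class, its thresholds, and the idea of falling back to a cached value are hypothetical, and retry logic is left out to keep the example short.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Stop calling a failing dependency for a cool-down period."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                return fallback()  # circuit open: skip the primary call
            # Cool-down elapsed: allow one trial call. The failure count is
            # kept, so a single failure reopens the circuit immediately.
            self._opened_at = None
        try:
            result = primary()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0  # a success closes the circuit fully
        return result
```

A caller would wrap each remote read, for example `breaker.call(read_from_store, read_from_local_cache)`, where both function names are placeholders. The injectable `clock` makes the cool-down behavior easy to test without sleeping.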

Business impact includes a 5‑fold ROI increase from high‑value reusable tags and a 3‑fold efficiency boost from self‑service. The tag service reaches 99.99% availability, a peak QPS above 300 k, and a P99 latency of 5 ms, while crowd‑selection queries average 2 seconds. These results culminated in the 2021 Gold Award.

Future directions aim to enrich model‑based tags, integrate with BI for deeper analytics, and introduce intelligent strategy selection to maximize ROI while minimizing user disturbance.

The article concludes with a Q&A covering tag creation methods, quality evaluation, and feedback mechanisms, followed by recruitment information for interested candidates.

Tags: Real-time Processing, data-platform, Tagging, CDP, Customer Data, Qunar Travel
Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
