
Design and Architecture of the User Profiling System at Ctrip Business Travel

This article describes the concept, tag taxonomy, data flow architecture, and Lambda‑based query service design of Ctrip Business Travel's user profiling system, highlighting how batch and real‑time processing with Spark, Flink, Hive, MongoDB and Redis enable precise marketing, risk control and personalized services.

Ctrip Technology

The article introduces user profiling, originally proposed by Alan Cooper, as a virtual representation of real users built on multi‑dimensional data such as demographics, habits and consumption preferences, and explains its importance for fine‑grained operation and precise marketing at Ctrip Business Travel.

It then details the B2B and B2C tag taxonomy used by Ctrip, which covers five major categories: basic attributes, CRM tags, preference tags, real‑time tags, and risk‑control tags. Examples include company ID, activity duration, purchase frequency, flight‑hotel ratios, recent query rates, overdue amounts and credit scores.
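One way to picture the taxonomy is as a small registry of tag definitions grouped by category. The sketch below is illustrative only: the class names, the tag identifiers, and the assignment of each example to a category are assumptions inferred from the order in which the summary lists them, not Ctrip's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class TagCategory(Enum):
    """The five categories named in the article."""
    BASIC = "basic attributes"
    CRM = "CRM tags"
    PREFERENCE = "preference tags"
    REAL_TIME = "real-time tags"
    RISK_CONTROL = "risk-control tags"


@dataclass(frozen=True)
class TagDefinition:
    name: str              # hypothetical tag identifier
    category: TagCategory
    description: str


# Assumed mapping of the article's example tags onto its five categories.
TAXONOMY = [
    TagDefinition("company_id", TagCategory.BASIC, "company ID"),
    TagDefinition("activity_duration", TagCategory.CRM, "activity duration"),
    TagDefinition("purchase_frequency", TagCategory.CRM, "purchase frequency"),
    TagDefinition("flight_hotel_ratio", TagCategory.PREFERENCE, "flight-hotel ratio"),
    TagDefinition("recent_query_rate", TagCategory.REAL_TIME, "recent query rate"),
    TagDefinition("overdue_amount", TagCategory.RISK_CONTROL, "overdue amount"),
    TagDefinition("credit_score", TagCategory.RISK_CONTROL, "credit score"),
]


def tags_in(category: TagCategory) -> list[str]:
    """Return the names of all tags registered under one category."""
    return [t.name for t in TAXONOMY if t.category is category]
```

A caller such as a risk-control service could then ask for `tags_in(TagCategory.RISK_CONTROL)` to find out which tags it is expected to read.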

The data flow architecture consists of data collection (offline Hive warehouse and online Kafka streams), feature computation (Spark SQL/UDF for batch, Flink for streaming), tag modeling (business‑rule, statistical, and machine‑learning methods), tag serving (Hive, MongoDB, Redis) and monitoring (Zeus, Grafana) to ensure data quality and service reliability.
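The feature-computation and tag-modeling stages can be sketched in plain Python. This is a simplified stand-in for what the article attributes to Spark SQL/UDF jobs: the order records, the thresholds, and the function names are all hypothetical, and `flight_share` is only one possible reading of a "flight-hotel ratio".

```python
from collections import defaultdict

# Hypothetical order records, as a batch job might read them from the warehouse.
ORDERS = [
    {"user_id": "u1", "product": "flight"},
    {"user_id": "u1", "product": "flight"},
    {"user_id": "u1", "product": "hotel"},
    {"user_id": "u2", "product": "hotel"},
]


def compute_features(orders):
    """Statistical step: count flight and hotel orders per user."""
    counts = defaultdict(lambda: {"flight": 0, "hotel": 0})
    for order in orders:
        # Other product types are ignored in this sketch.
        if order["product"] in ("flight", "hotel"):
            counts[order["user_id"]][order["product"]] += 1
    return dict(counts)


def purchase_frequency_tag(total_orders, high=3, medium=2):
    """Business-rule step: bucket a raw count using assumed thresholds."""
    if total_orders >= high:
        return "high"
    if total_orders >= medium:
        return "medium"
    return "low"


def build_tags(orders):
    """Combine both steps into one tag record per user."""
    tags = {}
    for user_id, c in compute_features(orders).items():
        total = c["flight"] + c["hotel"]  # always >= 1 for users present here
        tags[user_id] = {
            "purchase_frequency": purchase_frequency_tag(total),
            # Share of flight orders among all flight and hotel orders.
            "flight_share": c["flight"] / total,
        }
    return tags
```

Keeping the statistical step separate from the rule step means the thresholds can change without recomputing the underlying counts. A machine-learning tag would replace `purchase_frequency_tag` with a model prediction over the same features.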

The query service adopts a Lambda three‑layer architecture: a Batch Layer (Spark, Hive) for historical data, a Speed Layer (Flink) for low‑latency increments, and a Serving Layer (MongoDB, Redis) that merges both to provide fast, accurate user profile queries for downstream applications such as risk detection, recommendation ranking and fine‑grained operation.
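The merge performed by the Serving Layer can be illustrated with two in-memory views. The summary does not say how conflicts between the layers are resolved, so this sketch assumes the simplest policy: the fresher Speed Layer value overrides the Batch Layer value for the same tag. Plain dictionaries stand in for the real stores, and all tag names and values are made up for illustration.

```python
def query_profile(user_id, batch_view, speed_view):
    """Merge the batch and speed views; the speed layer wins on conflicts."""
    profile = dict(batch_view.get(user_id, {}))   # copy, so the stores are not mutated
    profile.update(speed_view.get(user_id, {}))   # overlay the real-time increments
    return profile


# Hypothetical stand-ins for the serving stores (not real MongoDB/Redis clients).
batch_view = {
    "u1": {"credit_score": 720, "recent_query_count": 12},
}
speed_view = {
    "u1": {"recent_query_count": 15},
}

merged = query_profile("u1", batch_view, speed_view)
```

Here `merged` keeps `credit_score` from the batch view and takes `recent_query_count` from the speed view. For additive metrics, a real system might sum the batch total and the streaming increment instead of overriding it.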

In conclusion, building a robust user profiling system requires deep business understanding, careful tag design, reliable data pipelines, high‑availability deployment, and continuous monitoring; future work includes closing the tag‑generation loop and sharing B‑side data with the C‑side to address cold‑start problems.
