
How Ctrip Builds a Scalable User Profile Platform for Personalized Travel

This article explains why Ctrip creates user profiles, describes the product and technical architectures, and details the data collection, computation, storage, high‑availability querying, and monitoring components that power its personalized travel recommendations and services.

21CTO

1. Why Ctrip Builds User Profiles

Ctrip uses user profiles to power recommendation algorithms that match products to user preferences and to provide personalized services, thereby improving user experience and reducing unwanted interruptions.

2. Architecture of Ctrip User Profiles

2.1 Product Architecture

All profiles are registered in the UserProfile platform, reviewed, and then flow into the data warehouse. The pipeline includes registration, data collection, computation, storage/query, and monitoring.
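The register-then-review gate can be sketched as a small registry in which only approved profile definitions become eligible for the warehouse. This is a minimal illustration, not Ctrip's implementation: the class names, the status values, and the example tag names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"


@dataclass
class ProfileDefinition:
    name: str        # hypothetical tag name, e.g. "preferred_hotel_star"
    owner_bu: str    # business unit that registered the tag
    status: Status = Status.PENDING


class ProfileRegistry:
    """In-memory stand-in for the registration step (assumed design)."""

    def __init__(self):
        self._definitions = {}

    def register(self, name, owner_bu):
        if name in self._definitions:
            raise ValueError(f"profile {name!r} is already registered")
        definition = ProfileDefinition(name, owner_bu)
        self._definitions[name] = definition
        return definition

    def approve(self, name):
        # Review passed: the profile may now flow into the warehouse.
        self._definitions[name].status = Status.APPROVED

    def warehouse_eligible(self):
        # Only reviewed profiles are handed to the downstream pipeline.
        return sorted(
            name
            for name, definition in self._definitions.items()
            if definition.status is Status.APPROVED
        )
```

A profile that has been registered but not yet approved stays out of `warehouse_eligible()`, which mirrors the review step described above.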

2.2 Technical Architecture

Ctrip's large-scale system emphasizes loose coupling and high cohesion and is managed by business unit (BU). Profiles are processed across BUs: open-source DataX and Storm move the data into a cross-BU UserProfile data warehouse, Redis caches it, and clients read it through a real-time API or an Elasticsearch-based API.
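One common way to place a cache in front of a warehouse is the cache-aside pattern, sketched here. The article does not say which caching strategy Ctrip uses, so treat this as an assumption; plain dictionaries stand in for Redis and the warehouse, and the `user_id:tag` key format is hypothetical.

```python
class ProfileReader:
    """Cache-aside read path: try the cache first, fall back to the warehouse."""

    def __init__(self, cache, warehouse):
        self.cache = cache            # dict standing in for Redis
        self.warehouse = warehouse    # dict standing in for the data warehouse
        self.warehouse_reads = 0      # counter, only to make the behavior visible

    def get(self, user_id, tag):
        key = f"{user_id}:{tag}"      # hypothetical key layout
        if key in self.cache:
            return self.cache[key]
        self.warehouse_reads += 1
        value = self.warehouse.get(key)
        if value is not None:
            self.cache[key] = value   # populate the cache for later reads
        return value
```

With this layout, repeated reads of the same profile reach the warehouse only once; a production version would also need expiry and invalidation, which are omitted here.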

3. Components of Ctrip User Profiles

3.1 Data Collection

Basic information is gathered from UserInfo, UBT (user behavior tracking), orders, crawlers, and mobile apps. Each data source has its own dedicated collection process; the order-information collection flow serves as the example.
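A dedicated collection process per source can be sketched as one normalizer function per source, all emitting the same record shape. The raw field names (`uid`, `product`, `user`, `action`, `page`) and the output schema are assumptions made for illustration; the article does not describe the actual formats.

```python
def from_order(raw):
    # Hypothetical order payload: {"uid": ..., "product": ...}
    return {
        "user_id": raw["uid"],
        "source": "order",
        "event": "booked",
        "item": raw["product"],
    }


def from_ubt(raw):
    # Hypothetical behavior payload: {"user": ..., "action": ..., "page": ...}
    return {
        "user_id": raw["user"],
        "source": "ubt",
        "event": raw["action"],
        "item": raw["page"],
    }


# One collector per data source; new sources register their own normalizer.
COLLECTORS = {"order": from_order, "ubt": from_ubt}


def collect(source, raw):
    """Normalize a raw payload from the given source into a common record."""
    try:
        normalize = COLLECTORS[source]
    except KeyError:
        raise ValueError(f"no collector registered for source {source!r}") from None
    return normalize(raw)
```

Keeping the source-specific parsing behind a shared record shape lets the computation stage stay unaware of where a record came from.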

3.2 Profile Computation

Collected raw data is transformed into valuable profiles. Asynchronous batch jobs (Hive, DataX) handle most calculations, while real‑time streams (Kafka + Storm) update time‑sensitive profiles such as user behavior.
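The difference between the two modes can be shown with a toy profile, "most frequent product category per user". The batch function recomputes from the full event history, while the streaming class updates incrementally as each event arrives. This sketch uses in-memory Python only; it does not call Hive, Kafka, or Storm, and the profile itself is a hypothetical example.

```python
from collections import Counter, defaultdict


def batch_compute(events):
    """Recompute every user's top category from the complete history."""
    counts = defaultdict(Counter)
    for user_id, category in events:
        counts[user_id][category] += 1
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}


class StreamingPreference:
    """Keep running counts and refresh one user's profile per event."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def on_event(self, user_id, category):
        self._counts[user_id][category] += 1
        # Return the user's current top category after this event.
        return self._counts[user_id].most_common(1)[0][0]
```

Both paths converge on the same answer for the same events; the streaming path simply makes the result available after each event instead of after the next batch run.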

3.3 Data Storage

Profile data is a classic big-data workload. It is stored in a sharded distributed warehouse with 160 shards spread across four physical clusters, using cross-IDC (cross-data-center) hot standby, SSDs, and other high-availability measures.
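To make the numbers concrete, here is one possible way to route a user to one of 160 shards and one of four clusters. The shard and cluster counts come from the text; the CRC32 hash, the modulo routing, and the contiguous shard-to-cluster layout are assumptions, since the article does not describe the actual sharding key or placement.

```python
import zlib

NUM_SHARDS = 160                                  # from the article
NUM_CLUSTERS = 4                                  # from the article
SHARDS_PER_CLUSTER = NUM_SHARDS // NUM_CLUSTERS   # 40 if spread evenly (assumed)


def shard_for(user_id: str) -> int:
    """Map a user ID to a shard with a stable hash (assumed scheme)."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_SHARDS


def cluster_for(shard: int) -> int:
    """Place shards 0-39 on cluster 0, 40-79 on cluster 1, and so on."""
    return shard // SHARDS_PER_CLUSTER
```

A stable hash such as CRC32 is used rather than Python's built-in `hash()`, because string hashing in Python is randomized per process and would route the same user to different shards across restarts.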

3.4 High‑Availability Query

API response time must stay below 250 ms; real‑time services achieve an average of 8 ms (99 % under 11 ms) using self‑degradation, circuit‑breaker, and traffic‑shaping techniques. Batch queries for large user groups use Elasticsearch.
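Circuit breaking and degradation can be sketched together: after repeated failures the breaker stops calling the backend for a while and serves a fallback instead. This is a generic illustration of the pattern, not Ctrip's code; the thresholds, the class name, and the injectable `clock` parameter (added to keep the example testable) are assumptions.

```python
import time


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures   # failures before the circuit opens
        self.reset_after = reset_after     # seconds to stay open
        self.clock = clock
        self.failures = 0
        self.opened_at = None              # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()          # open: skip the backend entirely
            self.opened_at = None          # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()              # degrade instead of failing the request
        self.failures = 0                  # a success closes the circuit again
        return result
```

The fallback might return a default or slightly stale profile, so callers keep getting a fast answer while the backend recovers.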

3.5 Monitoring and Tracing

Multi‑layer monitoring validates profile accuracy across dimensions such as user level, hotel star rating, and flight class, and tracks variance over time to trigger re‑evaluation of algorithms.
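One simple way to track variance over time is to compare the distribution of a profile's values between a baseline period and the current period, and flag the profile when any value's share moves too far. The metric (largest absolute change in share) and the 10% threshold are assumptions chosen for illustration; the article does not specify how variance is measured.

```python
from collections import Counter


def distribution(values):
    """Share of each distinct value; expects a non-empty sequence."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def max_shift(baseline, current):
    """Largest absolute change in share for any value in either period."""
    keys = set(baseline) | set(current)
    return max(abs(baseline.get(k, 0.0) - current.get(k, 0.0)) for k in keys)


def needs_review(baseline_values, current_values, threshold=0.10):
    """Flag a profile whose value distribution drifted past the threshold."""
    shift = max_shift(distribution(baseline_values), distribution(current_values))
    return shift > threshold
```

For example, if the share of users tagged as preferring five-star hotels jumps from 50% to 80% between two periods, `needs_review` returns `True`, which would prompt a look at the algorithm that produces the tag.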

All these components form Ctrip’s cross‑BU user profile platform, which continues to evolve with new technologies.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

21CTO
