Ctrip's Big Data Architecture and Personalized Recommendation System

This article describes how Ctrip transformed its traditional application architecture into a high‑concurrency, big‑data‑driven platform, detailing storage, compute, and business‑layer redesigns that enable massive data ingestion, real‑time user‑intent services, and a scalable personalized recommendation system.

Ctrip Technology

In the mobile‑Internet era, e‑commerce platforms must attract, retain, and extract value from users; Ctrip addresses this by building a data‑driven product strategy powered by a dedicated big‑data team, which has dramatically improved business performance.

Key challenges arise from rapid business growth: a 5.5‑fold increase in daily request volume since January 2016, increasingly complex business logic across dozens of OTA (online travel agency) business lines, heterogeneous data sources (Hive tables, logs, weather, web data, etc.), and the need to iterate quickly with minimal development effort.

To meet these challenges, Ctrip re‑architected its system by rebuilding, or giving a "rebirth" to, three layers: storage, compute, and business. The storage layer focuses on high‑throughput, scalable solutions; the compute layer balances horizontal distribution with vertical layering; and the business layer emphasizes modular design, clear boundaries, and configurability.

The overall system diagram in the original article shows how data sources, offline processing, near‑line streaming, and online services fit together. Data sources include Hermes (a Kafka‑based message queue), Hive and HDFS for massive batch storage, and various other structured and unstructured inputs.

Offline processing leverages MapReduce, Hive, Mahout, and Spark MLlib for batch analytics and machine‑learning model training. Near‑line processing uses Muise (Storm‑based) together with Hermes to compute real‑time user intent from behavior streams. Online services rely on MySQL, Elasticsearch, HBase, and Redis for persistent storage, search, and caching.

The recommendation system case study illustrates the storage "rebirth": replacing MySQL with HBase + Redis lets the system handle TB‑scale data and millions of requests per day, with 99% of responses returned in under 50 ms. Elasticsearch provides multi‑dimensional search and ranking, collapsing what used to be complex MySQL queries into a single ES query with sub‑100 ms latency.
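The HBase + Redis combination behaves like a read‑through cache in front of a wide‑column store. The sketch below illustrates that lookup path under stated assumptions: plain Python dictionaries stand in for a local cache, Redis, and HBase, and `Tier` and `read_through` are hypothetical names, not Ctrip's API. A miss falls through to the next (slower) tier, a hit backfills the faster tiers that missed, and an unavailable tier is skipped, which is one simple way to get the transparent failover the article attributes to its data‑governance layer.

```python
from typing import Dict, List, Optional


class Tier:
    """Hypothetical storage tier; a dict stands in for LocalCache, Redis or HBase."""

    def __init__(self, name: str, data: Optional[Dict[str, str]] = None, healthy: bool = True):
        self.name = name
        self.data: Dict[str, str] = dict(data or {})
        self.healthy = healthy

    def get(self, key: str) -> Optional[str]:
        if not self.healthy:
            raise ConnectionError(f"{self.name} unavailable")
        return self.data.get(key)

    def put(self, key: str, value: str) -> None:
        if self.healthy:  # writes to a down tier are silently dropped in this sketch
            self.data[key] = value


def read_through(tiers: List[Tier], key: str) -> Optional[str]:
    """Look up key from fastest to slowest tier, backfilling faster tiers on a hit."""
    missed: List[Tier] = []
    for tier in tiers:
        try:
            value = tier.get(key)
        except ConnectionError:
            continue  # failover: skip an unavailable tier and try the next one
        if value is not None:
            for faster in missed:
                faster.put(key, value)  # warm the caches that missed
            return value
        missed.append(tier)
    return None


# Usage: recommendations initially live only in the HBase stand-in.
local = Tier("local")
redis = Tier("redis")
hbase = Tier("hbase", {"rec:user42": "hotel:1,hotel:7"})
print(read_through([local, redis, hbase], "rec:user42"))  # hotel:1,hotel:7
print("rec:user42" in local.data, "rec:user42" in redis.data)  # True True
```

After the first call, later reads for the same key are served from the fastest tier. A real deployment would also need expiry and invalidation, which this sketch leaves out.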

Data processing follows three stages: preprocessing (deduplication, enrichment), data‑mining (classification, clustering, collaborative filtering), and result import (loading recommendations into HBase, Redis, and building ES indexes).
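The three stages can be illustrated with a toy, single‑process pipeline. This is a minimal sketch under assumptions: events are (user, item) pairs, enrichment adds a city from a lookup table, and the mining step uses simple item co‑occurrence counts, a basic form of item‑based collaborative filtering, as a stand‑in for the Mahout or Spark MLlib jobs the article mentions. The function names are hypothetical, and a Python dict plays the role of the HBase/Redis result store.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Event = Tuple[str, str]     # (user_id, item_id)
Row = Tuple[str, str, str]  # (user_id, item_id, city)


def preprocess(events: List[Event], city_of: Dict[str, str]) -> List[Row]:
    """Stage 1: drop duplicate (user, item) events and enrich each item with its city."""
    seen = set()
    rows: List[Row] = []
    for user, item in events:
        if (user, item) in seen:
            continue
        seen.add((user, item))
        rows.append((user, item, city_of.get(item, "unknown")))
    return rows


def mine(rows: List[Row]) -> Dict[str, Dict[str, int]]:
    """Stage 2: count how often two items are touched by the same user."""
    items_by_user: Dict[str, set] = defaultdict(set)
    for user, item, _city in rows:
        items_by_user[user].add(item)
    co: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def import_results(co: Dict[str, Dict[str, int]], top_n: int, store: Dict[str, List[str]]) -> None:
    """Stage 3: write the top-N co-occurring items per item into a key-value store."""
    for item, neighbours in co.items():
        ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
        store[f"sim:{item}"] = [other for other, _ in ranked[:top_n]]


events = [("u1", "h1"), ("u1", "h2"), ("u1", "h2"), ("u2", "h1"), ("u2", "h2"), ("u2", "h3")]
city_of = {"h1": "Shanghai", "h2": "Shanghai", "h3": "Beijing"}
store: Dict[str, List[str]] = {}
import_results(mine(preprocess(events, city_of)), top_n=2, store=store)
print(store["sim:h1"])  # ['h2', 'h3']
```

Raw co‑occurrence counts favor popular items; production systems usually normalize them (for example with cosine similarity) before ranking.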

Near‑line user‑intent computation enriches real‑time behavior with historical profiles to generate intent lists stored in HBase and Redis, while product caching asynchronously aggregates product data via Kafka and Storm to reduce load on backend services.
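One plausible way to combine real‑time behavior with a historical profile is a weighted blend of time‑decayed event scores and long‑term preference scores. The sketch below is not Ctrip's formula: the action weights, the exponential half‑life decay, the max‑normalization of each signal, and the blending factor `alpha` are all hypothetical choices. In the architecture described, this kind of computation would run in the Storm‑based near‑line layer, with the resulting intent list written to HBase and Redis.

```python
from typing import Dict, List, Tuple

# Hypothetical per-action weights; real values would be tuned per business line.
ACTION_WEIGHT = {"view": 1.0, "favorite": 3.0, "order": 5.0}


def _normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores into [0, 1] by dividing by the maximum (empty-safe)."""
    top = max(scores.values(), default=0.0)
    return {k: v / top for k, v in scores.items()} if top > 0 else {}


def intent_list(
    events: List[Tuple[str, str, float]],  # (action, destination, unix timestamp in seconds)
    history: Dict[str, float],             # long-term profile score per destination
    now: float,
    half_life_s: float = 3600.0,
    alpha: float = 0.7,                    # weight of real-time intent vs. history
    top_n: int = 3,
) -> List[str]:
    """Blend decayed real-time signals with the historical profile and rank destinations."""
    realtime: Dict[str, float] = {}
    for action, dest, ts in events:
        decay = 0.5 ** ((now - ts) / half_life_s)  # older events count for less
        realtime[dest] = realtime.get(dest, 0.0) + ACTION_WEIGHT.get(action, 0.0) * decay

    rt, hist = _normalize(realtime), _normalize(history)
    combined = {
        dest: alpha * rt.get(dest, 0.0) + (1 - alpha) * hist.get(dest, 0.0)
        for dest in set(rt) | set(hist)
    }
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [dest for dest, _ in ranked[:top_n]]


now = 1_700_000_000.0
events = [("view", "Sanya", now), ("order", "Tokyo", now - 3600)]
history = {"Beijing": 10.0, "Sanya": 5.0}
print(intent_list(events, history, now))  # ['Tokyo', 'Sanya', 'Beijing']
```

Here a recent order for Tokyo outweighs a long‑standing preference for Beijing, while Sanya benefits from both signals; raising `alpha` makes the list react faster to current sessions.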

Online business‑layer modules include a unified data‑governance layer, which hides the storage backends (LocalCache, Redis, HBase, MySQL) behind one interface with transparent failover and scaling, and a recommendation‑strategy layer, which filters, aggregates, and ranks products according to per‑scenario rules. Composing these modules through a DSL cut development time by more than 50%.
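The strategy layer's filter, aggregate, and rank steps compose naturally as a pipeline of small functions, which is one way to read the article's "modular DSL composition". The sketch below is an assumption rather than Ctrip's DSL: `Product`, the step constructors, and `compose` are hypothetical, and each scenario is simply a different selection and parameterization of reusable steps.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Product:
    product_id: str
    line: str        # business line, e.g. "hotel" or "flight"
    score: float     # upstream recommendation score
    in_stock: bool


Step = Callable[[List[Product]], List[Product]]


def filter_available() -> Step:
    """Filter: drop products that cannot currently be sold."""
    return lambda items: [p for p in items if p.in_stock]


def merge_duplicates() -> Step:
    """Aggregate: merge candidates from several sources, keeping the best score per id."""
    def step(items: List[Product]) -> List[Product]:
        best: Dict[str, Product] = {}
        for p in items:
            if p.product_id not in best or p.score > best[p.product_id].score:
                best[p.product_id] = p
        return list(best.values())
    return step


def rank_by_score(limit: int) -> Step:
    """Rank: order by score (ties broken by id) and keep the top `limit`."""
    return lambda items: sorted(items, key=lambda p: (-p.score, p.product_id))[:limit]


def compose(*steps: Step) -> Step:
    """Chain steps left to right into one scenario strategy."""
    return lambda items: reduce(lambda acc, step: step(acc), steps, items)


# A hypothetical scenario assembled from reusable steps.
hotel_detail_page = compose(filter_available(), merge_duplicates(), rank_by_score(2))

candidates = [
    Product("h1", "hotel", 0.90, True),
    Product("h1", "hotel", 0.95, True),
    Product("h2", "hotel", 0.80, False),
    Product("f1", "flight", 0.70, True),
    Product("t1", "ticket", 0.60, True),
]
print([p.product_id for p in hotel_detail_page(candidates)])  # ['h1', 'f1']
```

With this design, adding a scenario means assembling existing steps instead of writing a new service path, which is plausibly the kind of reuse behind the reported reduction in development time.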

Author: Dong Rui, senior leader of Ctrip's Data‑Intelligent Application Group, with nine years of internet experience and expertise in Hadoop‑based data warehouses and personalized recommendation systems.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
