
Evolution of Ctrip Vacation Product Log System: From Single‑Table DB to ES + HBase Platform

This article details the three‑stage evolution of Ctrip's vacation product log system: from a simple single‑table DB approach, through a platform‑based ES + HBase solution, to a scalable V3.0 architecture that improves storage, search, and business empowerment while handling billions of log entries.

Ctrip Technology

Author : cd, senior backend development engineer at Ctrip, focusing on vacation product system development and backend performance optimization.

The vacation product system at Ctrip processes over 6 billion daily data change records across thousands of tables, generating more than 1.7 trillion log entries to support debugging, supplier monitoring, and BI analysis.

The system's development can be divided into three phases: (1) pre‑2019 single‑table DB logs, (2) 2020‑2022 platform‑based logging, and (3) 2023‑2024 open‑platform empowerment.

V1.0 – Single‑table DB storage : a simple table (id, LogContent) stored unstructured text. Problems included massive table size (over 1 billion rows, ~370 GB), slow queries, poor readability, and tight coupling with business code.

V2.0 – Platformization :

The technology selection compared ES + HBase, MongoDB, and ClickHouse. MongoDB was ruled out because of its limited scale, and ClickHouse could not meet the retention requirements. ES + HBase was chosen because the two complement each other: Elasticsearch for search and HBase for massive write throughput.

The architecture uses HBase for durable storage and Elasticsearch for fast search, linking ES DocID with HBase RowKey. Log API calls are queued via MQ, decoupling write latency from business logic.

RowKey design principles : uniqueness, uniform distribution, orderability, compactness, and readability. The RowKey consists of five parts: {md5‑pk‑8hex}-{tableId‑padded‑8}-{pk+rand‑padded‑24}-{logType‑padded‑16}-{timestamp}.
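The five‑part layout above can be sketched as follows. This is a minimal illustration, not the production implementation: the function name, the use of "0" as the padding character, the 4‑character random suffix, and the millisecond timestamp are all assumptions, since the article only gives the field order and widths.

```python
import hashlib
import secrets
import time
from typing import Optional


def build_row_key(table_id: int, pk: str, log_type: str,
                  ts_ms: Optional[int] = None,
                  rand: Optional[str] = None) -> str:
    """Assemble a five-part RowKey (hypothetical helper).

    Layout: md5(pk)[:8] - tableId(8) - pk+rand(24) - logType(16) - timestamp
    """
    if ts_ms is None:
        ts_ms = int(time.time() * 1000)  # assumed unit: epoch milliseconds
    if rand is None:
        rand = secrets.token_hex(2)      # assumed: 4 random hex characters

    pk_rand = f"{pk}{rand}"
    if len(str(table_id)) > 8 or len(pk_rand) > 24 or len(log_type) > 16:
        raise ValueError("a field exceeds its fixed width")
    if "-" in pk_rand or "-" in log_type:
        raise ValueError("'-' is reserved as the part separator")

    # The hash prefix spreads consecutive primary keys across regions,
    # while keeping all logs of one primary key adjacent.
    prefix = hashlib.md5(pk.encode("utf-8")).hexdigest()[:8]
    parts = [
        prefix,
        str(table_id).zfill(8),
        pk_rand.rjust(24, "0"),
        log_type.rjust(16, "0"),
        str(ts_ms),
    ]
    return "-".join(parts)
```

Because the hash prefix depends only on the primary key, every log of a given record shares the same prefix, so a prefix scan can retrieve them together while different records still spread evenly.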

Extension : a unified data‑write service and log‑ingestion API abstract the logging logic, allowing easy configuration of new log types via a central configuration center.
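One way to picture configuration‑driven log types is a registry keyed by log type, where each entry declares which payload fields should be indexed. The sketch below is an assumption about the shape of such a configuration; `LogTypeConfig`, `CONFIG_CENTER`, and the field names are hypothetical and stand in for the real configuration center.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class LogTypeConfig:
    """Hypothetical configuration entry for one log type."""
    log_type: str
    table_id: int
    index_fields: Tuple[str, ...] = ()


# In-memory stand-in for the central configuration center.
CONFIG_CENTER: Dict[str, LogTypeConfig] = {}


def register_log_type(cfg: LogTypeConfig) -> None:
    """Adding a new log type is a configuration change, not a code change."""
    CONFIG_CENTER[cfg.log_type] = cfg


def build_index_doc(log_type: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the fields that the configuration marks as searchable."""
    cfg = CONFIG_CENTER.get(log_type)
    if cfg is None:
        raise KeyError(f"unknown log type: {log_type}")
    return {name: payload[name] for name in cfg.index_fields if name in payload}


register_log_type(LogTypeConfig("price_change", 42, ("product_id", "operator")))
```

With this shape, the ingestion API never needs to know about individual log types; it only looks up the entry and applies it.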

Write flow :

The client calls the log API, the service pushes the request to MQ, consumers generate the RowKey, write the raw log to HBase, then index it in Elasticsearch; failures are compensated via a Redis cluster.
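The write flow can be sketched with in‑memory stand‑ins: a `queue.Queue` for MQ, dictionaries for HBase and Elasticsearch, and a list for the Redis‑backed compensation store. The class and method names are hypothetical, the sequential row key is a placeholder for the real RowKey, and how the actual compensation job replays failures is not described in the article.

```python
import itertools
import queue
from typing import Any, Callable, Dict, List, Optional


class LogPipeline:
    """Toy model of: log API -> MQ -> consumer -> HBase -> ES (assumed shape)."""

    def __init__(self, index_fn: Optional[Callable[[str, Dict[str, Any]], None]] = None):
        self.mq: "queue.Queue[Dict[str, Any]]" = queue.Queue()
        self.hbase: Dict[str, str] = {}            # row key -> raw log
        self.es: Dict[str, Dict[str, Any]] = {}    # doc id (= row key) -> fields
        self.retry: List[Dict[str, Any]] = []      # stand-in for Redis
        self._seq = itertools.count(1)
        self._index_fn = index_fn or self._index

    def log_api(self, content: str, fields: Dict[str, Any]) -> None:
        """The caller only enqueues, so business code never waits on storage."""
        self.mq.put({"content": content, "fields": fields})

    def _index(self, row_key: str, fields: Dict[str, Any]) -> None:
        self.es[row_key] = dict(fields)

    def consume_one(self) -> Optional[str]:
        """Process one queued message; return its row key, or None on failure."""
        msg = self.mq.get_nowait()
        row_key = f"rk-{next(self._seq):06d}"      # placeholder row key
        try:
            self.hbase[row_key] = msg["content"]   # 1. durable raw log
            self._index_fn(row_key, msg["fields"]) # 2. searchable index
        except Exception:
            # Keep the row key so a later replay overwrites instead of duplicating.
            self.retry.append({"row_key": row_key, **msg})
            return None
        return row_key
```

Writing to HBase before indexing means a search hit always points at a row that exists; a failed index step leaves an unsearchable row that the compensation step can repair.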

Query flow :

The client calls the query API, which searches Elasticsearch for matching RowKeys, then fetches the full log content from HBase in batch.
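The two‑step query can be sketched as a filter over the index followed by a batch lookup. The dictionaries below stand in for Elasticsearch and HBase, and the function names, exact‑match filter semantics, and `limit` parameter are assumptions made for illustration.

```python
from typing import Any, Dict, List


def search_row_keys(es_docs: Dict[str, Dict[str, Any]],
                    filters: Dict[str, Any],
                    limit: int = 100) -> List[str]:
    """Step 1: find matching documents; the doc id doubles as the RowKey."""
    hits = [
        row_key
        for row_key, doc in es_docs.items()
        if all(doc.get(name) == value for name, value in filters.items())
    ]
    return sorted(hits)[:limit]


def batch_get(hbase_rows: Dict[str, str], row_keys: List[str]) -> List[str]:
    """Step 2: fetch full log bodies in one batch, skipping missing rows."""
    return [hbase_rows[rk] for rk in row_keys if rk in hbase_rows]


def query_logs(es_docs: Dict[str, Dict[str, Any]],
               hbase_rows: Dict[str, str],
               filters: Dict[str, Any]) -> List[str]:
    return batch_get(hbase_rows, search_row_keys(es_docs, filters))
```

Keeping only small index fields in the search layer and the full log body in the storage layer is what lets each store do the job it is best at.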

Advantages: comprehensive, configurable, structured indexing, and support for massive log volumes. Disadvantages: primarily developer‑oriented, making it harder for suppliers or business users to consume directly.

V3.0 – Empowerment :

Storage scaling : horizontal sharding and expansion of ES and HBase clusters, with routing rules in the log‑config center to direct different business lines to dedicated clusters.
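A routing rule of this kind can be as simple as a lookup table with a shared fallback. The cluster names, host strings, and business‑line keys below are invented for illustration; the article does not describe the actual rule format.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class LogCluster:
    """Hypothetical description of one ES + HBase cluster pair."""
    name: str
    es_hosts: Tuple[str, ...]
    hbase_quorum: str


DEFAULT_CLUSTER = LogCluster("shared", ("es-shared:9200",), "zk-shared:2181")

# Stand-in for routing rules held in the log-config center.
ROUTING_RULES: Dict[str, LogCluster] = {
    "tour": LogCluster("tour-dedicated", ("es-tour:9200",), "zk-tour:2181"),
    "cruise": LogCluster("cruise-dedicated", ("es-cruise:9200",), "zk-cruise:2181"),
}


def route(business_line: str) -> LogCluster:
    """Pick the cluster for a business line, falling back to the shared one."""
    return ROUTING_RULES.get(business_line, DEFAULT_CLUSTER)
```

Moving a business line to a new cluster then becomes a change to one configuration entry rather than a code deployment.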

Search enhancements :

Expanded index fields (10 configurable fields covering numeric, string, and date types).

Time‑based index partitioning with weekly index creation and yearly retention, improving query speed and storage management.

Routing rules in the configuration center map business lines to appropriate log clusters, providing flexible and elastic access.
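Weekly index partitioning, as described in the list above, means a time‑range query only has to touch the indexes whose weeks overlap the range. The sketch below assumes ISO weeks and an index naming pattern of `prefix-YYYY-wNN`; both are illustrative choices, not the platform's documented convention.

```python
from datetime import date, timedelta
from typing import List


def week_index(prefix: str, day: date) -> str:
    """Name of the weekly index containing `day` (assumed naming pattern)."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{prefix}-{iso_year}-w{iso_week:02d}"


def indexes_for_range(prefix: str, start: date, end: date) -> List[str]:
    """All weekly indexes that overlap the inclusive range [start, end]."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    cursor = start - timedelta(days=start.weekday())  # Monday of start's week
    names = []
    while cursor <= end:
        names.append(week_index(prefix, cursor))
        cursor += timedelta(days=7)
    return names


# A query from 2023-12-30 to 2024-01-09 touches three weekly indexes.
print(indexes_for_range("log", date(2023, 12, 30), date(2024, 1, 9)))
# ['log-2023-w52', 'log-2024-w01', 'log-2024-w02']
```

The same naming scheme makes retention straightforward: an index whose week falls outside the retention window can be dropped as a whole instead of deleting documents one by one.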

Supplier empowerment :

A B‑end log query page was built to present logs in a business‑friendly format, converting raw logs into readable tables, key‑value pairs, and comparative views, thus reducing reliance on developers for troubleshooting.

Seven log‑display patterns were identified:

Plain text fields – displayed directly.

Data association – IDs are resolved to descriptive names via DB joins.

Enum mapping – keys are translated to human‑readable enum values.

Bit‑storage – bitwise‑encoded values are decoded.

Field combination – related fields are merged for a unified view.

External API – values are fetched from external services (e.g., city IDs to names).

Diff comparison – snapshots are compared to highlight changes.

Each pattern is illustrated with screenshots (omitted for brevity).
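Three of the patterns above (enum mapping, bit‑storage, and diff comparison) are simple enough to sketch in a few lines. The enum values, flag names, and field names are made up for illustration and do not come from the actual product schema.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical lookup tables; real ones would come from product metadata.
STATUS_ENUM = {0: "Offline", 1: "Online", 2: "Pending review"}
FLAG_BITS = {0: "refundable", 1: "instant_confirm", 2: "child_allowed"}


def map_enum(value: int) -> str:
    """Enum mapping: translate a stored key into a readable label."""
    return STATUS_ENUM.get(value, f"Unknown({value})")


def decode_bits(mask: int) -> List[str]:
    """Bit-storage: list the flag names whose bit is set in the mask."""
    return [name for bit, name in sorted(FLAG_BITS.items()) if mask & (1 << bit)]


def diff_snapshots(before: Dict[str, Any],
                   after: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Diff comparison: map each changed field to its (old, new) values."""
    changes = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes
```

For example, `decode_bits(5)` yields `["refundable", "child_allowed"]`, and diffing `{"price": 100}` against `{"price": 120}` yields `{"price": (100, 120)}`, which is the kind of output a supplier‑facing page can render as a before/after table.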

Conclusion : The article traces the full evolution of the vacation product log platform. ES + HBase addresses the storage and search challenges of massive data and delivers sub‑500 ms query latency over trillions of records. The system has been opened to suppliers and business users, and it scales horizontally to support multiple business lines that now store petabyte‑scale data.
