How Ctrip Scaled Its Travel Product Log System to Billions of Records
This article traces the evolution of Ctrip’s travel product log platform from a single‑table DB approach to a platform‑level ES + HBase solution. It covers the challenges of massive data volume, the architectural decisions, RowKey design, the write and query flows, and the later extensions that enabled storage and fast retrieval of logs at the scale of hundreds of billions of records.
Background
Ctrip’s travel product system manages highly complex products backed by thousands of underlying tables and generates more than 600 million data‑change records per day. These logs are essential for troubleshooting, supplier verification, and BI analysis, and have accumulated to more than 170 billion entries.
Development trajectory
Pre‑2019: Single‑table DB logging (V1.0)
2020‑2022: Platform‑level logging (V2.0)
2023‑2024: Open and empowered logging (V3.0)
Evolution process
V1.0 – DB single‑table storage
Logs were stored in a single relational table with columns id and LogContent. LogContent held unstructured text and queries relied on LIKE statements. This design caused:
Massive data volume (>1 billion rows, ~370 GB) leading to frequent query timeouts and the need for periodic archiving.
Poor readability – only developers could interpret the raw text.
Low extensibility – adding a new log type required code changes, and the logs could not be accessed directly by suppliers or business users.
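A minimal sketch of why LIKE-based search struggles at this size, using Python's built-in sqlite3 as a stand-in for the production database (the source does not name the engine; the table name `product_log` and the sample log text are hypothetical). A pattern with a leading wildcard cannot seek into a B-tree index, so the planner reports a full SCAN, whereas a primary-key lookup is a direct SEARCH.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Join the 'detail' column of EXPLAIN QUERY PLAN into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

conn = sqlite3.connect(":memory:")
# V1-style schema: one table, one free-text column (column names as in the article).
conn.execute("CREATE TABLE product_log (id INTEGER PRIMARY KEY, LogContent TEXT)")
# Even an index on the text column cannot serve a leading-wildcard LIKE.
conn.execute("CREATE INDEX idx_log_content ON product_log (LogContent)")
conn.executemany(
    "INSERT INTO product_log (LogContent) VALUES (?)",
    [(f"productId={i} field=price old={i} new={i + 1}",) for i in range(1000)],
)

like_sql = "SELECT id FROM product_log WHERE LogContent LIKE ?"
pattern = ("%productId=42 %",)
matches = conn.execute(like_sql, pattern).fetchall()
like_plan = query_plan(conn, like_sql, pattern)  # full scan: every entry is read
pk_plan = query_plan(conn, "SELECT LogContent FROM product_log WHERE id = ?", (43,))  # direct seek
print(like_plan)
print(pk_plan)
```

On a table with more than a billion rows, every such scan touches the whole table or index, which is consistent with the timeouts listed above.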
V2.0 – Platformization
Technical selection
The team evaluated several large‑scale log storage solutions:
ES + HBase: HBase provides high‑throughput random writes and PB‑level capacity; Elasticsearch offers powerful full‑text search. Together they satisfy both the storage and the query requirements, at the cost of added architectural complexity.
MongoDB: Document‑oriented, but judged suitable only up to roughly 1 billion records, well short of the required scale.
ClickHouse: Column‑store with excellent query performance, but internal cost constraints limited how long data could be retained.
Considering data volume and cost, the ES + HBase stack was selected.
Overall architecture
Log entries are received via a unified API, placed onto a message queue (MQ), and processed asynchronously. The MQ decouples logging from business logic and smooths write spikes.
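The decoupling can be sketched with an in-process queue standing in for the MQ (an assumption: the article does not name the MQ product, and `AsyncLogApi` and `LogEvent` are hypothetical names). The caller's `emit` only enqueues, while a consumer thread drains the buffer at its own pace, which is what absorbs bursts of writes.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class LogEvent:
    table_id: int
    primary_key: str
    log_type: str
    content: str

class AsyncLogApi:
    """Hypothetical log API: callers enqueue and return; one consumer thread persists.

    queue.Queue stands in for the message queue; a real MQ would also
    persist messages and support many consumers.
    """

    _STOP = object()

    def __init__(self, sink: Callable[[LogEvent], None]) -> None:
        self._queue: queue.Queue = queue.Queue()  # unbounded buffer absorbs bursts
        self._sink = sink
        self.failed: List[LogEvent] = []
        self._worker = threading.Thread(target=self._consume, daemon=True)
        self._worker.start()

    def emit(self, event: LogEvent) -> None:
        # Business code only pays for an in-memory enqueue, never for the storage write.
        self._queue.put(event)

    def close(self) -> None:
        # The sentinel goes behind all queued events, so they are drained first.
        self._queue.put(self._STOP)
        self._worker.join()

    def _consume(self) -> None:
        while True:
            item = self._queue.get()
            if item is self._STOP:
                return
            try:
                self._sink(item)
            except Exception:
                # Keep consuming; a production consumer would hand this to a compensation store.
                self.failed.append(item)

stored: List[LogEvent] = []
api = AsyncLogApi(stored.append)
for i in range(100):
    api.emit(LogEvent(table_id=1, primary_key=str(i), log_type="price", content=f"change {i}"))
api.close()
print(len(stored))
```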
RowKey design
The RowKey consists of five components to guarantee uniqueness, uniform distribution, and query efficiency:
{0}: MD5‑hex of the primary key, first 8 characters
{1}: Table ID padded to 8 digits
{2}: Primary key plus a 4‑digit random suffix, padded to 24 characters
{3}: Log type padded to 16 characters
{4}: Timestamp
Extensibility
A unified data‑write service and logging API allow any module (e.g., entry, direct‑connect) to emit logs. Configuration‑driven rules in a central log‑config center govern how new log types are handled.
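To make the five-component RowKey from the RowKey design section concrete, here is a minimal sketch. The component order and widths follow the article; the padding character (`0`, left-padded), the millisecond timestamp, and the helper names `row_key_prefix` and `build_row_key` are assumptions.

```python
import hashlib
import random
import time
from typing import Optional

def row_key_prefix(primary_key: str, table_id: int) -> str:
    """Components {0}, {1} and the key part of {2}: a deterministic scan prefix."""
    if len(primary_key) > 20:
        raise ValueError("primary key must fit in 20 chars (24 minus the 4-digit suffix)")
    if not 0 <= table_id <= 99_999_999:
        raise ValueError("table ID must fit in 8 digits")
    salt = hashlib.md5(primary_key.encode("utf-8")).hexdigest()[:8]  # {0}: spreads writes
    table = str(table_id).rjust(8, "0")                                # {1}
    # {2} = (primary key + suffix) left-padded to 24 chars; the suffix is always
    # 4 chars, so the padding depends only on the key and the prefix is stable.
    return salt + table + primary_key.rjust(20, "0")

def build_row_key(primary_key: str, table_id: int, log_type: str,
                  ts_ms: Optional[int] = None,
                  rng: Optional[random.Random] = None) -> str:
    if len(log_type) > 16:
        raise ValueError("log type must fit in 16 chars")
    rng = rng if rng is not None else random.Random()
    suffix = f"{rng.randint(0, 9999):04d}"                       # {2}: random part
    log_type_part = log_type.rjust(16, "0")                        # {3}
    ts = ts_ms if ts_ms is not None else int(time.time() * 1000)   # {4}: epoch millis
    return row_key_prefix(primary_key, table_id) + suffix + log_type_part + str(ts)

key = build_row_key("123456", 42, "price", ts_ms=1_700_000_000_000, rng=random.Random(7))
prefix = row_key_prefix("123456", 42)
print(key)
```

The MD5 prefix spreads consecutive primary keys across HBase regions while remaining deterministic. Because the random suffix has a fixed width, `row_key_prefix` gives a stable prefix that a prefix scan can use to fetch every log of one record.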
Write flow
Clients call the log API; the service pushes the request to MQ. Consumers generate the RowKey, write the raw log to HBase, and index the document in Elasticsearch. If either write fails, the log is stored in a Redis compensation cluster for later retry.
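The consumer side of the write flow can be sketched with in-memory fakes in place of the HBase, Elasticsearch, and Redis clients (all names here are hypothetical). Writing HBase first and ES second, and parking the whole record on any failure, is one plausible ordering; the article does not specify it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class PendingLog:
    row_key: str
    content: str
    doc: Dict[str, Any]

class LogWriter:
    """Hypothetical consumer-side writer: HBase, then ES, with a compensation list."""

    def __init__(self,
                 hbase_put: Callable[[str, str], None],
                 es_index: Callable[[str, Dict[str, Any]], None]) -> None:
        self._hbase_put = hbase_put
        self._es_index = es_index
        # Stand-in for the Redis compensation cluster (e.g. a Redis list).
        self.compensation: List[PendingLog] = []

    def write(self, row_key: str, content: str, doc: Dict[str, Any]) -> bool:
        try:
            self._hbase_put(row_key, content)  # full raw log, keyed by RowKey
            self._es_index(row_key, doc)       # searchable fields; document ID = RowKey
            return True
        except Exception:
            # Either write failed: park the whole record and retry both writes later.
            self.compensation.append(PendingLog(row_key, content, doc))
            return False

    def retry_pending(self) -> int:
        """Replay parked logs once; failures are parked again. Returns successes."""
        batch, self.compensation = self.compensation, []
        return sum(self.write(p.row_key, p.content, p.doc) for p in batch)

class FlakyEs:
    """Fake ES client that rejects writes until marked available."""

    def __init__(self) -> None:
        self.available = False
        self.docs: Dict[str, Dict[str, Any]] = {}

    def index(self, row_key: str, doc: Dict[str, Any]) -> None:
        if not self.available:
            raise ConnectionError("ES unavailable")
        self.docs[row_key] = doc

hbase: Dict[str, str] = {}
es = FlakyEs()
writer = LogWriter(hbase.__setitem__, es.index)
first_ok = writer.write("rk-1", "price 100 -> 120", {"logType": "price"})
parked = len(writer.compensation)
es.available = True
recovered = writer.retry_pending()
```

Replaying both writes is safe here because the RowKey is the identity in both stores: a repeated HBase put to the same row and column leaves the same latest value, and indexing into ES with an existing document ID replaces that document.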
Query flow
Clients invoke the query API, which translates parameters into an Elasticsearch paginated request. ES returns matching RowKeys; the service then batch‑retrieves the full log content from HBase. A web UI built on this API enables developers to search logs efficiently.
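A sketch of the two-step query, assuming hypothetical ES field names `rowKey` and `logTime` and injected `search` and `batch_get` callables in place of real clients. The request body uses standard Elasticsearch query DSL (`bool` filter, `term`, `range`, `from`/`size`, `_source` filtering).

```python
from typing import Any, Callable, Dict, List, Sequence

SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]
BatchGetFn = Callable[[Sequence[str]], Dict[str, str]]

def build_search_body(filters: Dict[str, Any], start_ms: int, end_ms: int,
                      page: int, page_size: int) -> Dict[str, Any]:
    """ES query DSL: exact-match filters plus a time range, newest first, RowKeys only."""
    clauses: List[Dict[str, Any]] = [{"term": {f: v}} for f, v in sorted(filters.items())]
    clauses.append({"range": {"logTime": {"gte": start_ms, "lt": end_ms}}})
    return {
        "query": {"bool": {"filter": clauses}},
        "sort": [{"logTime": {"order": "desc"}}],
        "from": (page - 1) * page_size,  # 1-based page number
        "size": page_size,
        "_source": ["rowKey"],  # ES returns only keys; HBase holds the full content
    }

def query_logs(search: SearchFn, batch_get: BatchGetFn, filters: Dict[str, Any],
               start_ms: int, end_ms: int, page: int = 1,
               page_size: int = 20) -> List[Dict[str, Any]]:
    body = build_search_body(filters, start_ms, end_ms, page, page_size)
    hits = search(body)["hits"]["hits"]
    row_keys = [hit["_source"]["rowKey"] for hit in hits]
    contents = batch_get(row_keys)  # one batched read instead of one read per hit
    # Keep ES ordering; a key missing from HBase yields content None.
    return [{"rowKey": rk, "content": contents.get(rk)} for rk in row_keys]

captured: Dict[str, Any] = {}

def fake_search(body: Dict[str, Any]) -> Dict[str, Any]:
    captured.update(body)
    return {"hits": {"hits": [{"_source": {"rowKey": "rk-2"}}, {"_source": {"rowKey": "rk-1"}}]}}

store = {"rk-1": "price 100 -> 120", "rk-2": "price 120 -> 90"}
results = query_logs(fake_search, lambda keys: {k: store[k] for k in keys if k in store},
                     {"tableId": 42}, start_ms=0, end_ms=1_800_000_000_000,
                     page=2, page_size=2)
```

`from` + `size` pagination is capped by the index setting `index.max_result_window` (10,000 by default); `search_after` is the usual alternative for deep paging.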
V3.0 – Empowerment
Business empowerment
To handle growing write volume, ES and HBase clusters were horizontally split and scaled. A routing rule in the log‑config center directs different log types to dedicated clusters, and independent clusters can be provisioned per business line.
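A routing rule of this kind can be sketched as a small lookup over configuration. The precedence (dedicated business-line cluster, then per-log-type rule, then a shared default) and every cluster and business-line name below are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class ClusterRoute:
    es_cluster: str
    hbase_cluster: str

class LogRouter:
    """Hypothetical routing rule, as it might be loaded from the log-config center."""

    def __init__(self, by_log_type: Dict[str, ClusterRoute],
                 by_business_line: Dict[str, ClusterRoute],
                 default: ClusterRoute) -> None:
        self._by_log_type = by_log_type
        self._by_business_line = by_business_line
        self._default = default

    def route(self, log_type: str, business_line: Optional[str] = None) -> ClusterRoute:
        # Assumed precedence: business-line cluster, then log-type rule, then shared default.
        if business_line is not None and business_line in self._by_business_line:
            return self._by_business_line[business_line]
        return self._by_log_type.get(log_type, self._default)

router = LogRouter(
    by_log_type={"price": ClusterRoute("es-price", "hbase-price")},
    by_business_line={"bu-a": ClusterRoute("es-bu-a", "hbase-bu-a")},
    default=ClusterRoute("es-shared", "hbase-shared"),
)
print(router.route("price"), router.route("stock"))
```

If the maps are reloaded from the config center, moving a log type or business line onto its own cluster becomes a configuration change rather than a code change.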
Search capability enhancements
Index field expansion: Ten additional customizable fields (4 numeric, 4 string, 2 date) support diverse query scenarios.
ES index partitioning: Indices are created weekly and retained for one year, enabling time‑range queries and easy deletion of stale data.
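Both enhancements can be sketched together: a mapping with the ten custom fields, plus helpers that name indices by ISO week, list the indices a time-range query must cover, and find indices older than one year. The field names (`num1` to `num4`, `str1` to `str4`, `date1`, `date2`, `logTime`), the `keyword` type for strings, and the `prefix-YYYY.wWW` naming scheme are assumptions.

```python
from datetime import date, timedelta
from typing import Any, Dict, List

# Ten custom fields as described: 4 numeric, 4 string, 2 date. Names and types are assumptions.
CUSTOM_FIELDS: Dict[str, Dict[str, Any]] = {
    **{f"num{i}": {"type": "long"} for i in range(1, 5)},
    **{f"str{i}": {"type": "keyword"} for i in range(1, 5)},
    **{f"date{i}": {"type": "date", "format": "epoch_millis"} for i in range(1, 3)},
}

def index_mappings() -> Dict[str, Any]:
    """Mapping body shared by every weekly index (e.g. via an index template)."""
    return {"properties": {
        "rowKey": {"type": "keyword"},
        "tableId": {"type": "long"},
        "logType": {"type": "keyword"},
        "logTime": {"type": "date", "format": "epoch_millis"},
        **CUSTOM_FIELDS,
    }}

def weekly_index(prefix: str, day: date) -> str:
    # ISO year and week: the last days of December can belong to week 1 of the next year.
    iso_year, iso_week, _ = day.isocalendar()
    return f"{prefix}-{iso_year}.w{iso_week:02d}"

def indices_for_range(prefix: str, start: date, end: date) -> List[str]:
    """Every weekly index that a query over [start, end] must hit."""
    monday = start - timedelta(days=start.weekday())
    names = []
    while monday <= end:
        names.append(weekly_index(prefix, monday))
        monday += timedelta(weeks=1)
    return names

def expired_indices(prefix: str, existing: List[str], today: date,
                    retention_days: int = 365) -> List[str]:
    """Indices whose whole week lies before the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in existing:
        iso_year, iso_week = name[len(prefix) + 1:].split(".w")
        week_start = date.fromisocalendar(int(iso_year), int(iso_week), 1)
        if week_start + timedelta(days=7) <= cutoff:
            expired.append(name)
    return expired

print(indices_for_range("product-log", date(2024, 1, 3), date(2024, 1, 20)))
```

Dropping a whole expired index is far cheaper for Elasticsearch than deleting a year-old range of documents from one large index, which is the main payoff of time-based partitioning.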
Supplier empowerment
A B‑side log query page was created for suppliers and business users. Raw logs are transformed into readable formats (key‑value conversion, enum mapping, relational lookups, external API calls) and displayed consistently with the product system UI.
Display types for log conversion
Seven common presentation patterns were identified and implemented in a configurable conversion engine:
Plain text fields – displayed directly.
Data association – IDs are resolved to human‑readable values via DB joins.
Enum mapping – numeric keys are replaced with descriptive labels.
Bit‑field storage – values packed into bit flags are decoded back to their original meanings.
Field combination – multiple related fields are merged for a concise range view.
External API – IDs are translated by calling external services (e.g., city ID to city name).
Difference comparison – snapshots are compared to highlight changes.
Adding a new log type typically only requires updating the extraction and conversion configuration; the B‑side UI reflects the changes automatically.
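A minimal sketch of a configuration-driven conversion engine covering several of these patterns. Each rule is plain data, data association and external API calls are injected as lookup functions, and difference comparison runs over two converted snapshots. The rule format, field names, and sample values are hypothetical; only the city-ID example comes from the article.

```python
from typing import Any, Callable, Dict, List, Tuple

Lookup = Callable[[Any], str]
Row = Tuple[str, str]

def convert_record(record: Dict[str, Any], rules: List[Dict[str, Any]],
                   lookups: Dict[str, Lookup]) -> List[Row]:
    """Turn one raw log snapshot into (label, display value) rows, driven by rules."""
    rows: List[Row] = []
    for rule in rules:
        kind, label = rule["kind"], rule["label"]
        if kind == "plain":
            value = str(record[rule["field"]])
        elif kind == "enum":
            raw = record[rule["field"]]
            value = rule["mapping"].get(raw, str(raw))
        elif kind in ("association", "external"):
            # DB lookup or remote call, injected so the engine stays storage-agnostic.
            value = lookups[rule["lookup"]](record[rule["field"]])
        elif kind == "bitfield":
            bits = record[rule["field"]]
            names = [name for bit, name in rule["flags"] if bits & (1 << bit)]
            value = ", ".join(names) or "(none)"
        elif kind == "combine":
            value = rule["template"].format(**{f: record[f] for f in rule["fields"]})
        else:
            raise ValueError(f"unknown conversion kind: {kind}")
        rows.append((label, value))
    return rows

def diff_rows(before: List[Row], after: List[Row]) -> List[Tuple[str, str, str]]:
    """Difference comparison: (label, old, new) for every row whose value changed."""
    old = dict(before)
    return [(label, old.get(label, ""), new) for label, new in after if old.get(label) != new]

RULES = [
    {"kind": "plain", "label": "Product name", "field": "name"},
    {"kind": "enum", "label": "Status", "field": "status", "mapping": {0: "Offline", 1: "Online"}},
    {"kind": "external", "label": "Departure city", "field": "cityId", "lookup": "city"},
    {"kind": "bitfield", "label": "Sales channels", "field": "channels",
     "flags": [(0, "App"), (1, "Web"), (2, "Offline store")]},
    {"kind": "combine", "label": "Valid dates", "fields": ["startDate", "endDate"],
     "template": "{startDate} ~ {endDate}"},
]
CITY_NAMES = {2: "Shanghai"}  # stand-in for an external city service
LOOKUPS: Dict[str, Lookup] = {"city": lambda city_id: CITY_NAMES.get(city_id, str(city_id))}

before = {"name": "Tour A", "status": 0, "cityId": 2, "channels": 0b011,
          "startDate": "2024-05-01", "endDate": "2024-05-31"}
after = dict(before, status=1, channels=0b111)
changes = diff_rows(convert_record(before, RULES, LOOKUPS), convert_record(after, RULES, LOOKUPS))
print(changes)
```

In this sketch, supporting a new log type means adding rules rather than code, which mirrors the configuration-only onboarding described above.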
Conclusion
The platform evolved from a single‑table DB log to a scalable ES + HBase solution with a carefully designed RowKey, asynchronous MQ processing, and flexible configuration. Horizontal scaling, index partitioning, and a conversion engine then opened the system to suppliers and business users. The result stores logs at the scale of hundreds of billions of records with sub‑second query latency, while keeping the platform extensible and integration cost low.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Architect